October 31, 2023
Classifying incoming emails into “spam” and “not spam”.
Besides the major distinction between using labeled or unlabeled data, the two approaches have other significant differences, as pointed out by Martin Keen, a Master Inventor at IBM.
Supervised learning: The algorithm is trained with labeled data sets
Unsupervised learning: The algorithm is trained with unlabeled data sets
Supervised learning: Easy to measure the system’s quality during the model training due to reference data availability
Unsupervised learning: In most cases, you get user feedback only after the system is implemented
Supervised learning: It requires direct intervention to label data
Unsupervised learning: Doesn’t require manual data labeling, but model training still involves human supervision
Supervised learning: Random forests, support vector machines, linear regression, NN, etc.
Unsupervised learning: K-Means clustering, PCA, autoencoders, Apriori, NN, etc.
Supervised learning: It’s less computationally complex
Unsupervised learning: It has higher compute requirements
Supervised learning: Supervised learning models are generally more accurate
Unsupervised learning: Unsupervised learning models can be less accurate
Supervised learning: You know both the input and the corresponding output
Unsupervised learning: You work with unclassified data and the output is unknown
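To make the supervised side of this comparison concrete, here is a minimal sketch of training on labeled data and then measuring quality against reference labels. The toy data points, the labels "A" and "B", and the helper names fit_centroids and predict are all hypothetical, and the nearest-centroid classifier is used only because it is short; it is not one of the algorithms listed above.

```python
from statistics import mean

# Hypothetical toy data: each item is ((feature_1, feature_2), label).
# All values are made up for illustration.
train = [
    ((1.0, 1.2), "A"), ((0.8, 1.0), "A"), ((1.2, 0.9), "A"),
    ((4.0, 4.2), "B"), ((4.1, 3.8), "B"), ((3.9, 4.0), "B"),
]
test = [((1.1, 1.0), "A"), ((4.2, 4.1), "B"), ((0.9, 1.3), "A")]


def fit_centroids(samples):
    """Supervised step: average the feature vectors of each known label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(dim) for dim in zip(*points))
        for label, points in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


centroids = fit_centroids(train)
predictions = [predict(centroids, features) for features, _ in test]

# Because the test set carries reference labels, quality is easy to measure.
accuracy = mean(1.0 if p == y else 0.0 for p, (_, y) in zip(predictions, test))
print(predictions, accuracy)
```

The accuracy line is the part that has no direct equivalent in unsupervised learning: without reference labels, there is nothing to compare the predictions against.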
The distinct characteristics of supervised and unsupervised learning make them suitable for different applications and business scenarios. Here are some examples.
Analyzing user interactions on social media and online platforms to assess their attitude towards topics, products, or brands and refine marketing campaigns.
Processing satellite imagery and radar measurements to identify weather patterns and generate precipitation maps more accurately than via statistical models.
Forecasting stock price fluctuations and market volatility based on financial trends and corporate earnings to build more balanced portfolios while minimizing risk.
Calculating the potential value of a real estate property based on its features and location to ensure more profitable investments.
Monitoring economic conditions, seasonality-related purchase patterns, and other factors to predict upcoming sales trends and optimize restocking operations.
Detecting and isolating persons in pictures and videos based on their biometric data to classify multimedia content and automate tagging.
Processing audio inputs and interpreting natural language to power chatbots, moderate online content, and enable real-time transcriptions or translations.
Probing radiological images and other sources to identify tumors, traumas, or other conditions and enable accurate diagnoses.
K-Means is a clustering algorithm that assigns data points to K groups based on their similarity. The K value is the number of clusters the algorithm is asked to find in a dataset. A higher K value means that the data is split into more, smaller groups, which yields finer-grained clusters and more detailed inferred relationships between the data points.
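The idea can be sketched in a few lines of plain Python. This is a minimal illustration of the standard assign-then-update loop (Lloyd's algorithm), not a production implementation; the toy points, the function names kmeans and sq_dist, and the seed and iterations parameters are assumptions made for the example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Group points into k clusters; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda i: sq_dist(p, centroids[i])) for p in points
        ]
        if new_assignments == assignments:
            break  # nothing changed, so the clustering has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical unlabeled data: two loose groups of 2-D points.
points = [
    (1.0, 1.2), (0.8, 1.0), (1.2, 0.9),
    (4.0, 4.2), (4.1, 3.8), (3.9, 4.0),
]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```

Note that no labels are passed in: the algorithm only sees the points and the chosen K. Calling kmeans(points, k=3) on the same data would split it into three groups instead of two.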
Figure: K-Means clustering. Source: realpython.com, “K-Means Clustering in Python: A Practical Guide”.