Supervised vs unsupervised learning

The single question that determines which class of algorithm to use — does your data have labels?

Every machine learning problem starts with the same diagnostic question: does your training data include the answers? That single question separates two fundamentally different algorithmic families, each with its own assumptions, methods, and failure modes.

Supervised learning

In supervised learning, each training example comes with a label — the correct answer the model is trying to learn to produce. The algorithm learns a mapping from inputs to outputs by comparing its predictions against those labels and adjusting until the gap is small.
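To make "comparing predictions against labels and adjusting" concrete, here is a minimal sketch in plain Python, with no ML library assumed. It fits a straight line `y ≈ w*x + b` to labelled pairs by gradient descent on mean squared error. The names `fit_line` and `mse`, the learning rate, and the toy data are illustrative choices, not a prescribed method.

```python
# Minimal supervised training loop: fit y ≈ w*x + b to labelled (x, y) pairs
# by gradient descent on mean squared error (MSE).

def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Learn slope w and intercept b from labelled examples."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Compare current predictions with the labels.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error**2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient to shrink the gap.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Average squared gap between predictions and labels."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy labelled data generated from y = 2x + 1 with no noise (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, mse={mse(xs, ys, w, b):.6f}")
```

On this noise-free toy data the loop should recover w ≈ 2 and b ≈ 1. With real, noisy labels the gap shrinks but does not reach zero.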

The two main forms are:

Regression — the label is a continuous number. Predicting tomorrow's temperature, estimating a house price, forecasting next quarter's revenue. The model outputs a real-valued number, and the loss function measures how far off it was.
Classification — the label is a category. Spam or not-spam, digit 0–9, tumour benign or malignant. The model outputs a class (or a probability distribution over classes).
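The two forms differ mainly in what the model outputs and how the loss scores it. The sketch below, using hypothetical helper names and made-up numbers, contrasts a squared-error loss on a regression prediction with a softmax-plus-cross-entropy loss for a classifier that outputs a probability distribution over classes. These are common loss choices, not the only ones.

```python
import math


def squared_error(prediction, target):
    """Regression loss: grows with the distance between a number and its label."""
    return (prediction - target) ** 2


def softmax(scores):
    """Turn raw per-class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Classification loss: -log of the probability given to the correct class."""
    return -math.log(probs[true_class])


# Regression: predicted price 310 (thousands), true price 300.
reg_loss = squared_error(310.0, 300.0)

# Classification: raw scores for [not-spam, spam]; the true class is spam (index 1).
probs = softmax([0.5, 2.0])
clf_loss = cross_entropy(probs, true_class=1)
predicted_class = probs.index(max(probs))  # pick the most probable class

print(reg_loss, [round(p, 3) for p in probs], round(clf_loss, 3), predicted_class)
```

Cross-entropy shrinks toward zero as the probability assigned to the correct class approaches 1, which is why a classifier's outputs are often read as confidences.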

Three concrete examples, each put through the "what's the target?" test:

Problem	Input features	Target	Type
Email spam detection	Word frequencies, sender	Spam / not-spam	Classification
Predicting loan default	Income, credit history	Default / no default	Classification
Estimating shipping time	Distance, weight, carrier	Days to deliver	Regression

The test is simple: if each row has a correct answer that someone could, in principle, supply by hand (even if doing so would be expensive), you have a supervised problem.

Unsupervised learning

Unsupervised learning removes the labels entirely. The algorithm receives only inputs and must find structure — patterns, groupings, or compressed representations — on its own.

Two main forms:

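Clustering — partition examples into groups where members are more similar to each other than to members of other groups. k-means and DBSCAN are canonical examples. Nothing in the data tells the algorithm what the clusters mean, so you interpret them after the fact. The number of clusters is not given either: k-means requires you to choose k in advance, while DBSCAN infers it from density parameters (a neighbourhood radius and a minimum number of points).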
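Dimensionality reduction — compress high-dimensional data into fewer dimensions while preserving as much structure as possible. PCA (principal component analysis) and t-SNE are common choices. PCA is useful both for visualisation and as a preprocessing step before supervised learning; t-SNE is used mainly for visualisation, since it does not learn a mapping that can be applied to new data points.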
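For clustering, here is a minimal sketch of k-means using Lloyd's algorithm (the standard iterative procedure), written in plain Python for 2-D points. Note that `k` is an input the caller must choose. The helper names, the random initialisation from input points, and the toy data are assumptions for illustration. The output is just cluster indices and centroids; deciding what each cluster means is still up to you.

```python
import math
import random


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm for k-means on tuples of floats; k is chosen by the caller."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct input points
    for _ in range(max_iters):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Unlabelled toy data: two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The cluster indices themselves are arbitrary: running with a different seed may swap which group is called 0 and which is called 1.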
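For dimensionality reduction, the sketch below finds the first principal component of 2-D points by power iteration on their covariance matrix, then projects each point onto it, reducing two numbers per point to one. It is a toy, pure-Python illustration with assumed helper names and made-up data; in practice you would use a linear-algebra library's eigendecomposition or SVD.

```python
import math


def first_principal_component(points, iters=200):
    """Mean and unit direction of greatest variance (PC1) for 2-D points,
    found by power iteration on the sample covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    cxx = sum(x * x for x, _ in centred) / (n - 1)
    cyy = sum(y * y for _, y in centred) / (n - 1)
    cxy = sum(x * y for x, y in centred) / (n - 1)
    vx, vy = 1.0, 1.0  # arbitrary start; must not be orthogonal to PC1
    for _ in range(iters):
        # Multiply by the covariance matrix, then renormalise to unit length.
        wx, wy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = math.hypot(wx, wy)
        vx, vy = wx / norm, wy / norm
    return (mx, my), (vx, vy)


def project(points, mean, direction):
    """Reduce each 2-D point to one coordinate along `direction`."""
    (mx, my), (dx, dy) = mean, direction
    return [(x - mx) * dx + (y - my) * dy for x, y in points]


# Toy 2-D data lying roughly along the line y = 2x.
points = [(0.0, 0.1), (1.0, 1.9), (2.0, 4.2), (3.0, 5.8), (4.0, 8.1)]
mean, pc1 = first_principal_component(points)
zs = project(points, mean, pc1)  # five 1-D coordinates instead of five 2-D points
print(pc1, zs)
```

Because these points lie close to a line, almost all of their variance survives the projection to one dimension; data without such structure would lose more.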

Three examples:

Problem	Input	No target because…
Customer segmentation	Purchase history	No pre-defined groups exist
Anomaly detection in server logs	Log feature vectors	"Normal" is not labelled
Document topic modelling	Word counts	Topics are latent, not labelled

Semi-supervised learning

In practice, labels are expensive. A medical dataset might have a million scans but only ten thousand reviewed by a radiologist. Semi-supervised learning combines a small labelled set with a large unlabelled set: the unlabelled examples carry no answers, but they still reveal how the inputs are distributed. Self-training is one of the simplest approaches: train on the labelled data, add the model's most confident predictions on unlabelled examples as pseudo-labels, retrain, and repeat.
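The self-training loop described above can be sketched as follows. This is a toy under stated assumptions: a hypothetical nearest-class-mean classifier on one-dimensional features, and a made-up confidence score (the gap between the distances to the two closest class means). Real implementations typically use a probabilistic classifier and threshold its predicted probabilities instead.

```python
def fit_centroids(xs, ys):
    """Toy classifier: the mean feature value of each class (1-D features)."""
    return {c: sum(x for x, y in zip(xs, ys) if y == c) / ys.count(c) for c in set(ys)}


def predict(model, x):
    """Return (label, confidence); confidence is the gap between the distances
    to the nearest and second-nearest class mean (assumes at least two classes)."""
    ranked = sorted((abs(x - mean), label) for label, mean in model.items())
    (d1, label), (d2, _) = ranked[0], ranked[1]
    return label, d2 - d1


def self_train(xs, ys, unlabelled, threshold=2.0, max_rounds=10):
    """Repeatedly adopt confident predictions on unlabelled data as pseudo-labels."""
    xs, ys, pool = list(xs), list(ys), list(unlabelled)
    for _ in range(max_rounds):
        model = fit_centroids(xs, ys)  # retrain on labels plus pseudo-labels so far
        still_unsure = []
        added = 0
        for x in pool:
            label, confidence = predict(model, x)
            if confidence >= threshold:
                xs.append(x)  # treat the prediction as if it were a label
                ys.append(label)
                added += 1
            else:
                still_unsure.append(x)
        pool = still_unsure
        if added == 0:
            break  # nothing left that the model is confident about
    return fit_centroids(xs, ys), pool


# Two labelled examples and nine unlabelled ones (toy data).
model, unlabelled_left = self_train([1.0, 9.0], ["a", "b"],
                                    [0.5, 1.5, 2.0, 3.0, 5.0, 7.0, 8.0, 8.5, 9.5])
print(model, unlabelled_left)
```

The ambiguous point 5.0, equidistant from both class means, never clears the threshold and stays unlabelled. The threshold is the key design choice: set it too low and early mistakes get baked in as pseudo-labels for every later round.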

The "what's the target?" test is a reliable heuristic, but it has an edge case: reinforcement learning, where the feedback is a reward signal, often delayed, rather than a correct answer attached to each example. That is a third paradigm. At the level of this lesson, supervised and unsupervised learning cover most of the data science problems you will meet.

Where to go next

Now that you can classify a problem by paradigm, the next lesson examines a universal challenge in supervised learning: the bias-variance tradeoff — the tension between models that are too simple and models that are too complex.
