Scaling in practice

Apply MinMaxScaler and StandardScaler from sklearn — fit on training data, transform both splits, and verify the before/after statistics.

Scikit-learn scalers follow a consistent interface: .fit() computes the statistics from the training data, and .transform() applies the scaling using those statistics. On the training set you can combine both steps with .fit_transform(). Never call .fit() or .fit_transform() on the test set, because that would leak test-set statistics into preprocessing: fit on train, then call only .transform() on test.
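To make the fit/transform split concrete, here is a minimal sketch on made-up numbers (the arrays are hypothetical, not the lesson's dataset). It shows that fitting only stores statistics, which can be inspected through the fitted attributes data_min_, data_max_, mean_, and scale_, and that transform() reuses them on new data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy data (hypothetical values): one feature, already split.
X_train = np.array([[10.0], [20.0], [30.0], [40.0]])
X_test = np.array([[25.0], [50.0]])

# fit() only learns statistics from the training data; it does not change the data.
minmax = MinMaxScaler().fit(X_train)
standard = StandardScaler().fit(X_train)

print("MinMaxScaler learned min/max:", minmax.data_min_, minmax.data_max_)
print("StandardScaler learned mean/std:", standard.mean_, standard.scale_)

# transform() applies the training statistics to any data, including the test set.
X_test_minmax = minmax.transform(X_test)
X_test_standard = standard.transform(X_test)
print("Test, min-max scaled:", X_test_minmax.ravel())
print("Test, standardized:", X_test_standard.ravel())
```

Because the test value 50 lies above the training maximum of 40, its min-max scaled value comes out above 1. That is expected when the scaler has only seen the training data.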

MinMaxScaler


import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
  "age":    [22, 35, 28, 45, 31, 52, 25, 38, 41, 29,
             33, 47, 26, 55, 30, 44, 36, 50, 27, 42],
  "income": [28000, 52000, 38000, 75000, 45000, 92000, 32000,
             61000, 68000, 40000, 49000, 80000, 35000, 105000,
             43000, 73000, 57000, 88000, 37000, 70000],
  "score":  [72, 85, 68, 91, 78, 88, 74, 82, 79, 75,
             80, 90, 71, 93, 77, 87, 83, 89, 73, 86],
})

X = df[["age", "income"]]
y = df["score"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)   # fit + transform on train
X_test_scaled  = scaler.transform(X_test)         # transform only on test

print("Before scaling — train stats:")
print(X_train.describe().round(1))

# Wrap the scaled array in a DataFrame, reusing the original column names and index.
train_df = pd.DataFrame(X_train_scaled, columns=X_train.columns, index=X_train.index)
print("\nAfter MinMaxScaler — train stats:")
print(train_df.describe().round(3))
print("\nTest set after transform (first row):", X_test_scaled[0].round(4))

After scaling, both age and income have a minimum of 0 and a maximum of 1 on the training set. The test set is transformed with the min and max learned from training, so its values need not span exactly [0, 1]: a test value below the training minimum maps below 0, and one above the training maximum maps above 1.
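The following sketch reproduces the min-max formula by hand, (x - train_min) / (train_max - train_min), and shows test values landing outside [0, 1]. The ages are hypothetical and chosen so that the test set extends beyond the training range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical ages: the test set contains values outside the training range.
age_train = np.array([[22.0], [35.0], [45.0], [52.0]])
age_test = np.array([[18.0], [40.0], [60.0]])

scaler = MinMaxScaler().fit(age_train)
scaled_test = scaler.transform(age_test)

# Same computation by hand, using the statistics learned from the training data.
train_min, train_max = scaler.data_min_, scaler.data_max_
manual_test = (age_test - train_min) / (train_max - train_min)

print("Scaled test values:", scaled_test.ravel().round(4))
print("Matches manual formula:", np.allclose(scaled_test, manual_test))
```

Here 18 maps below 0 and 60 maps above 1. If a downstream step requires values strictly inside the range, MinMaxScaler accepts clip=True (scikit-learn 0.24 and later), which clamps transformed values to the feature range. Whether clipping is appropriate depends on the model.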

StandardScaler


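import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Same data as in the MinMaxScaler example, repeated so this block runs on its own.
df = pd.DataFrame({
  "age":    [22, 35, 28, 45, 31, 52, 25, 38, 41, 29,
             33, 47, 26, 55, 30, 44, 36, 50, 27, 42],
  "income": [28000, 52000, 38000, 75000, 45000, 92000, 32000,
             61000, 68000, 40000, 49000, 80000, 35000, 105000,
             43000, 73000, 57000, 88000, 37000, 70000],
  "score":  [72, 85, 68, 91, 78, 88, 74, 82, 79, 75,
             80, 90, 71, 93, 77, 87, 83, 89, 73, 86],
})

X = df[["age", "income"]]
y = df["score"]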

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std  = scaler.transform(X_test)

train_df = pd.DataFrame(X_train_std, columns=["age","income"])
print("After StandardScaler — train stats:")
print(train_df.describe().round(3))

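# StandardScaler divides by the population std (ddof=0). pandas .std() and
# .describe() default to the sample std (ddof=1), which reads slightly above 1.
print("\nMean (should be ~0):", train_df.mean().round(4).to_dict())
print("Std  (should be ~1):", train_df.std(ddof=0).round(4).to_dict())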

Scikit-learn scalers accept DataFrames as input, but by default fit_transform() and transform() return a NumPy array, not a DataFrame. If you need column names for downstream steps, wrap the result: pd.DataFrame(X_train_std, columns=X_train.columns, index=X_train.index).
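A minimal sketch of that wrapping step, on a hypothetical four-row training split. Passing index=X_train.index as well as the column names keeps the scaled rows aligned with y_train after a shuffled split. The last two lines compare the population standard deviation (ddof=0), which StandardScaler uses, with the pandas default (ddof=1).

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical training split with a shuffled index, as train_test_split produces.
X_train = pd.DataFrame(
    {"age": [22, 35, 45, 52], "income": [28000, 52000, 75000, 92000]},
    index=[7, 2, 11, 5],
)

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # NumPy array: names and index are lost

# Restore both the column names and the original row index.
train_df = pd.DataFrame(X_train_std, columns=X_train.columns, index=X_train.index)

print(type(X_train_std).__name__)
print(train_df.head())
print("Std, ddof=0:", train_df.std(ddof=0).round(4).to_dict())  # matches StandardScaler
print("Std, ddof=1:", train_df.std().round(4).to_dict())        # pandas default, above 1
```

With only four rows the gap between the two standard deviations is easy to see. It shrinks as the training set grows.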

Comparing the two

After fitting on the same training data:

Statistic	MinMaxScaler	StandardScaler
Range	[0, 1] on train	Unbounded
Mean	Not necessarily 0	~0
Std	Not necessarily 1	~1
Sensitive to outliers	Yes	Less so

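Choose MinMaxScaler when the feature has a known, bounded range and no extreme outliers. Choose StandardScaler when the distribution is roughly Gaussian, or when moderate outliers are present: its mean and standard deviation are still pulled by extreme values, but a single outlier does not define the whole output range the way it does with min-max scaling.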
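To see the outlier effect in numbers, the sketch below scales a hypothetical feature with five ordinary values and one extreme value, then measures how much spread the ordinary values keep under each scaler. The data is invented for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical feature: five ordinary values and one extreme outlier.
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [1000.0]])

x_minmax = MinMaxScaler().fit_transform(x).ravel()
x_standard = StandardScaler().fit_transform(x).ravel()

# Spread of the five ordinary values after scaling (outlier excluded).
spread_minmax = x_minmax[:5].max() - x_minmax[:5].min()
spread_standard = x_standard[:5].max() - x_standard[:5].min()

print("MinMaxScaler:", x_minmax.round(4))
print("StandardScaler:", x_standard.round(4))
print("Inlier spread (min-max):", round(float(spread_minmax), 4))
print("Inlier spread (standard):", round(float(spread_standard), 4))
```

In this toy case both scalers compress the ordinary values, min-max scaling more so. The difference is one of degree, so with outliers this extreme it may be worth handling the outlier itself before scaling.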

Where to go next

Next: lab — prepare a dataset — an end-to-end pipeline taking raw mixed-type data through cleaning, encoding, splitting, and scaling to produce a train/test pair ready for a model.
