
Scaling in practice

Apply MinMaxScaler and StandardScaler from sklearn — fit on training data, transform both splits, and verify the before/after statistics.

Data Science | Intermediate | 10 min read
By the end of this lesson you will be able to:
  • Apply MinMaxScaler to produce [0, 1]-scaled features
  • Apply StandardScaler to produce mean-0, std-1 features
  • Fit a scaler on the training set and transform both train and test correctly

Scikit-learn scalers follow a consistent interface: .fit() computes the statistics from the training data, and .transform() applies the scaling. You can combine the two steps with .fit_transform() on the training set, but never call .fit() or .fit_transform() on the test set: fit on train, then call only .transform() on test, so that no information from the test set leaks into the scaler.
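
The fit-then-transform pattern can be sketched as follows. The data is made up for illustration, and StandardScaler is used here only as a convenient example of the shared interface. The manual calculation at the end is an assumption-level check of what .transform() does with the training statistics, not a copy of scikit-learn's internals.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up single-feature data, for illustration only.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[2.5], [6.0]])

scaler = StandardScaler()
scaler.fit(X_train)                        # learns mean_ and scale_ from train only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)   # reuses the training statistics

# fit_transform on the training set is shorthand for fit followed by transform.
X_train_shortcut = StandardScaler().fit_transform(X_train)

# The same arithmetic by hand, using statistics from the training set only.
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0)            # population std (ddof=0)
X_test_manual = (X_test - train_mean) / train_std

print("learned mean: ", scaler.mean_)
print("learned scale:", scaler.scale_)
print("test scaled:  ", X_test_scaled.ravel())
```

The test value 2.5 equals the training mean, so it maps to 0 even though it is not the mean of the test set.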

MinMaxScaler


After scaling, both age and income have minimum 0 and maximum 1 on the training set. The test set is transformed using the same min/max values learned from training, so its minimum and maximum may not land exactly on 0 or 1. A test value outside the training range maps below 0 or above 1.
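
As a rough sketch of the behaviour described above, the snippet below fits MinMaxScaler on a small training set and reuses it on a test set. The values are made up, and the variable names X_train_mm and X_test_mm are assumptions for illustration rather than the lesson's original dataset.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up values for illustration; the column names follow the lesson text.
X_train = pd.DataFrame({
    "age": [22, 35, 47, 58],
    "income": [28000, 52000, 61000, 90000],
})
X_test = pd.DataFrame({
    "age": [19, 40],             # 19 is below the training minimum of 22
    "income": [45000, 120000],   # 120000 is above the training maximum of 90000
})

scaler = MinMaxScaler()
X_train_mm = scaler.fit_transform(X_train)  # learn min/max from train, then scale
X_test_mm = scaler.transform(X_test)        # reuse the training min/max

print("train min:", X_train_mm.min(axis=0))
print("train max:", X_train_mm.max(axis=0))
print("test min: ", X_test_mm.min(axis=0))
print("test max: ", X_test_mm.max(axis=0))
```

With these numbers the test age of 19 scales to a negative value and the test income of 120000 scales to a value above 1, because both lie outside the range seen during fitting.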

StandardScaler


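By default, scikit-learn scalers return NumPy arrays: fit_transform() returns an array, not a DataFrame, even when you pass in a DataFrame. If you need column names for downstream steps, wrap the result: pd.DataFrame(X_train_std, columns=X_train.columns, index=X_train.index). Passing the index keeps the row labels aligned with the original split.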
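
A minimal sketch of StandardScaler with the DataFrame wrapping mentioned above. The data values and the names X_train_std_df and X_test_std_df are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up values for illustration only.
X_train = pd.DataFrame({
    "age": [22, 35, 47, 58],
    "income": [28000, 52000, 61000, 90000],
})
X_test = pd.DataFrame({"age": [19, 40], "income": [45000, 120000]})

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # NumPy array, column names are lost
X_test_std = scaler.transform(X_test)        # uses the training mean and std

# Wrap the arrays to restore column names and row labels.
X_train_std_df = pd.DataFrame(X_train_std, columns=X_train.columns, index=X_train.index)
X_test_std_df = pd.DataFrame(X_test_std, columns=X_test.columns, index=X_test.index)

# StandardScaler divides by the population std, so check with ddof=0;
# pandas defaults to ddof=1 and would report a value slightly above 1.
print(X_train_std_df.mean().round(6))
print(X_train_std_df.std(ddof=0).round(6))
```

The test set will generally not have mean 0 or standard deviation 1 after transformation, since it is scaled with the training statistics.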

Comparing the two

After fitting on the same training data:

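| Statistic             | MinMaxScaler      | StandardScaler |
|-----------------------|-------------------|----------------|
| Range                 | [0, 1] on train   | Unbounded      |
| Mean                  | Not necessarily 0 | ~0             |
| Std                   | Not necessarily 1 | ~1             |
| Sensitive to outliers | Yes               | Less so        |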
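
The range, mean, and std rows of the comparison can be checked with a short sketch like the one below. The single-feature data is made up, and the stats dictionary is just a hypothetical helper for collecting the numbers.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up single feature, for illustration only.
X_train = np.array([[22.0], [35.0], [47.0], [58.0], [63.0]])

stats = {}
for name, scaler in [("MinMaxScaler", MinMaxScaler()),
                     ("StandardScaler", StandardScaler())]:
    scaled = scaler.fit_transform(X_train)
    stats[name] = {
        "min": float(scaled.min()),
        "max": float(scaled.max()),
        "mean": float(scaled.mean()),
        "std": float(scaled.std()),   # population std (ddof=0)
    }
    print(f"{name:15s} " + " ".join(f"{k}={v:.3f}" for k, v in stats[name].items()))
```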

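Choose MinMaxScaler when you know the feature has a bounded range and no extreme outliers. Choose StandardScaler when the distribution is roughly Gaussian or when outliers are present, since a single extreme value sets the MinMaxScaler range and squeezes every other value toward one end. StandardScaler is less affected but not immune: outliers still shift the mean and inflate the standard deviation.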

Where to go next

Next: the lab "Prepare a dataset", an end-to-end pipeline that takes raw mixed-type data through cleaning, encoding, splitting, and scaling to produce a train/test pair ready for a model.
