
Apply and transform

Distinguish apply, transform, agg, and map — four pandas operations that look similar but serve different purposes.

Data Science · Intermediate · 6 min read
By the end of this lesson you will be able to:
  • Explain what each of apply, transform, agg, and map does to a Series or DataFrame
  • Choose the right operation given a desired input and output shape
  • Recognise when apply is the appropriate escape hatch for custom logic

Pandas gives you four ways to apply a function to your data: apply, transform, agg, and map. They look interchangeable at first glance — each takes a function and returns something. The difference is in what shape comes back, and that shape determines which operation you need.

The four operations

map — element-wise on a Series

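Series.map(arg) handles one value at a time and returns a Series of the same length. The argument can be a function, which is called on each value, or a dict (or Series), in which each value is looked up. It is for replacing or converting individual values: turning a string column into a numeric code, or looking up a value in a dictionary. With a dict, any value that has no matching key becomes NaN.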

df["size_code"] = df["size"].map({"small": 1, "medium": 2, "large": 3})

The output has the same shape as the input. map on a DataFrame column never collapses rows or changes length.
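As a minimal sketch of both forms of map, the snippet below uses a small made-up Series (the data and the "xl" value are assumptions for illustration). It shows a dict lookup, where the unmatched value turns into NaN, and a function applied to each element.

```python
import pandas as pd

# Hypothetical data; "xl" deliberately has no entry in the lookup dict.
sizes = pd.Series(["small", "large", "medium", "xl"], name="size")

# Dict form: each value is looked up; missing keys become NaN.
size_code = sizes.map({"small": 1, "medium": 2, "large": 3})

# Function form: the function is called once per element.
size_upper = sizes.map(str.upper)

print(size_code.tolist())            # [1.0, 3.0, 2.0, nan]
print(size_upper.tolist())           # ['SMALL', 'LARGE', 'MEDIUM', 'XL']
print(len(size_code) == len(sizes))  # True
```

Note that the NaN forces the codes to floats, which is worth checking for before using them as integer keys.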

agg — reduces a group to a summary

GroupBy.agg(fn) reduces each group to a summary value; Series.agg does the same for a whole Series. Calling .groupby("category").agg("mean") turns every group into a single row, so the result is shorter than the input. Use agg when you want summary statistics per group.
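A small sketch of the reduction, assuming a toy DataFrame with "category" and "value" columns (the data is invented for illustration). The grouped call returns one value per category, and the plain Series call returns a single scalar.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b"],
    "value": [10, 20, 1, 2, 3],
})

# GroupBy.agg: five input rows collapse to one value per category.
group_means = df.groupby("category")["value"].agg("mean")

# Series.agg: the whole column collapses to a single scalar.
overall_mean = df["value"].agg("mean")

print(group_means.to_dict())      # {'a': 15.0, 'b': 2.0}
print(len(df), len(group_means))  # 5 2
```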

transform — group-aware, same shape back

GroupBy.transform(fn) applies a function per group but returns a result that is aligned back to the original index. The output has the same length as the original DataFrame. This is the operation for adding a derived column that needs group context — for example, the group mean or the within-group rank — while keeping every original row.

df["group_mean"] = df.groupby("category")["value"].transform("mean")

Each row gets the mean of its own group, not the overall mean.
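The sketch below extends the line above on a toy DataFrame (data and the extra column names are assumptions for illustration). The rows are deliberately interleaved to show that transform aligns results back to the original index, and a within-group rank is added the same way.

```python
import pandas as pd

# Hypothetical data; groups are interleaved on purpose.
df = pd.DataFrame({
    "category": ["a", "b", "a", "b", "b"],
    "value": [10, 1, 20, 2, 3],
})
grouped = df.groupby("category")["value"]

# Every row receives the mean of its own group.
df["group_mean"] = grouped.transform("mean")

# Group context combined with the original values, row by row.
df["diff_from_mean"] = df["value"] - df["group_mean"]

# Within-group rank, also aligned to the original rows.
df["rank_in_group"] = grouped.transform("rank")

print(df["group_mean"].tolist())     # [15.0, 2.0, 15.0, 2.0, 2.0]
print(df["rank_in_group"].tolist())  # [1.0, 1.0, 2.0, 2.0, 3.0]
```

No row is dropped or reordered, which is what makes the result safe to assign straight back as a new column.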

apply — the flexible escape hatch

DataFrame.apply(fn, axis=0|1) calls fn on each column (axis=0, the default) or each row (axis=1). What comes back depends on what fn returns: a scalar reduces each column or row to one value, while a Series produces a DataFrame. That flexibility has a cost: apply is usually the slowest of the four, especially with axis=1, where fn runs once per row in Python. Use apply when none of the more specific operations can express what you need, such as complex multi-column logic.
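To make the two axes concrete, here is a minimal sketch on an invented price/quantity table. The discount rule and the line_total helper are hypothetical, chosen only because the logic needs two columns at once.

```python
import pandas as pd

# Hypothetical order lines for illustration.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 5, 12]})

# axis=0: fn receives each column as a Series.
# Returning a scalar gives one value per column.
col_range = df.apply(lambda col: col.max() - col.min(), axis=0)

def line_total(row):
    # Hypothetical rule: 10% discount on orders of 10 or more units.
    discount = 0.9 if row["qty"] >= 10 else 1.0
    return row["price"] * row["qty"] * discount

# axis=1: fn receives each row as a Series.
# Returning a scalar gives one value per row.
totals = df.apply(line_total, axis=1)

print(col_range)
print(totals)
```

Here col_range is indexed by column name and totals by row label, so the same method produces two differently shaped results.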

A decision guide

Question                                     Operation
Replace each value individually?             map
Summarise each group to one number?          agg
Add a group-context column, keep all rows?   transform
Complex row- or column-wise custom logic?    apply

When you find yourself writing apply frequently, it is worth pausing to check whether a vectorised pandas operation exists. Vectorised operations (arithmetic, str methods, clip, where) are often 10–100x faster than apply because they skip Python's per-row function-call overhead.
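As an illustration of that check, the sketch below computes the same result twice on invented data: once with a row-wise apply and once with column arithmetic. The discount rule is hypothetical, and the vectorised version uses NumPy's np.where as a stand-in for the conditional. It makes no timing claim; it only shows that the two forms agree.

```python
import numpy as np
import pandas as pd

# Hypothetical order lines for illustration.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 5, 12]})

# Row-wise apply: one Python function call per row.
via_apply = df.apply(
    lambda row: row["price"] * row["qty"] * (0.9 if row["qty"] >= 10 else 1.0),
    axis=1,
)

# Vectorised: the condition and the arithmetic run on whole columns.
discount = np.where(df["qty"] >= 10, 0.9, 1.0)
via_columns = df["price"] * df["qty"] * discount

print(np.allclose(via_apply, via_columns))  # True
```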

The shape rule

The cleanest way to remember the difference: think about the shape of what comes back relative to what went in.

  • map → same length, element-by-element
  • transform → same length, group-aware
  • agg → shorter, one row per group
  • apply → any shape, depends on what your function returns

If you need the original rows to be intact with a new column attached, transform is almost always the right choice. If you need a summary table, agg is. If you need to convert individual values, map is. Everything else is apply.
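The shape rule can be checked directly. This sketch runs all four operations on one invented DataFrame and compares result lengths; the apply call uses a column-sum function, so its length reflects that particular function rather than apply in general.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b"],
    "value": [10, 20, 1, 2, 3],
})
grouped = df.groupby("category")["value"]

mapped = df["value"].map(lambda v: v * 2)             # same length, element-wise
transformed = grouped.transform("mean")               # same length, group-aware
aggregated = grouped.agg("mean")                      # one value per group
applied = df[["value"]].apply(lambda col: col.sum())  # one value per column here

print(len(df), len(mapped), len(transformed), len(aggregated), len(applied))
# 5 5 5 2 1
```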

Where to go next

Next: custom aggregations — seeing all four operations in runnable code, including transform for group-normalisation and multi-function agg.
