Code of the Day
IntermediateReshaping and Merging

Window operations

Run df.melt(), df.pivot_table(), df.rolling(), and df.shift() in code to reshape data and build lag features.

Data ScienceIntermediate10 min read
Recommended first
By the end of this lesson you will be able to:
  • Use df.melt() to convert a wide DataFrame to long form
  • Use df.pivot_table() to aggregate and convert long data to wide form
  • Apply df.rolling(n).mean() to compute a moving average
  • Use df.shift(n) to create a lag feature

The previous lesson explained why you choose long or wide form and what melt and pivot do conceptually. This lesson puts that into practice, and then extends into two more reshape-adjacent operations: rolling windows and shifting, which are essential any time you work with ordered sequences.

Melt and pivot_table in code

The temperature dataset from the concept lesson is a perfect starting point. Wide form arrives; long form is needed for plotting and groupby.

Python — editable, runs in your browser

id_vars lists the columns that should stay as identifiers — every other column is melted. var_name and value_name give the two new columns their names. pivot_table is the safer complement to pivot: if there are duplicate (index, column) pairs it applies aggfunc rather than raising an error.

Rolling windows

A rolling window computes a statistic over a sliding slice of rows. The most common use is a moving average, which smooths out noise in a time series.

Python — editable, runs in your browser

The first two rows of rolling_3 are NaN because there are not yet three observations to average. This is normal — decide whether to drop those rows or fill them depending on your analysis.

Shift and lag features

shift(1) moves every value down by one row, aligning each row with the previous period's value. This creates a lag feature: a column that tells a model what the value was one step earlier. A negative shift (shift(-1)) creates a lead feature — the next period's value.

Rolling windows and shifts both require ordered data. If your DataFrame is not already sorted by the time or sequence column, sort it first with df.sort_values("date") before applying .rolling() or .shift().

Where to go next

Next: apply and transform — a conceptual guide to pandas' four row/column/element operations and when each is the right tool.

Finished reading? Mark it complete to track your progress.

On this page