Code of the Day
IntermediateReshaping and Merging

Lab: merge and reshape

Guided end-to-end exercise — merge two datasets, reshape to long form, compute a rolling average, and flag above-average periods.

Lab · optionalData ScienceIntermediate30 min
Recommended first
By the end of this lesson you will be able to:
  • Merge two DataFrames on a shared key and verify the result shape
  • Melt the merged table to long form
  • Compute a 3-period rolling average per product
  • Use apply to add a boolean flag column based on a computed threshold

This is an optional lab. No new concepts — just practice applying everything from the Reshaping and Merging module to a dataset you have not seen before. Work through each step, run the code, and check that the output shapes match your expectations before moving on.

You have two tables: a products catalogue and a sales log. Your goal is to merge them, reshape the result for analysis, compute a rolling trend, and annotate each row with whether it was an above-average period.

Step 1 — inspect the data

Run this block first to see what you are working with.

Python — editable, runs in your browser

Before merging, think: how many rows do you expect in the merged table? The sales table has 12 rows and every product_id in sales exists in products, so a merge on product_id should produce exactly 12 rows.

Step 2 — merge on product_id

Use pd.merge() with how="inner" and on="product_id". Verify the shape.

Python — editable, runs in your browser

You now have product name and category alongside each sales figure. If your shape is larger than (12, 5), check for duplicate product_id values in the products table.

Step 3 — melt to long form for grouped analysis

The merged table is already reasonably long, but imagine you wanted to compare units_sold across different metrics side by side. Practice the melt pattern by treating period and product_name as identifiers and confirming the structure:

Python — editable, runs in your browser

Step 4 — rolling 3-period average per product

Sort by product and period, then compute a rolling mean within each product group. Because the data has only three periods, min_periods=1 lets the window start as soon as there is one value:

Python — editable, runs in your browser

The rolling average smooths out period-to-period noise. For products with only three data points the final period's rolling average equals the simple mean of all three periods — which is expected.

Step 5 — flag above-average periods

Use apply to add a boolean column. A period is "above average" if its units_sold exceeds that product's mean across all periods:

Python — editable, runs in your browser

The above_avg flag could also be written as a vectorised comparison: merged["above_avg"] = merged["units_sold"] > merged["product_mean"]. That is faster and more idiomatic. The apply version here is intentional practice — in real work, prefer the vectorised form when you can express it.

Done?

You have merged two datasets, reshaped between wide and long form, computed a group-aware rolling average using transform, and added a derived column with apply. These four operations together cover the core of the Reshaping and Merging module. The next module — Visualisation — picks up here: with data shaped correctly, you are ready to plot it.

Finished reading? Mark it complete to track your progress.

On this page