Code of the Day

Cleaning data

Drop missing values, remove duplicates, and fix column types in pandas: the three moves that handle the most common data quality problems.

Data Science · Beginner · 10 min read
By the end of this lesson you will be able to:
  • Drop rows with missing values using .dropna()
  • Remove duplicate rows using .drop_duplicates()
  • Convert a column to the correct type using .astype()

Knowing the four data quality problems is not enough; you also need to fix them. This lesson covers the three methods that handle the most common cases: .dropna(), .drop_duplicates(), and .astype(). The lesson's example builds a deliberately messy DataFrame and cleans it step by step.
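A minimal sketch of that kind of cleanup, assuming hypothetical order data with columns order_id, customer, and amount. The values are invented for illustration; they are chosen so that the row counts match the walkthrough in the next section (4 rows after .dropna(), 3 after .drop_duplicates()).

```python
import pandas as pd

# Hypothetical messy order data (invented values, not the lesson's originals)
raw = pd.DataFrame({
    "order_id": [1, 2, 3, 3, 4, 5],
    "customer": ["Ana", "Ben", "Cara", "Cara", None, "Dev"],  # order 4 has no name
    "amount": ["19.99", "5.00", "42.50", "42.50", "12.00", None],  # text; order 5 missing
})

# Step 1: drop every row that has a missing value in any column
no_missing = raw.dropna()

# Step 2: drop rows that exactly repeat an earlier row (the second order 3)
deduped = no_missing.drop_duplicates()

# Step 3: convert amount from text to float on an explicit copy
# (in older pandas versions, assigning a column on a filtered result
# can trigger SettingWithCopyWarning)
clean = deduped.copy()
clean["amount"] = clean["amount"].astype(float)

print(len(raw), len(no_missing), len(deduped))  # 6 4 3
print(clean)
```

Each step gets its own variable name, so you can compare the DataFrame before and after any step while experimenting.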


What each step does

.dropna() removes every row that contains NaN or None in any column. By default it checks all columns; pass subset=["column_name"] to check only specific ones. Here it drops the row with no customer name and the row where amount is None, leaving 4 rows. It does not look for duplicates, so both copies of order 3 survive this step.

Next, .drop_duplicates() removes the second occurrence of order 3 (both rows were identical), leaving 3 rows.
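The subset parameter of .dropna() matters when some columns are allowed to be empty. A small sketch, assuming a hypothetical optional note column where blanks are expected:

```python
import pandas as pd

# Hypothetical orders: "note" is optional, so blanks there are fine
df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer": ["Ana", None, "Cara"],
    "note": [None, "gift", None],
})

# Default: a row is dropped if ANY column is missing, so every row goes
print(len(df.dropna()))  # 0

# Only treat a missing customer as a problem
by_customer = df.dropna(subset=["customer"])
print(by_customer["order_id"].tolist())  # [1, 3]
```

Checking all columns is the safe default, but on a table with optional fields it can silently discard most of your data, so it is worth checking the row count after calling it.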

.drop_duplicates() compares entire rows by default. Pass subset=["order_id"] if you want to deduplicate by a specific key column (keeping the first occurrence).
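A sketch of the difference between whole-row and key-column deduplication, using hypothetical data where order 3 was recorded twice with different statuses. It also uses pandas' keep parameter, which the text above does not mention: keep="first" is the default, and keep="last" keeps the final occurrence instead.

```python
import pandas as pd

# Hypothetical: order 3 appears twice, but the rows are not identical
orders = pd.DataFrame({
    "order_id": [3, 3, 7],
    "status": ["pending", "shipped", "pending"],
})

# Whole-row comparison: the order-3 rows differ in "status", so nothing is removed
print(len(orders.drop_duplicates()))  # 3

# Key-column comparison: one row per order_id, keeping the first occurrence
first = orders.drop_duplicates(subset=["order_id"])
print(first["status"].tolist())  # ['pending', 'pending']

# keep="last" keeps the final occurrence of each order_id instead
last = orders.drop_duplicates(subset=["order_id"], keep="last")
print(last["status"].tolist())  # ['shipped', 'pending']
```

Which occurrence to keep is a data decision, not a pandas one: if later rows are updates, keep="last" is usually the one you want.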

.astype(float) returns a converted copy rather than changing the column in place, so assign the result back to the column. If any value cannot be converted, say the string "N/A", pandas raises a ValueError. That is usually what you want: the error tells you there is another problem to fix, rather than silently producing NaN.
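A sketch of that failure mode, assuming a hypothetical Series of amounts read in as text with one "N/A" placeholder. Dropping the bad value is just one possible fix; replacing it with a real number may be more appropriate for your data.

```python
import pandas as pd

# Hypothetical amounts stored as text, with one placeholder value
amounts = pd.Series(["19.99", "N/A", "5.00"])

try:
    amounts.astype(float)
    conversion_failed = False
except ValueError as err:
    # pandas refuses to guess what "N/A" should become
    conversion_failed = True
    print("Conversion failed:", err)

# Deal with the bad value first (here: drop it), then convert
valid = amounts[amounts != "N/A"].astype(float)
print(valid.tolist())  # [19.99, 5.0]
```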

pandas methods like .dropna() and .drop_duplicates() return a new DataFrame by default; they do not modify the original. Either assign the result to a new name or pass inplace=True, which modifies the DataFrame itself and returns None. Keeping the original around while you experiment is a good habit.
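A short sketch of both styles on a hypothetical one-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"customer": ["Ana", None, "Cara"]})
rows_before = len(df)

# Default: dropna() returns a new DataFrame and leaves df alone
cleaned = df.dropna()
print(rows_before, len(cleaned), len(df))  # 3 2 3

# inplace=True changes df itself and returns None
returned = df.dropna(inplace=True)
print(returned, len(df))  # None 2
```

A common mistake is writing df = df.dropna(inplace=True), which replaces df with None.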

Where to go next

Now you can inspect and clean a dataset. The lab is next: an end-to-end practice session where you apply all of these skills to a new dataset without step-by-step instructions.
