Part One
Detecting Missing Values
Real-world datasets are rarely perfect. A price might be missing because a product was out of stock. A category might be blank because it was not recorded. pandas represents missing numeric values as NaN (Not a Number), also treats None as missing, and provides methods to find and handle both.
.isnull() returns a boolean DataFrame where True marks every missing value. Calling .sum() on the result counts the True values per column, giving you a quick inventory of how much data is missing. This is usually one of the first checks worth running on a new dataset.
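As a minimal sketch of this check, the example below builds a small, made-up price table (the column names and values are illustrative assumptions, not survey data) and counts the missing values in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical price survey: np.nan and None both count as missing.
df = pd.DataFrame({
    "item": ["apples", "bread", "milk", "rice"],
    "category": ["fruit", None, "dairy", "grains"],
    "price": [2.50, 1.80, np.nan, np.nan],
})

# Boolean DataFrame: True wherever a value is missing.
missing_mask = df.isnull()

# True counts as 1, so summing gives the number of missing values per column.
missing_per_column = missing_mask.sum()

print(missing_per_column)  # item: 0, category: 1, price: 2
```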
Part Two
.dropna() and .fillna() — Handling Missing Data
Once you know where the missing values are, you have two main options: drop the affected rows, or fill in the missing values with a substitute.
.dropna() — remove rows with missing values
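A short sketch of .dropna() on a made-up table (the data and column names are assumptions for illustration). By default it drops any row that contains at least one missing value; the subset parameter restricts the check to the columns you name.

```python
import numpy as np
import pandas as pd

# Hypothetical price survey with gaps in two columns.
df = pd.DataFrame({
    "item": ["apples", "bread", "milk", "rice"],
    "category": ["fruit", None, "dairy", "grains"],
    "price": [2.50, 1.80, np.nan, np.nan],
})

# Drop every row that has a missing value in any column.
complete_rows = df.dropna()

# Drop only the rows where the price itself is missing.
priced_rows = df.dropna(subset=["price"])

print(len(df), len(complete_rows), len(priced_rows))  # 4 1 2
```

Neither call changes df itself; each returns a new DataFrame, so the original stays available for comparison.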
.fillna() — replace missing values
Sometimes dropping rows loses too much data. .fillna() replaces NaN with a value you specify, such as the column mean, a fixed default, or a placeholder string.
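The sketch below shows two common fills on a made-up table: the column mean for a numeric column and a placeholder string for a text column. The data, the column names, and the "unknown" placeholder are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical price survey with one missing price and one missing category.
df = pd.DataFrame({
    "item": ["apples", "bread", "milk"],
    "category": ["fruit", None, "dairy"],
    "price": [2.0, np.nan, 4.0],
})

filled = df.copy()

# Numeric column: fill with the mean of the known prices (mean() skips NaN).
filled["price"] = filled["price"].fillna(filled["price"].mean())

# Text column: fill with a placeholder string.
filled["category"] = filled["category"].fillna("unknown")

print(filled)
```

Filling with the mean keeps the row count intact but invents a value, so it is worth noting in any write-up which prices were imputed.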
Part Three
Renaming Columns and Removing Duplicates
.rename() — fix column names
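A minimal sketch of .rename(), assuming a table whose headers came in with spaces and mixed case (the header names here are hypothetical). Passing a dictionary to the columns parameter maps each old name to its new one.

```python
import pandas as pd

# Hypothetical raw headers as they might arrive from a spreadsheet export.
df = pd.DataFrame({
    "Item Name": ["apples", "bread"],
    "Price (USD)": [2.50, 1.80],
})

# Map old column names to short, lowercase names that are easier to type.
renamed = df.rename(columns={"Item Name": "item", "Price (USD)": "price"})

print(list(renamed.columns))  # ['item', 'price']
```

Columns that are not listed in the dictionary keep their original names.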
.drop_duplicates() — remove repeated rows
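A short sketch of .drop_duplicates() on made-up data. With no arguments it removes rows that repeat an earlier row in every column; the subset parameter narrows the comparison to the columns you name.

```python
import pandas as pd

# Hypothetical survey where one row was entered twice and bread was priced twice.
df = pd.DataFrame({
    "item": ["apples", "apples", "bread", "bread"],
    "price": [2.50, 2.50, 1.80, 1.90],
})

# Remove rows identical to an earlier row in all columns (keeps the first copy).
deduped = df.drop_duplicates()

# Keep only the first row seen for each item, whatever its price.
one_per_item = df.drop_duplicates(subset=["item"])

print(len(df), len(deduped), len(one_per_item))  # 4 3 2
```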
Part Four
Putting It All Together — A Complete Food Price Analysis
You now have all the tools needed to conduct a full data analysis. Below is a complete, commented script that takes a raw (and slightly messy) price survey dataset, cleans it, and produces a set of findings you could publish.
Read through the script section by section. Notice how each step builds on the previous one: load → clean → enrich → analyse. This is the standard structure of any data analysis pipeline, whether you are working with 15 rows or 15 million.
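A minimal sketch of the load → clean → enrich → analyse structure might look like the following. The inline CSV, its column names, and its values are made up for illustration, and the cleaning choices (dropping rows without a price, labelling blank categories "unknown") are assumptions rather than the only reasonable ones.

```python
import io

import pandas as pd

# Hypothetical raw survey data (made-up values, for illustration only).
RAW_CSV = """Item,Category,Price
apples,fruit,2.00
apples,fruit,2.00
organic apples,fruit,3.00
bread,bakery,
milk,dairy,1.50
organic milk,dairy,2.50
rice,,4.00
"""

# 1. Load: read the raw data into a DataFrame.
raw = pd.read_csv(io.StringIO(RAW_CSV))

# 2. Clean: tidy the headers, remove repeated rows, handle missing values.
clean = (
    raw.rename(columns=str.lower)       # "Item" -> "item", and so on
       .drop_duplicates()               # the second apples row is a repeat
       .dropna(subset=["price"])        # a row without a price cannot be analysed
       .assign(category=lambda d: d["category"].fillna("unknown"))
)

# 3. Enrich: add a derived column comparing each price to the overall median.
enriched = clean.assign(
    price_to_median=clean["price"] / clean["price"].median()
)

# 4. Analyse: average price per category, most expensive first.
mean_by_category = (
    enriched.groupby("category")["price"].mean().sort_values(ascending=False)
)

print(enriched)
print(mean_by_category)
```

Each stage produces a new DataFrame instead of modifying the previous one, which makes it easy to inspect the data between steps.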
Part Five
Final Project — Your Own Price Survey
Build a complete analysis of your own food price dataset. Walk to a local supermarket or browse an online grocery store and record the prices of 15–20 items. Then, using the full toolkit from this book, answer:
- What are the three most expensive categories?
- What is the organic premium in your market?
- Which items represent the best value (lowest price-to-median ratio)?
- Are there any suspicious outliers in your data?
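One possible starting point for these questions is sketched below. Everything in it is an assumption to adapt: the column names, the made-up prices, the definition of the organic premium as the relative difference between median organic and median conventional prices, and the use of the 1.5 × IQR rule to flag outliers.

```python
import pandas as pd

# Hypothetical survey rows; replace them with the prices you recorded.
survey = pd.DataFrame({
    "item": ["apples", "organic apples", "milk", "organic milk",
             "bread", "organic bread", "caviar"],
    "category": ["fruit", "fruit", "dairy", "dairy",
                 "bakery", "bakery", "seafood"],
    "organic": [False, True, False, True, False, True, False],
    "price": [2.0, 3.0, 1.0, 1.5, 2.0, 3.0, 30.0],
})

# Three most expensive categories by mean price.
top_categories = survey.groupby("category")["price"].mean().nlargest(3)

# Organic premium, here defined as median organic / median conventional - 1.
# Medians are used so that one extreme price does not dominate the result.
organic_median = survey.loc[survey["organic"], "price"].median()
conventional_median = survey.loc[~survey["organic"], "price"].median()
organic_premium = organic_median / conventional_median - 1

# Best value: lowest ratio of an item's price to the overall median price.
ratios = survey.assign(price_to_median=survey["price"] / survey["price"].median())
best_value = ratios.nsmallest(3, "price_to_median")

# Suspicious outliers: prices more than 1.5 * IQR outside the quartiles.
q1, q3 = survey["price"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (survey["price"] < q1 - 1.5 * iqr) | (survey["price"] > q3 + 1.5 * iqr)
outliers = survey[is_outlier]

print(top_categories)
print(f"Organic premium: {organic_premium:.0%}")
print(best_value[["item", "price_to_median"]])
print(outliers[["item", "price"]])
```

A flagged outlier is a prompt to recheck the recorded price, not proof of an error; a genuinely expensive item will be flagged too.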