Part One
Split–Apply–Combine
In Chapter 5 you computed average prices by filtering for each category individually — one filter call per category. That works for five categories. It does not scale to fifty, or five hundred. .groupby() solves this elegantly.
The idea behind groupby is called split–apply–combine:
- Split — divide the DataFrame into groups based on the values of one column.
- Apply — compute a statistic (mean, sum, count…) for each group independently.
- Combine — gather the results into a new table, one row per group.
In one line you get the mean price for every category at once. The result is a Series with the category names as the index. From this alone you can already report: meat and fish are by far the most expensive categories; grains and vegetables are the most affordable.
Part Two
Different Aggregations
You can call any of the statistical methods from Chapter 5 on a grouped Series: .mean(), .median(), .sum(), .min(), .max(), .count(), .std().
Part Three
.agg() — Multiple Statistics at Once
When you want several statistics per group simultaneously, .agg() (short for aggregate) lets you pass a list of function names. The result is a clean table — perfect for a published data summary.
This is publication-ready output. You can see at a glance that the Fish category has the widest range (€1.59 to €8.99) and the Meat category has the highest mean price per item. This type of table is the backbone of most data-driven food price stories.
.agg():df.groupby("category")["price"].agg(items="count", avg="mean", max_price="max")
Part Four
Groupby Multiple Columns
You can group by more than one column at a time by passing a list to .groupby(). The result has a multi-level index — one level per grouping column. This lets you cross-tabulate, for example: average price by category and organic status.
The output shows a MultiIndex — categories on the outer level, True/False on the inner level. For categories where only organic or only non-organic items exist (like Bakery), only one row appears. This kind of cross-tabulation is extremely useful for investigating whether the organic premium varies by food category.
Part Five
Your Turn — Category Price Report
Use .groupby() and .agg() to build a summary table that a journalist could use in a food price article. Include: number of items per category, average price, and the range (max − min) per category.
.groupby() with a single column and with multiple columns; computing statistics per group; using .agg() to get multiple statistics in one table; and naming aggregated columns. Next chapter: sorting results and adding computed columns to your DataFrame.
Chapter Navigation
Move between chapters.