Part One
The Core Statistical Methods
Every numeric Series in pandas has a full set of statistical methods built in, and you call them directly on a column. The ones you will reach for most often are .count(), .sum(), .mean(), .median(), .std(), .min(), and .max().
The mean (€2.94) is noticeably higher than the median (€1.89). This happens because a few very expensive items (Salmon at €8.99, Beef at €7.49) pull the average up, while the majority of items are inexpensive staples. The median, being the middle value, is more robust to these extremes.
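As a minimal sketch of these methods, the snippet below builds a small stand-in table and calls each one on its price column. The column names `item` and `price` are assumptions, and apart from Salmon (€8.99) and Beef (€7.49) every price is invented for illustration, so the results will not match the €2.94 and €1.89 quoted above.

```python
import pandas as pd

# Hypothetical stand-in for the chapter's price table.
# Only the Salmon and Beef prices come from the text; the rest are invented.
df = pd.DataFrame({
    "item": ["Bread", "Milk", "Eggs", "Rice", "Carrots", "Tomatoes", "Beef", "Salmon"],
    "price": [1.50, 1.20, 2.50, 1.80, 0.90, 2.10, 7.49, 8.99],
})

prices = df["price"]  # a numeric Series

summary = {
    "count": prices.count(),    # number of non-missing values
    "sum": prices.sum(),
    "mean": prices.mean(),
    "median": prices.median(),
    "std": prices.std(),        # sample standard deviation (ddof=1 by default)
    "min": prices.min(),
    "max": prices.max(),
}

for name, value in summary.items():
    print(f"{name:>6}: {round(float(value), 2)}")
```

On this made-up table the same pattern appears: the mean (about 3.31) sits above the median (1.95) because the two expensive items pull the average up.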
Part Two
Percentiles and the Spread of Prices
Percentiles divide your data into hundredths. The 25th percentile (Q1) is the value below which 25% of the data falls. The 50th percentile is the median. The 75th percentile (Q3) is the value below which 75% of the data falls.
The gap between Q1 and Q3 is called the interquartile range (IQR), a robust measure of spread that is largely unaffected by extreme values.
Items above the 75th percentile cost more than €4.14. That threshold tells you which items are in the top quarter of prices for this market — a useful data point for a story about food affordability.
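The quartiles and the IQR described in this part can be computed with `.quantile()`, sketched here on a hypothetical price Series. The values are invented stand-ins (only 7.49 and 8.99 appear in the text), so the Q3 threshold differs from the €4.14 reported for the chapter's dataset.

```python
import pandas as pd

# Hypothetical prices; not the chapter's actual dataset.
prices = pd.Series(
    [1.50, 1.20, 2.50, 1.80, 0.90, 2.10, 7.49, 8.99],
    name="price",
)

q1 = prices.quantile(0.25)   # 25th percentile
q3 = prices.quantile(0.75)   # 75th percentile
iqr = q3 - q1                # interquartile range

# Passing a list returns a Series indexed by the requested quantiles.
quartiles = prices.quantile([0.25, 0.50, 0.75])
print(quartiles)
print(f"IQR: {iqr:.4f}")

# Prices in the top quarter: everything strictly above Q3.
top_quarter = prices[prices > q3]
print(top_quarter)
```

By default `.quantile()` interpolates linearly between neighbouring values, which is why a quartile need not equal any price that actually occurs in the data.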
Part Three
Statistics on Filtered Data
You can chain filtering and statistics together. This is where the power of pandas really begins to show. A question like "what is the average price of organic items?" becomes a single readable line.
Organic items are more expensive on average, but part of that gap comes from which categories the organic items fall into. A rigorous comparison would control for category (see Chapter 6).
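A minimal sketch of chaining a filter with a statistic follows. It assumes a boolean `organic` column alongside `price` and `category`; these column names, the organic flags, and most of the prices are hypothetical, chosen only to illustrate the pattern.

```python
import pandas as pd

# Hypothetical stand-in table; the flags and most prices are invented.
df = pd.DataFrame({
    "item": ["Bread", "Milk", "Eggs", "Rice", "Carrots", "Tomatoes", "Beef", "Salmon"],
    "category": ["Bakery", "Dairy", "Dairy", "Grains",
                 "Vegetables", "Vegetables", "Meat", "Fish"],
    "price": [1.50, 1.20, 2.50, 1.80, 0.90, 2.10, 7.49, 8.99],
    "organic": [False, False, True, False, False, True, False, True],
})

# Filter rows with a boolean mask, pick the column, then aggregate.
organic_mean = df.loc[df["organic"], "price"].mean()
conventional_mean = df.loc[~df["organic"], "price"].mean()

print(f"Organic mean:      {organic_mean:.2f}")
print(f"Conventional mean: {conventional_mean:.2f}")

# The median is less sensitive to a single expensive organic item.
organic_median = df.loc[df["organic"], "price"].median()
print(f"Organic median:    {organic_median:.2f}")
```

In this invented table the organic mean is driven largely by one expensive fish item, which shows how category mix can account for part of an organic price gap.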
Part Four
Finding the Extreme Items — .idxmin() and .idxmax()
Knowing the minimum or maximum value is useful. Knowing which row has the minimum or maximum is even more useful. .idxmin() and .idxmax() return the index label of the extreme value, which you can then use to look up the full row.
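The lookup described above might be sketched as follows, again on a hypothetical table with assumed `item` and `price` columns. Because `.idxmin()` and `.idxmax()` return index labels rather than positions, the row is retrieved with `.loc`.

```python
import pandas as pd

# Hypothetical stand-in table; only the Salmon and Beef prices come from the text.
df = pd.DataFrame({
    "item": ["Bread", "Milk", "Eggs", "Rice", "Carrots", "Tomatoes", "Beef", "Salmon"],
    "price": [1.50, 1.20, 2.50, 1.80, 0.90, 2.10, 7.49, 8.99],
})

cheapest_label = df["price"].idxmin()   # index label of the lowest price
priciest_label = df["price"].idxmax()   # index label of the highest price

# Use the labels with .loc to pull the full rows.
cheapest_row = df.loc[cheapest_label]
priciest_row = df.loc[priciest_label]

print("Cheapest:", cheapest_row["item"], cheapest_row["price"])
print("Priciest:", priciest_row["item"], priciest_row["price"])
```

If several rows tie for the extreme value, both methods return the label of the first occurrence.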
Part Five
Your Turn — A Cost-of-Living Report
Imagine you are writing a short data-driven report on food affordability. Use the statistics tools to answer these questions:
- What percentage of items cost more than the mean price? (Hint: filter, count, divide by total rows.)
- What is the price range (max − min) for vegetables?
- Is the median price of dairy higher or lower than the overall median?
In this chapter you covered the core statistical methods .count(), .sum(), .mean(), .median(), .std(), .min(), and .max(); computing percentiles with .quantile(); combining filtering and statistics to compare subgroups; and finding extreme rows with .idxmin() and .idxmax(). Next chapter: groupby, which computes statistics for every category at once.