Part One
The Library That Changed Data Analysis
Every time a data journalist compares prices across supermarkets, or a researcher tracks food inflation over time, or a public health official monitors which regions have the highest cost of living — they are likely working with pandas. It is the standard Python library for loading, cleaning, transforming, and analysing tabular data: data arranged in rows and columns, like a spreadsheet.
The name comes from panel data, an econometrics term for datasets that track multiple subjects over multiple time periods. But you do not need to know econometrics. You just need to know that pandas gives you two powerful tools: the Series (a single column of data) and the DataFrame (a full table with rows and columns).
Before you can use pandas, you import it. By convention, almost everyone imports it under the same short name, pd.
The statement import pandas as pd loads the library and gives it the short name pd. From this point on, every pandas tool is accessed by typing pd. followed by its name, as in pd.Series or pd.DataFrame. It is the first line of almost every pandas script.
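A minimal sketch of that opening line. The version check is an optional extra, useful when a tutorial and your installation behave differently:

```python
# Load pandas under its conventional short name
import pandas as pd

# Every pandas tool is now reached through "pd."; __version__ reports the installed release
print(pd.__version__)
```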
Part Two
Series — A Single Column of Data
The simplest pandas object is a Series: an ordered list of values, each with a label called an index. Think of it as a single column from a spreadsheet — for example, a list of food prices.
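A minimal sketch of such a Series, built from a plain Python list. The four prices are made-up example values:

```python
import pandas as pd

# No index is given, so pandas assigns the default labels 0, 1, 2, 3
prices = pd.Series([1.20, 0.95, 2.50, 3.10])
print(prices)
```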
The numbers on the left (0, 1, 2 …) are the default index: automatic labels that pandas assigns when you do not specify your own. The numbers on the right are the values. At the bottom, pandas reports the data type: float64 means the values are stored as 64-bit floating-point (decimal) numbers.
A more useful Series labels each value explicitly. Here, the index is the name of the food item:
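A short sketch using the index parameter of pd.Series to attach the labels. The items and prices are illustrative only:

```python
import pandas as pd

# index= replaces the default 0, 1, 2, ... labels with item names
prices = pd.Series(
    [1.20, 0.95, 2.50, 3.10],
    index=["bread", "milk", "eggs", "cheese"],
)
print(prices)
```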
You access a single value by putting its label inside square brackets. This is just like looking up a word in a dictionary — the label is the key, the price is the value.
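A sketch of label lookup on the same illustrative Series. As with a dictionary, asking for a label that does not exist raises a KeyError:

```python
import pandas as pd

prices = pd.Series(
    [1.20, 0.95, 2.50, 3.10],
    index=["bread", "milk", "eggs", "cheese"],
)

# Look up one value by its label, like a dictionary key
print(prices["eggs"])  # 2.5

# A missing label raises KeyError, just as a dict would
try:
    prices["butter"]
except KeyError:
    print("No price recorded for butter")
```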
Try adding print(prices * 1.1) to the cell above and running it again. pandas applies the operation to every item automatically, with no loop needed.
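A sketch of what that multiplication does, treating it as a hypothetical 10% price rise. The result is a new Series with the same labels; round(2) only tidies the display:

```python
import pandas as pd

prices = pd.Series(
    [1.20, 0.95, 2.50, 3.10],
    index=["bread", "milk", "eggs", "cheese"],
)

# Element-wise: every value is multiplied by 1.1, labels are kept
raised = prices * 1.1
print(raised.round(2))
```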
Part Three
DataFrame — A Full Table of Data
A DataFrame is a table: rows and columns, like a spreadsheet. Each column is a Series. The columns can hold different types of data — names, prices, categories, booleans — as long as each column is consistent within itself.
The most common way to create a DataFrame from scratch is to pass a Python dictionary to pd.DataFrame(). Each key becomes a column name; each value is a list that fills that column.
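A minimal sketch of that pattern with five rows, matching the index 0 to 4 described next. The items, categories, and prices are invented for illustration:

```python
import pandas as pd

# Each key becomes a column name; each list fills that column top to bottom
df = pd.DataFrame({
    "item": ["bread", "milk", "eggs", "cheese", "apples"],
    "category": ["bakery", "dairy", "dairy", "dairy", "produce"],
    "price": [1.20, 0.95, 2.50, 3.10, 1.75],
})
print(df)
```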
Look at what pandas prints. Each row has an index number (0 to 4). Each column has a name taken from the dictionary keys. The data types are inferred automatically: strings for text, floats for decimals. This is already more structured than a plain Python list.
The variable name df is a near-universal convention for DataFrames, just like pd is for the library itself. You will see it in almost every pandas tutorial.
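Every list in the dictionary must have the same length; if one is shorter or longer than the others, pandas raises a ValueError. Try it: remove one item from "price" and run the cell again to see the error.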
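A sketch that triggers the error on purpose and catches it, so the script keeps running. The exact wording of the message can vary between pandas versions, so the code only prints whatever message pandas gives:

```python
import pandas as pd

error_message = None
try:
    # "price" has two values but "item" has three
    pd.DataFrame({
        "item": ["bread", "milk", "eggs"],
        "price": [1.20, 0.95],
    })
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```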
Part Four
Accessing Columns
Once you have a DataFrame, the most common operation is to look at one column at a time. You access a column by writing the DataFrame name, then the column name in square brackets and quotation marks:
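A sketch of column selection, reusing the illustrative price table:

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["bread", "milk", "eggs", "cheese", "apples"],
    "category": ["bakery", "dairy", "dairy", "dairy", "produce"],
    "price": [1.20, 0.95, 2.50, 3.10, 1.75],
})

# Square brackets plus the quoted column name select a single column
price_column = df["price"]
print(price_column)
print(type(price_column).__name__)  # Series
```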
Notice that df["price"] returns a Series — a single column with the DataFrame's row index on the left and the price values on the right. Every column in a DataFrame is a Series.
Because a column is a Series, you can immediately call mathematical methods on it:
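A sketch using four standard Series methods (sum, mean, min, max) on the same illustrative prices:

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["bread", "milk", "eggs", "cheese", "apples"],
    "category": ["bakery", "dairy", "dairy", "dairy", "produce"],
    "price": [1.20, 0.95, 2.50, 3.10, 1.75],
})

# Each method summarises the whole column in one call
print("Total:", round(df["price"].sum(), 2))
print("Average:", round(df["price"].mean(), 2))
print("Cheapest:", df["price"].min())
print("Most expensive:", df["price"].max())
```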
You can also use dot notation: df.price returns the same Series as df["price"]. However, bracket notation always works, and it is required when a column name contains spaces or matches a DataFrame method name (for example, a column called "count").
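A sketch of where dot notation works and where it does not. The column names "unit price" and "count" are hypothetical, chosen to show the two failure cases; count is a real DataFrame method, so df.count returns that method rather than the column:

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["bread", "milk"],
    "unit price": [1.20, 0.95],
    "count": [3, 5],
})

# A simple name like "item" works either way
print(df.item.equals(df["item"]))  # True

# "unit price" contains a space, so only brackets can reach it
print(df["unit price"])

# df.count is the DataFrame.count method, not the "count" column
print(callable(df.count))  # True
print(df["count"])
```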
Part Five
Your Turn — Build a Price List
Create a DataFrame that tracks prices in a local market. Include at least 6 items across at least 2 categories. Then print the entire table and answer these two questions using pandas methods:
- What is the total cost if you buy one of everything?
- What is the average price per item?
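One possible approach, sketched with invented market items and prices (replace them with your own). It uses only the tools from this chapter: a dictionary passed to pd.DataFrame, column selection, and the sum and mean methods:

```python
import pandas as pd

# Hypothetical items and prices: six items across three categories
df = pd.DataFrame({
    "item": ["tomatoes", "onions", "rice", "beans", "chicken", "fish"],
    "category": ["produce", "produce", "grains", "grains", "meat", "meat"],
    "price": [2.40, 1.10, 3.25, 2.80, 6.50, 7.95],
})
print(df)

# Buying one of everything costs the sum of the price column
total = df["price"].sum()
# The average price per item is the mean of the same column
average = df["price"].mean()

print(f"Total cost: {total:.2f}")
print(f"Average price: {average:.2f}")
```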