Book 4 — Data Analysis with Python

Python for All

Chapter One — What is pandas?

Thanasis Troboukis

pandas is the most widely used Python library for working with data. In this chapter you create your first tables of data and learn to read them, column by column and row by row.

The Library That Changed Data Analysis

When a data journalist compares prices across supermarkets, a researcher tracks food inflation over time, or a public health official monitors which regions have the highest cost of living, there is a good chance they are working with pandas. It is the standard Python library for loading, cleaning, transforming, and analysing tabular data: data arranged in rows and columns, like a spreadsheet.

The name comes from panel data, an econometrics term for datasets that track multiple subjects over multiple time periods. But you do not need to know econometrics. You just need to know that pandas gives you two powerful tools: the Series (a single column of data) and the DataFrame (a full table with rows and columns).

Before you can use pandas, you import it. By convention, almost every pandas user gives it the same short name, pd.
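
The whole convention fits on one line. In the sketch below, printing pd.__version__ is an optional extra and not part of the convention. It simply reports which pandas release is installed.

```python
# Load pandas under its conventional short name
import pandas as pd

# Optional check: show which pandas version is installed
print(pd.__version__)
```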

The statement import pandas as pd loads the library and gives it the short name pd. From this point on, every pandas tool is accessed by typing pd. followed by its name, for example pd.Series or pd.DataFrame. You will find this line at the top of almost every data analysis script written in Python.

Note on load time: The first time you press Run in a Book 4 chapter, the page has to load both Python and the pandas library, which can take 5–15 seconds. After that, cells run almost instantly.

Series — A Single Column of Data

The simplest pandas object is a Series: an ordered sequence of values, each paired with a label. Together, these labels form the Series' index. Think of a Series as a single column from a spreadsheet, for example a list of food prices.
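
A minimal sketch of such a Series, using four made-up prices in euros (the values are hypothetical, chosen only for illustration). Because no labels are supplied, pandas numbers the values from 0, and the printed output should end with a line reading dtype: float64.

```python
import pandas as pd

# Hypothetical food prices in euros; no labels are supplied
prices = pd.Series([1.20, 0.85, 3.40, 2.10])

# pandas adds a default index 0, 1, 2, 3 on the left
print(prices)
```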

The numbers on the left (0, 1, 2 …) are the default index: automatic labels that pandas assigns when you do not specify your own. The numbers on the right are the values. At the bottom, pandas reports the data type: float64 means decimal (floating-point) numbers stored in 64 bits.

A more useful Series labels each value explicitly, for example with the name of each food item as its index.
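
Here is one way that could look, with hypothetical prices and item names. The index argument of pd.Series supplies the labels, and square brackets with a label return the matching value.

```python
import pandas as pd

# Hypothetical prices, each labelled with the name of a food item
prices = pd.Series(
    [1.20, 0.85, 3.40, 2.10],
    index=["bread", "milk", "cheese", "apples"],
)
print(prices)

# Look up a single value by its label, like a dictionary key
cheese_price = prices["cheese"]
print(cheese_price)
```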

You access a single value by putting its label inside square brackets. This works like looking up a key in a Python dictionary: the label is the key and the price is the value.

Series arithmetic: You can do maths on an entire Series at once. Try adding print(prices * 1.1) after the Series is created and running the code again. pandas applies the operation to every value in the Series, so no loop is needed.
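
For example, a uniform 10% price increase can be applied in one step. The prices and labels are hypothetical, and .round(2) is only there to tidy the display, since floating-point arithmetic can leave long trailing decimals.

```python
import pandas as pd

# Hypothetical prices in euros, labelled by item
prices = pd.Series(
    [1.20, 0.85, 3.40, 2.10],
    index=["bread", "milk", "cheese", "apples"],
)

# Multiply every value by 1.1 (a 10% increase) in one step, no loop
increased = prices * 1.1

# Round to two decimals for display; the labels are kept unchanged
print(increased.round(2))
```

The result is a new Series with the same labels, so each increased price still lines up with its item.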

DataFrame — A Full Table of Data

A DataFrame is a table: rows and columns, like a spreadsheet. Each column is a Series. Different columns can hold different types of data (names, prices, categories, true/false values), while the values within a single column normally share one type.

The most common way to create a DataFrame from scratch is to pass a Python dictionary to pd.DataFrame(). Each key becomes a column name; each value is a list that fills that column.
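
A sketch of that pattern with five hypothetical market items. The column names item, category and price are illustrative choices. The final print shows the type pandas inferred for each column: the prices appear as float64, while text columns may be listed as object, or as a dedicated string type in newer pandas releases.

```python
import pandas as pd

# Hypothetical market data: each key becomes a column name,
# each list fills that column from top to bottom
data = {
    "item": ["bread", "milk", "cheese", "apples", "tomatoes"],
    "category": ["bakery", "dairy", "dairy", "fruit", "vegetables"],
    "price": [1.20, 0.85, 3.40, 2.10, 1.75],
}

df = pd.DataFrame(data)
print(df)

# The data type pandas inferred for each column
print(df.dtypes)
```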

Look at what pandas prints. Each row has an index number (0 to 4), and each column has a name taken from the dictionary keys. pandas also infers each column's data type automatically: text columns hold strings and decimal columns hold floats. This is already more structured than a plain Python list.

The variable name df is a near-universal convention for DataFrames, just like pd is for the library itself. You will see it in nearly every pandas tutorial.

Key rule: Every list in the dictionary must have the same length. If one column has 5 values and another has 4, pandas raises a ValueError. Try it: remove one item from "price" and run the code again to see the error.
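
The sketch below triggers that error on purpose and catches it, so you can read the message without the program stopping. The dictionary is hypothetical, and the exact wording of the message can differ between pandas versions.

```python
import pandas as pd

# "price" deliberately has one value fewer than "item"
bad_data = {
    "item": ["bread", "milk", "cheese"],
    "price": [1.20, 0.85],
}

error_message = None
try:
    pd.DataFrame(bad_data)
except ValueError as err:
    # pandas refuses to build a table whose columns differ in length
    error_message = str(err)

print(error_message)
```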

Accessing Columns

Once you have a DataFrame, the most common operation is to look at one column at a time. You access a column by writing the DataFrame name followed by the column name, in quotation marks, inside square brackets, as in df["price"].
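
For example, with a small hypothetical table, selecting the price column and checking its type shows what kind of object comes back:

```python
import pandas as pd

# Small hypothetical table of market prices
df = pd.DataFrame({
    "item": ["bread", "milk", "cheese"],
    "price": [1.20, 0.85, 3.40],
})

# Column name in quotation marks, inside square brackets
price_column = df["price"]
print(price_column)

# Report what kind of object the selection returned
print(type(price_column))
```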

Notice that df["price"] returns a Series: a single column, with the DataFrame's row index on the left and the price values on the right. Every column in a DataFrame is a Series.

Because a column is a Series, you can call mathematical methods such as mean(), max() and sum() on it directly.
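
A few of the summary methods a Series provides, applied to a hypothetical price column. Each of mean(), max(), min() and sum() returns a single number.

```python
import pandas as pd

# Hypothetical market prices in euros
df = pd.DataFrame({
    "item": ["bread", "milk", "cheese", "apples", "tomatoes"],
    "price": [1.20, 0.85, 3.40, 2.10, 1.75],
})

# Each method reduces the whole column to one number
average_price = df["price"].mean()
highest_price = df["price"].max()
lowest_price = df["price"].min()
total_cost = df["price"].sum()

# round() only tidies the display; the stored values are unchanged
print("Average:", round(average_price, 2))
print("Highest:", highest_price)
print("Lowest:", lowest_price)
print("Total:", round(total_cost, 2))
```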

Dot notation: If a column name is a valid Python identifier (letters, digits and underscores only, not starting with a digit), you can also access it as an attribute, so df.price gives the same column as df["price"]. Bracket notation always works, though, and you need it when a column name contains spaces or matches a DataFrame method name such as count.
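
A small sketch of the difference, using hypothetical column names chosen to show the two problem cases: a name containing a space, and a name (count) that matches a DataFrame method.

```python
import pandas as pd

df = pd.DataFrame({
    "price": [1.20, 0.85, 3.40],
    "unit price": [0.60, 0.85, 1.70],  # name contains a space
    "count": [2, 1, 2],                # name matches the DataFrame.count method
})

# For a simple name, dot and bracket notation select the same column
same_column = df.price.equals(df["price"])
print(same_column)  # True

# A name with a space only works with brackets
print(df["unit price"])

# df.count is the built-in method, not the "count" column
print(callable(df.count))  # True
print(df["count"])
```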

Your Turn — Build a Price List

Create a DataFrame that tracks prices in a local market. Include at least 6 items across at least 2 categories. Then print the entire table and answer these two questions using pandas methods:

What you learned in this chapter: what pandas is and why it matters; the difference between a Series (one column) and a DataFrame (a full table); how to create both from Python data; and how to access a column and call basic methods on it. In the next chapter you will load a larger dataset and use pandas tools to explore its shape, types, and summary statistics.
