Book 4 · Chapter Three — Selecting Data

Part One

Selecting One or More Columns

A real-world grocery dataset might have 30 columns — store name, barcode, product description, brand, weight, price, discount, tax code, and more. You rarely want all of them at once. pandas lets you select just the columns you care about.

Single column — returns a Series

Python · Try it
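
A minimal sketch of single-column selection. It assumes a small made-up stand-in for the grocery DataFrame: the values and the category column are illustrative, not the chapter's real data.

```python
import pandas as pd

# Hypothetical stand-in for the grocery dataset (invented values).
df = pd.DataFrame({
    "item": ["apples", "milk", "bread", "rice", "eggs", "spinach"],
    "category": ["produce", "dairy", "bakery", "pantry", "dairy", "produce"],
    "price": [2.50, 1.20, 3.00, 4.75, 2.90, 1.80],
    "unit": ["kg", "litre", "loaf", "kg", "dozen", "bunch"],
    "organic": [True, False, False, False, True, True],
})

# One column name inside one pair of brackets gives a Series.
prices = df["price"]

print(type(prices).__name__)
print(prices.head(3))
```

The Series keeps the column name as its name attribute and shares the DataFrame's row index.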

Multiple columns — returns a DataFrame

To select multiple columns, pass a list of column names inside the square brackets — this means two sets of brackets: the outer ones for the selection, and the inner ones for the Python list.

Python · Try it
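
A short sketch of multi-column selection, again on an invented stand-in for the grocery data (the values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical stand-in for the grocery dataset (invented values).
df = pd.DataFrame({
    "item": ["apples", "milk", "bread", "rice", "eggs", "spinach"],
    "category": ["produce", "dairy", "bakery", "pantry", "dairy", "produce"],
    "price": [2.50, 1.20, 3.00, 4.75, 2.90, 1.80],
    "organic": [True, False, False, False, True, True],
})

# Outer brackets select; the inner brackets are an ordinary Python list.
subset = df[["item", "price"]]

# The list can also be built first, which is easier to read for long selections.
wanted = ["item", "price", "organic"]
wider_subset = df[wanted]

print(subset.head(3))
print(wider_subset.shape)
```

The columns come back in the order given in the list, not in their original order in the DataFrame.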

Single vs double brackets: df["price"] returns a Series (one column). df[["price"]] returns a DataFrame with one column. The distinction matters when you pass the result to other functions that expect a specific type.
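
The difference between the two bracket styles can be checked directly. The sketch below uses a tiny invented DataFrame, so the values are assumptions:

```python
import pandas as pd

# Hypothetical two-column stand-in for the grocery dataset.
df = pd.DataFrame({
    "item": ["apples", "milk", "bread"],
    "price": [2.50, 1.20, 3.00],
})

as_series = df["price"]     # single brackets
as_frame = df[["price"]]    # double brackets

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

The Series is one-dimensional, while the one-column DataFrame still has two dimensions (rows and columns).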

Part Two

.iloc[] — Select by Position

.iloc[] (short for integer location) lets you select rows and columns by their numeric position, like indexing a list. The syntax is df.iloc[row, column]. Positions start at 0.

Python · Try it
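
A minimal sketch of the most common .iloc[] patterns, assuming an invented stand-in for the grocery DataFrame (column order and values are illustrative):

```python
import pandas as pd

# Hypothetical stand-in for the grocery dataset (invented values).
df = pd.DataFrame({
    "item": ["apples", "milk", "bread", "rice", "eggs", "spinach"],
    "category": ["produce", "dairy", "bakery", "pantry", "dairy", "produce"],
    "price": [2.50, 1.20, 3.00, 4.75, 2.90, 1.80],
})

first_row = df.iloc[0]          # row at position 0, returned as a Series
cell = df.iloc[0, 2]            # row 0, column 2 (the price column here)
first_three = df.iloc[0:3]      # rows at positions 0, 1, 2
block = df.iloc[0:3, 0:2]       # first three rows, first two columns
last_row = df.iloc[-1]          # last row, whatever the length

print(first_row)
print(cell)
print(block)
print(last_row["item"])
```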

.iloc[] uses Python slice notation: 0:3 means rows 0, 1, and 2 (not 3, because the end of a slice is exclusive). An optional third value sets the step, so df.iloc[::2] returns every other row. A negative index like -1 counts from the end, so df.iloc[-1] is the last row regardless of how many rows the dataset has.

Python · Try it — every other row
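
A sketch of step slicing with .iloc[], on an invented stand-in for the grocery data. A slice can take a third value, the step, so ::2 means "from the start to the end, taking every second row":

```python
import pandas as pd

# Hypothetical stand-in for the grocery dataset (invented values).
df = pd.DataFrame({
    "item": ["apples", "milk", "bread", "rice", "eggs", "spinach"],
    "price": [2.50, 1.20, 3.00, 4.75, 2.90, 1.80],
})

# Rows at positions 0, 2, 4.
every_other = df.iloc[::2]

# Same idea, but starting from position 1: rows 1, 3, 5.
every_other_from_one = df.iloc[1::2]

print(every_other)
print(every_other_from_one)
```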

Part Three

.loc[] — Select by Label

.loc[] (short for label location) selects by the index label and column name. For our grocery dataset, the index is the default numeric one (0, 1, 2 …), so .loc[0] gives you row 0. But .loc[] really shines when the index is meaningful — for example, when each row is labelled with a date or a product code.

Python · Try it
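
A minimal sketch of .loc[] on the default numeric index, assuming an invented stand-in for the grocery DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for the grocery dataset (invented values).
df = pd.DataFrame({
    "item": ["apples", "milk", "bread", "rice", "eggs", "spinach"],
    "category": ["produce", "dairy", "bakery", "pantry", "dairy", "produce"],
    "price": [2.50, 1.20, 3.00, 4.75, 2.90, 1.80],
})

row0 = df.loc[0]                          # row whose index label is 0
price0 = df.loc[0, "price"]               # one cell: label 0, column "price"
part = df.loc[0:2, ["item", "price"]]     # labels 0 through 2, two named columns

print(row0)
print(price0)
print(part)
```

Columns are named rather than numbered here, which tends to make .loc[] code easier to read than the equivalent .iloc[] call.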

Key difference — loc vs iloc:
df.iloc[0:3] → rows at positions 0, 1, 2 (exclusive end)
df.loc[0:3] → rows with labels 0, 1, 2, 3 (inclusive end)
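With the default numeric index the two calls look almost identical, yet the first returns three rows and the second returns four. On a date-indexed dataset the difference is critical: a .loc[] slice includes the end date, while an .iloc[] slice stops one row before the end position.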
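
The inclusive and exclusive ends can be seen side by side. The sketch below uses invented data, including a hypothetical date-indexed sales table with a units_sold column that is not part of the grocery dataset:

```python
import pandas as pd

# Hypothetical stand-in for the grocery dataset (invented values).
df = pd.DataFrame({
    "item": ["apples", "milk", "bread", "rice", "eggs"],
    "price": [2.50, 1.20, 3.00, 4.75, 2.90],
})

by_position = df.iloc[0:3]   # positions 0, 1, 2
by_label = df.loc[0:3]       # labels 0, 1, 2, 3

print(len(by_position), len(by_label))

# Hypothetical daily sales, indexed by date (1 to 5 January 2024).
sales = pd.DataFrame(
    {"units_sold": [12, 15, 9, 20, 17]},
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Label slice: both 2 January and 4 January are included.
jan_2_to_4 = sales.loc["2024-01-02":"2024-01-04"]

# Position slice: starts at position 1 and stops before position 3.
positions_1_to_2 = sales.iloc[1:3]

print(jan_2_to_4)
print(positions_1_to_2)
```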

Using item names as the index

A more natural way to use .loc[] is to set a meaningful column as the index using .set_index():

Python · Try it
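
A sketch of label-based lookup after .set_index(), assuming an invented stand-in for the grocery DataFrame in which every item name is unique:

```python
import pandas as pd

# Hypothetical stand-in for the grocery dataset (invented values).
df = pd.DataFrame({
    "item": ["apples", "milk", "bread", "rice", "eggs", "spinach"],
    "price": [2.50, 1.20, 3.00, 4.75, 2.90, 1.80],
    "organic": [True, False, False, False, True, True],
})

# set_index returns a new DataFrame; df itself is left unchanged.
by_item = df.set_index("item")

milk = by_item.loc["milk"]                  # whole row for milk
milk_price = by_item.loc["milk", "price"]   # one cell
span = by_item.loc["milk":"rice"]           # milk through rice, both included

print(milk)
print(milk_price)
print(span)
```

Label slices such as "milk":"rice" follow the row order of the DataFrame, not alphabetical order, and they rely on the labels being unique.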

Part Four

.unique() and .value_counts()

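Two essential methods for understanding a text column: .unique() lists the distinct values, and .value_counts() shows how often each one appears. When you only need the number of distinct values, .nunique() returns it directly.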

Python · Try it
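
A minimal sketch of the three calls on a text column. The category column and its values are assumptions made up for this example:

```python
import pandas as pd

# Hypothetical stand-in for the grocery dataset (invented values).
df = pd.DataFrame({
    "item": ["apples", "milk", "bread", "rice", "eggs", "spinach"],
    "category": ["produce", "dairy", "bakery", "pantry", "dairy", "produce"],
})

categories = df["category"].unique()       # distinct values, in order of first appearance
n_categories = df["category"].nunique()    # how many distinct values
counts = df["category"].value_counts()     # frequency of each value, largest first

print(categories)
print(n_categories)
print(counts)
```

The result of .value_counts() is itself a Series indexed by the category names, so counts["dairy"] looks up a single frequency.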

.value_counts() returns the counts sorted from most to least frequent. It is one of the most useful tools in exploratory analysis — the equivalent of a quick frequency table. On a real supermarket dataset with thousands of products, it tells you at a glance which categories dominate the catalogue.

Part Five

Your Turn — Slice and Inspect

Work with the grocery DataFrame and answer the following using pandas selection tools:

Select only the item, price, and organic columns.
Use .iloc[] to show the first 5 rows of those columns.
How many unique units of measurement are there in the dataset?

Python · Your turn
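
One possible approach, best read after attempting the exercise yourself. The DataFrame below is an invented stand-in, so the unit column name and the final count are assumptions that will differ on the real grocery data:

```python
import pandas as pd

# Hypothetical stand-in for the grocery dataset (invented values).
df = pd.DataFrame({
    "item": ["apples", "milk", "bread", "rice", "eggs", "spinach"],
    "price": [2.50, 1.20, 3.00, 4.75, 2.90, 1.80],
    "unit": ["kg", "litre", "loaf", "kg", "dozen", "bunch"],
    "organic": [True, False, False, False, True, True],
})

# 1. Keep only the three requested columns.
subset = df[["item", "price", "organic"]]

# 2. First five rows of that selection, by position.
first_five = subset.iloc[:5]

# 3. Number of distinct units of measurement.
n_units = df["unit"].nunique()

print(first_five)
print(n_units)
```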

What you learned in this chapter: how to select one or more columns with [] and [[]]; how to select rows by position with .iloc[] and by label with .loc[]; and how to count distinct values with .unique(), .nunique(), and .value_counts(). Next chapter: filtering rows by condition.
