Part One
Selecting One or More Columns
A real-world grocery dataset might have 30 columns — store name, barcode, product description, brand, weight, price, discount, tax code, and more. You rarely want all of them at once. pandas lets you select just the columns you care about.
Single column — returns a Series
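As a minimal sketch of single-column selection, the example below builds a small stand-in DataFrame. The rows and values are invented for illustration; only the column names item, price, and organic come from this chapter.

```python
import pandas as pd

# Hypothetical stand-in for the grocery dataset (values are invented).
df = pd.DataFrame({
    "item": ["apples", "milk", "bread", "rice"],
    "price": [1.20, 0.99, 2.50, 3.10],
    "organic": [True, False, False, True],
})

# Single brackets with one column name return a Series.
prices = df["price"]

print(type(prices).__name__)  # Series
print(prices)
```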
Multiple columns — returns a DataFrame
To select multiple columns, pass a list of column names inside the square brackets — this means two sets of brackets: the outer ones for the selection, and the inner ones for the Python list.
df["price"] returns a Series (one column). df[["price"]] returns a DataFrame with one column. The distinction matters when you pass the result to other functions that expect a specific type.
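The difference between the two forms can be checked directly. This is a sketch on a small invented DataFrame, not the real grocery file; the column names follow the ones used in this chapter.

```python
import pandas as pd

# Hypothetical stand-in for the grocery dataset (values are invented).
df = pd.DataFrame({
    "item": ["apples", "milk", "bread", "rice"],
    "price": [1.20, 0.99, 2.50, 3.10],
    "organic": [True, False, False, True],
})

# A list of names inside the brackets returns a DataFrame.
subset = df[["item", "price"]]

# Same column, two different result types.
as_series = df["price"]     # Series
as_frame = df[["price"]]    # DataFrame with a single column

print(type(subset).__name__, subset.shape)      # DataFrame (4, 2)
print(type(as_series).__name__, as_series.shape)  # Series (4,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (4, 1)
```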
Part Two
.iloc[] — Select by Position
.iloc[] (short for integer location) lets you select rows and columns by their numeric position, like indexing a list. The syntax is df.iloc[row, column]. Positions start at 0.
.iloc[] uses Python slice notation: 0:3 means rows 0, 1, and 2 (not 3 — the end of a slice is exclusive). A negative index like -1 counts from the end, so df.iloc[-1] is always the last row regardless of how many rows the dataset has.
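A short sketch of the common .iloc[] patterns, again on an invented stand-in DataFrame (the rows and values are assumptions made for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the grocery dataset (values are invented).
df = pd.DataFrame({
    "item": ["apples", "milk", "bread", "rice"],
    "price": [1.20, 0.99, 2.50, 3.10],
    "organic": [True, False, False, True],
})

first_row = df.iloc[0]          # row at position 0, returned as a Series
first_three = df.iloc[0:3]      # rows at positions 0, 1, 2 (3 is excluded)
one_value = df.iloc[0, 1]       # row 0, column 1 (the price of the first item)
block = df.iloc[0:2, 0:2]       # first two rows, first two columns
last_row = df.iloc[-1]          # last row, whatever the length

print(first_three)
print(last_row["item"])  # rice
```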
Part Three
.loc[] — Select by Label
.loc[] (short for label location) selects by the index label and column name. For our grocery dataset, the index is the default numeric one (0, 1, 2 …), so .loc[0] gives you row 0. But .loc[] really shines when the index is meaningful — for example, when each row is labelled with a date or a product code.
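With the default numeric index, label-based selection might look like the sketch below. The DataFrame is an invented stand-in, so the labels 0 to 3 are simply the default index pandas assigns.

```python
import pandas as pd

# Hypothetical stand-in for the grocery dataset (values are invented).
df = pd.DataFrame({
    "item": ["apples", "milk", "bread", "rice"],
    "price": [1.20, 0.99, 2.50, 3.10],
    "organic": [True, False, False, True],
})

row0 = df.loc[0]                            # row whose index label is 0
price0 = df.loc[0, "price"]                 # one value: label 0, column "price"
picked = df.loc[[0, 2], ["item", "price"]]  # labels 0 and 2, two named columns

print(row0)
print(picked)
```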
df.iloc[0:3] → rows at positions 0, 1, 2 (exclusive end)
df.loc[0:3] → rows with labels 0, 1, 2, 3 (inclusive end)
With a numeric default index they look similar, but on a date-indexed dataset the difference is critical.
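The two slicing rules can be compared side by side. The sketch below uses an invented five-row DataFrame, and the dates attached to it are hypothetical; they only serve to show that a label slice includes its end.

```python
import pandas as pd

# Hypothetical stand-in for the grocery dataset (values are invented).
df = pd.DataFrame({
    "item": ["apples", "milk", "bread", "rice", "eggs"],
    "price": [1.20, 0.99, 2.50, 3.10, 2.75],
})

by_position = df.iloc[0:3]   # positions 0, 1, 2
by_label = df.loc[0:3]       # labels 0, 1, 2, 3

print(len(by_position), len(by_label))  # 3 4

# The same rows, labelled with five consecutive (made-up) dates.
dated = df.copy()
dated.index = pd.date_range("2024-01-01", periods=5, freq="D")

# A label slice on dates includes both the start and the end date.
window = dated.loc["2024-01-02":"2024-01-04"]
print(window["item"].tolist())  # ['milk', 'bread', 'rice']
```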
Using item names as the index
A more natural way to use .loc[] is to set a meaningful column as the index using .set_index():
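A minimal sketch of that approach, assuming the item column holds unique names. The DataFrame below is an invented stand-in for the grocery data.

```python
import pandas as pd

# Hypothetical stand-in for the grocery dataset (values are invented).
df = pd.DataFrame({
    "item": ["apples", "milk", "bread", "rice"],
    "price": [1.20, 0.99, 2.50, 3.10],
    "organic": [True, False, False, True],
})

# set_index returns a new DataFrame; df itself is left unchanged.
by_item = df.set_index("item")

milk_row = by_item.loc["milk"]                    # whole row for "milk"
milk_price = by_item.loc["milk", "price"]         # one value
two_prices = by_item.loc[["apples", "rice"], "price"]  # two rows, one column

print(milk_row)
print(two_prices)
```

Because .set_index() does not modify df in place by default, the result has to be assigned to a name (here by_item) before .loc[] can use the new labels.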
Part Four
.unique() and .value_counts()
Two essential methods for understanding a text column: how many distinct values are there, and how often does each appear?
.value_counts() returns the counts sorted from most to least frequent. It is one of the most useful tools in exploratory analysis — the equivalent of a quick frequency table. On a real supermarket dataset with thousands of products, it tells you at a glance which categories dominate the catalogue.
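The sketch below shows these methods on a small invented DataFrame. The category column name and its values are assumptions made for illustration; the real dataset may use a different name.

```python
import pandas as pd

# Hypothetical stand-in data; "category" is an assumed column name.
df = pd.DataFrame({
    "item": ["apples", "milk", "pears", "bread", "yogurt", "plums"],
    "category": ["fruit", "dairy", "fruit", "bakery", "dairy", "fruit"],
})

distinct = df["category"].unique()       # distinct values, in order of first appearance
n_distinct = df["category"].nunique()    # how many distinct values there are
counts = df["category"].value_counts()   # frequency of each value, most frequent first

print(list(distinct))   # ['fruit', 'dairy', 'bakery']
print(n_distinct)       # 3
print(counts)
```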
Part Five
Your Turn — Slice and Inspect
Work with the grocery DataFrame and answer the following using pandas selection tools:
- Select only the item, price, and organic columns.
- Use .iloc[] to show the first 5 rows of those columns.
- How many unique units of measurement are there in the dataset?
In this chapter you learned how to select columns with [] and [[]]; how to select rows by position with .iloc[] and by label with .loc[]; and how to count distinct values with .unique(), .nunique(), and .value_counts(). Next chapter: filtering rows by condition.