Book 4 — Data Analysis with Python

Python for All

Chapter Three — Selecting Data

Thanasis Troboukis


Knowing how to pull exactly the rows and columns you need is the most fundamental pandas skill. Here you learn to select by column name, by position, and by label.

Selecting One or More Columns

A real-world grocery dataset might have 30 columns — store name, barcode, product description, brand, weight, price, discount, tax code, and more. You rarely want all of them at once. pandas lets you select just the columns you care about.

Single column — returns a Series
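As a minimal sketch, assume a small grocery DataFrame named df with item, category, and price columns. The data below is invented for illustration and is not the chapter's dataset. Selecting one column by name might look like this:

```python
import pandas as pd

# Hypothetical grocery data, invented for illustration only
df = pd.DataFrame({
    "item": ["Milk", "Bread", "Apples", "Cheese", "Rice"],
    "category": ["Dairy", "Bakery", "Produce", "Dairy", "Pantry"],
    "price": [1.20, 2.50, 3.10, 4.75, 1.90],
})

# Single brackets with one column name return a Series
prices = df["price"]

print(type(prices).__name__)  # Series
print(prices)
```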


Multiple columns — returns a DataFrame

To select multiple columns, pass a list of column names inside the square brackets — this means two sets of brackets: the outer ones for the selection, and the inner ones for the Python list.
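The double-bracket form can be sketched as follows, again on a hypothetical grocery DataFrame whose column names and values are assumptions made for this example:

```python
import pandas as pd

# Hypothetical grocery data, invented for illustration only
df = pd.DataFrame({
    "item": ["Milk", "Bread", "Apples", "Cheese"],
    "category": ["Dairy", "Bakery", "Produce", "Dairy"],
    "price": [1.20, 2.50, 3.10, 4.75],
})

# Outer brackets: the selection. Inner brackets: a Python list of names.
subset = df[["item", "price"]]

print(type(subset).__name__)  # DataFrame
print(subset)
```

The columns come back in the order given in the list, so df[["price", "item"]] would put price first.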

Single vs double brackets: df["price"] returns a Series (one column). df[["price"]] returns a DataFrame with one column. The distinction matters when you pass the result to other functions that expect a specific type.
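A quick way to see the difference is to compare the type and shape of both results. The three-row DataFrame here is a made-up stand-in for the grocery data:

```python
import pandas as pd

# Hypothetical grocery data, invented for illustration only
df = pd.DataFrame({
    "item": ["Milk", "Bread", "Apples"],
    "price": [1.20, 2.50, 3.10],
})

as_series = df["price"]    # single brackets
as_frame = df[["price"]]   # double brackets

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
```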

.iloc[] — Select by Position

.iloc[] (short for integer location) lets you select rows and columns by their numeric position, like indexing a list. The syntax is df.iloc[row, column]. Positions start at 0.
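The df.iloc[row, column] pattern can be sketched like this. The DataFrame and its column order (item, category, price) are assumptions for the example, so position 2 happens to be the price column:

```python
import pandas as pd

# Hypothetical grocery data, invented for illustration only
df = pd.DataFrame({
    "item": ["Milk", "Bread", "Apples", "Cheese"],
    "category": ["Dairy", "Bakery", "Produce", "Dairy"],
    "price": [1.20, 2.50, 3.10, 4.75],
})

first_row = df.iloc[0]     # whole first row, returned as a Series
cell = df.iloc[0, 2]       # row 0, column 2: the price of Milk
block = df.iloc[0:2, 0:2]  # rows 0-1 and columns 0-1 (item, category)

print(first_row)
print(cell)
print(block)
```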


.iloc[] uses Python slice notation: 0:3 means rows 0, 1, and 2 (not 3 — the end of a slice is exclusive). A negative index like -1 counts from the end, so df.iloc[-1] is always the last row regardless of how many rows the dataset has.
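A short sketch of these slice forms, using a hypothetical five-row grocery DataFrame. The last line adds a step of 2 to the slice, which selects every other row:

```python
import pandas as pd

# Hypothetical grocery data, invented for illustration only
df = pd.DataFrame({
    "item": ["Milk", "Bread", "Apples", "Cheese", "Rice"],
    "price": [1.20, 2.50, 3.10, 4.75, 1.90],
})

first_three = df.iloc[0:3]  # positions 0, 1, 2 (end is exclusive)
last_row = df.iloc[-1]      # negative index counts from the end
every_other = df.iloc[::2]  # step of 2: positions 0, 2, 4

print(first_three)
print(last_row)
print(every_other)
```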


.loc[] — Select by Label

.loc[] (short for label location) selects by the index label and column name. For our grocery dataset, the index is the default numeric one (0, 1, 2 …), so .loc[0] gives you row 0. But .loc[] really shines when the index is meaningful — for example, when each row is labelled with a date or a product code.
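With the default numeric index, label-based selection might look like the sketch below. The DataFrame is a made-up example, and the labels 0, 1, 2 … are simply the index values pandas assigns by default:

```python
import pandas as pd

# Hypothetical grocery data, invented for illustration only
df = pd.DataFrame({
    "item": ["Milk", "Bread", "Apples", "Cheese"],
    "category": ["Dairy", "Bakery", "Produce", "Dairy"],
    "price": [1.20, 2.50, 3.10, 4.75],
})

row0 = df.loc[0]                          # row whose index label is 0
price0 = df.loc[0, "price"]               # label 0, column "price"
part = df.loc[[0, 2], ["item", "price"]]  # lists of labels and column names

print(row0)
print(price0)
print(part)
```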

Key difference — loc vs iloc:
df.iloc[0:3] → rows at positions 0, 1, 2 (exclusive end)
df.loc[0:3] → rows with labels 0, 1, 2, 3 (inclusive end)
With a numeric default index they look similar, but on a date-indexed dataset the difference is critical.
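The inclusive versus exclusive behaviour can be checked directly by counting rows. The data and the January 2024 dates below are assumptions chosen only to illustrate the point:

```python
import pandas as pd

# Hypothetical grocery data, invented for illustration only
df = pd.DataFrame({
    "item": ["Milk", "Bread", "Apples", "Cheese", "Rice"],
    "price": [1.20, 2.50, 3.10, 4.75, 1.90],
})

by_position = df.iloc[0:3]  # positions 0, 1, 2
by_label = df.loc[0:3]      # labels 0, 1, 2, 3

print(len(by_position))  # 3
print(len(by_label))     # 4

# Same rows, but labelled with one (made-up) date per row
dated = df.copy()
dated.index = pd.date_range("2024-01-01", periods=5, freq="D")

# Label slicing includes the end date, so 3 January is part of the result
jan_1_to_3 = dated.loc["2024-01-01":"2024-01-03"]
print(jan_1_to_3)
```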

Using item names as the index

A more natural way to use .loc[] is to set a meaningful column as the index using .set_index():
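As a sketch, assuming the grocery DataFrame has an item column with unique names (the data here is invented), the pattern could look like this:

```python
import pandas as pd

# Hypothetical grocery data, invented for illustration only
df = pd.DataFrame({
    "item": ["Milk", "Bread", "Apples", "Cheese", "Rice"],
    "category": ["Dairy", "Bakery", "Produce", "Dairy", "Pantry"],
    "price": [1.20, 2.50, 3.10, 4.75, 1.90],
})

# set_index returns a new DataFrame; df itself is left unchanged
by_item = df.set_index("item")

milk = by_item.loc["Milk"]                        # one row, as a Series
bread_price = by_item.loc["Bread", "price"]       # one cell
several = by_item.loc[["Milk", "Rice"], "price"]  # several rows, one column

print(milk)
print(bread_price)
print(several)
```

Because "item" has moved into the index, it is no longer a regular column of by_item.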


.unique() and .value_counts()

Two essential methods for understanding a text column: how many distinct values are there, and how often does each appear?
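Both questions can be answered in a few lines. The sketch below assumes a category column in a made-up grocery DataFrame, and also uses .nunique(), which returns the number of distinct values directly:

```python
import pandas as pd

# Hypothetical grocery data, invented for illustration only
df = pd.DataFrame({
    "item": ["Milk", "Bread", "Apples", "Cheese", "Rice"],
    "category": ["Dairy", "Bakery", "Produce", "Dairy", "Pantry"],
})

categories = df["category"].unique()     # distinct values, in order of first appearance
n_categories = df["category"].nunique()  # how many distinct values
counts = df["category"].value_counts()   # how often each value appears

print(list(categories))  # ['Dairy', 'Bakery', 'Produce', 'Pantry']
print(n_categories)      # 4
print(counts)
```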


.value_counts() returns the counts sorted from most to least frequent. It is one of the most useful tools in exploratory analysis — the equivalent of a quick frequency table. On a real supermarket dataset with thousands of products, it tells you at a glance which categories dominate the catalogue.

Your Turn — Slice and Inspect

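Work with the grocery DataFrame and practice the selection tools from this chapter: pick columns with [] and [[]], slice rows with .iloc[] and .loc[], and summarize a text column with .unique() and .value_counts().
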
What you learned in this chapter: how to select one or more columns with [] and [[]]; how to select rows by position with .iloc[] and by label with .loc[]; and how to count distinct values with .unique(), .nunique(), and .value_counts(). Next chapter: filtering rows by condition.
