An Introduction to Programming

Python for All

Chapter Nine — Regex Basics

Thanasis Troboukis

A simple introduction to pattern matching in Python.

What Is Regex?

Regex (short for regular expressions) is a way to search text using patterns instead of exact strings.

Think of it as a smart filter. Instead of saying "find exactly this one sentence", you say "find anything that looks like a phone number" or "find all 4-digit years".

In Python, regex lives in the re module, so we start with import re.

Regex syntax, step by step

  1. Write a pattern as a raw string, for example r"\d{3}-\d{3}-\d{4}".
  2. Break it into pieces: \d{3} = exactly 3 digits, - = a real dash, \d{3} = 3 digits, - = dash, \d{4} = 4 digits.
  3. Read the whole pattern in plain English: "three digits, dash, three digits, dash, four digits."
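  4. Match that pattern against text to find values like 210-123-4567.

Why the r before the quotes? Regex patterns use many backslashes (like \d). The r prefix makes a raw string, so Python leaves each backslash as it is instead of treating it as the start of an escape sequence such as \n. This keeps patterns easier to write and read.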
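
The four steps above can be sketched in a few lines. The sample text is made up for illustration, and re.findall() (covered later in this chapter) is used here only to show what the pattern catches.

```python
import re

# Made-up sample text containing two phone numbers.
text = "Call 210-123-4567 or 210-765-4321 today."

# Three digits, dash, three digits, dash, four digits.
phone_pattern = r"\d{3}-\d{3}-\d{4}"

phones = re.findall(phone_pattern, text)
print(phones)  # ['210-123-4567', '210-765-4321']

# Without the r prefix, every backslash has to be doubled
# to produce the same pattern.
same_pattern = "\\d{3}-\\d{3}-\\d{4}"
print(phone_pattern == same_pattern)  # True
```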

Pattern Pieces (Slow Build)

Start with small pieces and combine them:

\d a digit, \w a letter, digit, or underscore, \s a whitespace character (space, tab, or newline), . any single character except a newline.
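
A quick way to get a feel for these pieces is to try each one on a tiny string. The strings below are invented examples, and re.findall() simply lists everything a pattern matches.

```python
import re

# \d matches one digit at a time.
digits = re.findall(r"\d", "Room 7_b, floor 2")
print(digits)  # ['7', '2']

# \w matches letters, digits, and the underscore, but not "!".
word_chars = re.findall(r"\w", "a_1!")
print(word_chars)  # ['a', '_', '1']

# \s matches whitespace, including a tab.
spaces = re.findall(r"\s", "a b\tc")
print(spaces)  # [' ', '\t']

# . stands for exactly one character between "a" and "c".
any_char = re.findall(r"a.c", "abc a-c ac")
print(any_char)  # ['abc', 'a-c']
```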

Quantifiers: + one or more, * zero or more, ? zero or one (optional), {n} exactly n times, {n,m} between n and m times.
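
Here is a small sketch of each quantifier at work. The sample sentences are made up, and the variable names are only for illustration.

```python
import re

text = "7 apples, 42 pears, 2024 plums"

# + : one or more digits in a row.
one_or_more = re.findall(r"\d+", text)
print(one_or_more)  # ['7', '42', '2024']

# {n} : exactly four digits.
exactly_four = re.findall(r"\d{4}", text)
print(exactly_four)  # ['2024']

# {n,m} : between two and four digits.
two_to_four = re.findall(r"\d{2,4}", text)
print(two_to_four)  # ['42', '2024']

# ? : the "u" is optional.
optional_u = re.findall(r"colou?r", "color colour")
print(optional_u)  # ['color', 'colour']

# * : "a" followed by zero or more "b".
zero_or_more = re.findall(r"ab*", "a ab abbb")
print(zero_or_more)  # ['a', 'ab', 'abbb']
```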

Greek vs English letters (upper/lower)

When you need language-specific matching, use explicit character ranges, for example [A-Z] and [a-z] for English letters and [Α-Ω] and [α-ω] for Greek letters.
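
The sketch below tries these ranges on a made-up sentence that mixes both alphabets. One assumption worth flagging: accented Greek letters such as ά and ί sit outside the plain [α-ω] range, so the wider range [ά-ώ] is used here as one possible way to include the lowercase accented letters.

```python
import re

text = "Patras Πάτρα, Thessaloniki Θεσσαλονίκη"

# English letters only.
english_words = re.findall(r"[A-Za-z]+", text)
print(english_words)  # ['Patras', 'Thessaloniki']

# Uppercase letters, one alphabet at a time.
english_upper = re.findall(r"[A-Z]", text)
greek_upper = re.findall(r"[Α-Ω]", text)
print(english_upper)  # ['P', 'T']
print(greek_upper)    # ['Π', 'Θ']

# Plain [α-ω] skips accented letters, so words get split.
greek_lower_plain = re.findall(r"[α-ω]+", text)
print(greek_lower_plain)  # ['τρα', 'εσσαλον', 'κη']

# Assumption: [ά-ώ] also covers the lowercase accented letters.
greek_words = re.findall(r"[Α-Ωά-ώ]+", text)
print(greek_words)  # ['Πάτρα', 'Θεσσαλονίκη']
```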

Regex Methods in re

Now that patterns make sense, use them with methods:

re.search() finds the first match, re.findall() returns all matches as a list, re.sub() replaces matches and returns the new string.
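
To compare the three methods side by side, this sketch runs the same pattern through each of them. The invoice text and the "YYYY" placeholder are invented for the example.

```python
import re

text = "Invoices: 2019, 2021 and 2024."
year_pattern = r"\d{4}"

# re.search() stops at the first match (or returns None).
first = re.search(year_pattern, text)
first_year = first.group() if first else None
print(first_year)  # 2019

# re.findall() returns every match as a list of strings.
all_years = re.findall(year_pattern, text)
print(all_years)  # ['2019', '2021', '2024']

# re.sub() returns a new string with each match replaced.
masked = re.sub(year_pattern, "YYYY", text)
print(masked)  # Invoices: YYYY, YYYY and YYYY.
```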

What is a Match object? It is the result that re.search() returns when a match is found; if nothing matches, you get None instead. Think of it as a small box with details about the match.

If you print the match object directly, Python shows something like: <re.Match object; span=(10, 14), match='2024'>. This is useful for debugging, but usually you want specific values from it.

Most common methods: match.group() (matched text), match.start() (start index), match.end() (end index).

What are start and end indexes? They are positions in the text. If the text is "Athens 2024, Patras", then the match "2024" starts at index 7 and ends at index 11. The end index is exclusive, so text[7:11] gives exactly "2024".
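
The "Athens 2024, Patras" example can be checked directly. This is a small sketch; the if test guards against the case where nothing matches and re.search() returns None.

```python
import re

text = "Athens 2024, Patras"
match = re.search(r"\d{4}", text)

if match:
    print(match.group())  # 2024
    print(match.start())  # 7
    print(match.end())    # 11
    # The end index is exclusive, so slicing gives the matched text.
    print(text[match.start():match.end()])  # 2024

# When the pattern is not found, there is no Match object at all.
no_match = re.search(r"\d{4}", "no digits here")
print(no_match)  # None
```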

Capturing Groups

Parentheses () create groups. Groups let you extract specific pieces of a match.

Group 1 is the first (), group 2 is the second, and so on.
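
As an illustration, the sketch below splits a date into year, month, and day with three groups. The date string and variable names are made up for the example.

```python
import re

text = "Order date: 2024-03-15"

# Three groups: (year)-(month)-(day).
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)

whole = match.group(0)   # group 0 is the whole match
year = match.group(1)
month = match.group(2)
day = match.group(3)
print(whole)             # 2024-03-15
print(year, month, day)  # 2024 03 15

# groups() returns all captured pieces as a tuple.
print(match.groups())    # ('2024', '03', '15')

# With groups in the pattern, re.findall() returns tuples.
pairs = re.findall(r"(\d{4})-(\d{2})", "2024-03 and 2023-11")
print(pairs)  # [('2024', '03'), ('2023', '11')]
```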

Tip: Regex gets easier when you go in this order: write a small pattern, test it, then add one more piece.

Pattern Pieces (More Advanced)

Now we add a few powerful pieces that you will use often in real text cleaning and validation.

^ start of string, $ end of string, | OR, [^abc] one character that is NOT in the set (a, b, c), \b word boundary, *? and +? lazy matching (match as few characters as possible), \. literal dot.
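
Each of these pieces is easier to understand on a tiny example. The strings below are invented; the lazy example uses a short HTML-like snippet only to show the difference from the default greedy behaviour.

```python
import re

# | : either "cat" or "dog".
pets = re.findall(r"cat|dog", "cat, dog, bird")
print(pets)  # ['cat', 'dog']

# [^abc] : any one character that is not a, b, or c.
not_abc = re.findall(r"[^abc]", "aabbccxyz")
print(not_abc)  # ['x', 'y', 'z']

# Greedy .* grabs as much as it can; lazy .*? stops early.
snippet = "<b>one</b><b>two</b>"
greedy = re.findall(r"<b>.*</b>", snippet)
lazy = re.findall(r"<b>.*?</b>", snippet)
print(greedy)  # ['<b>one</b><b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']

# \. matches a real dot; a bare . would also accept "x".
with_escape = re.findall(r"\d\.\d", "3.5 and 3x5")
without_escape = re.findall(r"\d.\d", "3.5 and 3x5")
print(with_escape)     # ['3.5']
print(without_escape)  # ['3.5', '3x5']

# ^ and $ tie the pattern to the start or end of the string.
starts_with_hello = re.search(r"^Hello", "Hello world") is not None
ends_with_hello = re.search(r"Hello$", "Hello world") is not None
print(starts_with_hello, ends_with_hello)  # True False
```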

Rule of thumb: Use anchors (^ and $) for full validation, and use \b when you need whole words only.
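
The rule of thumb can be sketched like this. The helper name is_year is hypothetical, chosen only for this example.

```python
import re

def is_year(value):
    """Return True if the string is four digits from start to end."""
    return re.search(r"^\d{4}$", value) is not None

print(is_year("2024"))       # True
print(is_year("year 2024"))  # False (extra text before the digits)
print(is_year("20245"))      # False (one digit too many)

# \b keeps only the whole word "art".
text = "art smart article"
anywhere = re.findall(r"art", text)
whole_words = re.findall(r"\bart\b", text)
print(anywhere)     # ['art', 'art', 'art']
print(whole_words)  # ['art']
```

One detail to keep in mind: $ also matches just before a newline at the very end of the string, so "2024\n" would pass this check. If that matters, re.fullmatch(r"\d{4}", value) is a stricter alternative.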
