Part One
What Is Regex?
Regex (regular expressions) is a way to search text using patterns instead of exact strings.
Think of it as a smart filter. Instead of saying "find exactly this one sentence", you say "find anything that looks like a phone number" or "find all 4-digit years".
In Python, regex lives in the re module, so we start with import re.
Regex syntax, step by step
- Write a pattern as a raw string, for example r"\d{3}-\d{3}-\d{4}".
- Break it into pieces: \d{3} = exactly 3 digits, - = a real dash, \d{3} = 3 digits, - = dash, \d{4} = 4 digits.
- Read the whole pattern in plain English: "three digits, dash, three digits, dash, four digits."
- Match that pattern against text to find values like 210-123-4567.
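Why the r before the quotes? In regex, backslashes are common (like \d). The r"" form (a raw string) tells Python to leave backslashes alone, so the pattern reaches the regex engine exactly as written and is easier to write and read.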
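As a minimal sketch of the steps above, the snippet below runs the phone pattern against a made-up sentence and shows what the r prefix saves you from typing. The sample text and variable names are assumptions for illustration; re.search is covered in Part Three.

```python
import re

# Pattern from the text: three digits, dash, three digits, dash, four digits.
phone_pattern = r"\d{3}-\d{3}-\d{4}"

# Sample text is made up for illustration.
text = "Call 210-123-4567 before noon."

match = re.search(phone_pattern, text)
phone = match.group() if match else None
print(phone)  # 210-123-4567

# Without the r prefix, every backslash has to be doubled to get the same pattern.
same_pattern = "\\d{3}-\\d{3}-\\d{4}"
print(phone_pattern == same_pattern)  # True
```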
Part Two
Pattern Pieces (Slow Build)
Start with small pieces and combine them:
\d a digit, \w a letter, digit, or underscore, \s a whitespace character, . any single character except a newline.
Quantifiers: + one or more, * zero or more, ? optional (zero or one), {n} exactly n times, {n,m} between n and m times.
- \d: one digit. Example pattern r"\d" matches "7" in "Room 7".
- \w: letter/number/underscore. Example pattern r"\w+" matches words like "Athens2026".
- \s: whitespace. Example pattern r"\s" matches the space in "Hello World".
- .: any single character. Example pattern r"h.t" matches "hat", "hit", "hot".
- +: one or more. Example pattern r"\d+" matches "305" as one block.
- *: zero or more. Example pattern r"ab*" matches "a", "ab", "abbb".
- ?: optional (zero or one). Example pattern r"colou?r" matches both "color" and "colour".
- {n}: exactly n times. Example pattern r"\d{4}" matches years like "2026".
- {n,m}: from n to m times. Example pattern r"\d{2,4}" matches "12", "305", "2026".
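The pieces above can be tried one at a time. This sketch pairs each pattern from the list with a short sample text; the sample texts and variable names are illustrative assumptions, and re.findall (covered in Part Three) is used only because it returns every match as a list.

```python
import re

# Each call pairs one pattern from the list with a small sample text.
one_digit = re.findall(r"\d", "Room 7")            # ['7']
word = re.findall(r"\w+", "Athens2026")            # ['Athens2026']
spaces = re.findall(r"\s", "Hello World")          # [' ']
h_t = re.findall(r"h.t", "hat hit hot")            # ['hat', 'hit', 'hot']
block = re.findall(r"\d+", "Room 305")             # ['305']
ab = re.findall(r"ab*", "a ab abbb")               # ['a', 'ab', 'abbb']
colors = re.findall(r"colou?r", "color colour")    # ['color', 'colour']
years = re.findall(r"\d{4}", "In 2026")            # ['2026']
ranges = re.findall(r"\d{2,4}", "12 305 2026")     # ['12', '305', '2026']

print(h_t, ab, colors, ranges)
```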
Greek vs English letters (upper/lower)
When you need language-specific matching, use explicit character ranges:
- English uppercase: r"[A-Z]+"
- English lowercase: r"[a-z]+"
- English any case: r"[A-Za-z]+"
- Greek uppercase: r"[Α-ΩΆΈΉΊΌΎΏΪΫ]+"
- Greek lowercase: r"[α-ωάέήίόύώϊϋΐΰς]+"
- Greek any case: r"[Α-ΩΆΈΉΊΌΎΏΪΫα-ωάέήίόύώϊϋΐΰς]+"
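A short sketch of how the ranges above separate Greek words from English ones in mixed text. The sample sentence and variable names are assumptions for illustration, and the example assumes accented letters are stored as single precomposed characters (the usual case for typed text).

```python
import re

# Character classes copied from the list above.
english_word = r"[A-Za-z]+"
greek_word = r"[Α-ΩΆΈΉΊΌΎΏΪΫα-ωάέήίόύώϊϋΐΰς]+"
greek_upper = r"[Α-ΩΆΈΉΊΌΎΏΪΫ]+"

# Made-up sample mixing both alphabets.
text = "Athens is Αθήνα and Patras is Πάτρα"

english_found = re.findall(english_word, text)
greek_found = re.findall(greek_word, text)
capitals_found = re.findall(greek_upper, text)

print(english_found)   # ['Athens', 'is', 'and', 'Patras', 'is']
print(greek_found)     # ['Αθήνα', 'Πάτρα']
print(capitals_found)  # ['Α', 'Π']
```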
Part Three
Regex Methods in re
Now that patterns make sense, use them with methods:
re.search() first match, re.findall() all matches, re.sub() replace matches.
- re.search(pattern, text) returns a Match object for the first match, or None if nothing matches.
- re.findall(pattern, text) returns a list of all matched strings (or an empty list if no matches).
- re.sub(pattern, replacement, text) returns a new string with replacements applied.
- re.fullmatch(pattern, text) returns a Match object only if the entire text matches, otherwise None.
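To see the four methods side by side, here is a minimal sketch that applies one pattern to one made-up text. The text, the "YYYY" replacement, and the variable names are illustrative assumptions.

```python
import re

text = "Athens 2024, Patras 2025"
year = r"\d{4}"

first = re.search(year, text)           # Match object for the first year
all_years = re.findall(year, text)      # list of every matched string
masked = re.sub(year, "YYYY", text)     # new string; text itself is unchanged
whole_yes = re.fullmatch(year, "2024")  # Match object: the entire text is 4 digits
whole_no = re.fullmatch(year, "2024!")  # None: the trailing "!" is not part of the pattern

print(first.group())           # 2024
print(all_years)               # ['2024', '2025']
print(masked)                  # Athens YYYY, Patras YYYY
print(whole_yes is not None)   # True
print(whole_no)                # None
```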
What is a Match object? It is a result returned by regex when a match is found. Think of it as a small box with details about the match.
If you print the match object directly, Python shows something like: <re.Match object; span=(10, 14), match='2024'>. This is useful for debugging, but usually you want specific values from it.
Most common methods: match.group() (matched text), match.start() (start index), match.end() (end index).
What are start and end indexes? They are positions in the text. If the text is "Athens 2024, Patras", then the match "2024" starts at index 7 and ends at index 11. The end index is exclusive, so text[7:11] gives exactly "2024".
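The index example above can be checked directly. This sketch uses the same text and reads the values off the Match object; the variable names are assumptions for illustration.

```python
import re

text = "Athens 2024, Patras"
match = re.search(r"\d{4}", text)

# re.search returns None when nothing matches, so check before using the result.
if match is not None:
    matched_text = match.group()  # the matched text
    start = match.start()         # index of the first matched character
    end = match.end()             # index just past the last matched character
    print(matched_text)           # 2024
    print(start, end)             # 7 11
    print(text[start:end])        # 2024
```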
Part Four
Capturing Groups
Parentheses () create groups. Groups let you extract specific pieces of a match.
Group 1 is the first (), group 2 is the second, and so on.
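As a sketch of group numbering, the phone pattern from Part One can be wrapped in three pairs of parentheses. The sample texts, the second phone number, and the variable names are made up for illustration.

```python
import re

# Three groups: first block, second block, last four digits.
pattern = r"(\d{3})-(\d{3})-(\d{4})"

match = re.search(pattern, "Call 210-123-4567 today")
whole = match.group(0)   # group 0 is always the full match
first = match.group(1)   # first ()
last = match.group(3)    # third ()
parts = match.groups()   # all groups as a tuple

print(whole)  # 210-123-4567
print(first)  # 210
print(last)   # 4567
print(parts)  # ('210', '123', '4567')

# When the pattern has groups, re.findall returns one tuple of groups per match.
pairs = re.findall(pattern, "210-123-4567 and 231-555-0000")
print(pairs)  # [('210', '123', '4567'), ('231', '555', '0000')]
```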
Part Five
Pattern Pieces (More Advanced)
Now we add a few powerful pieces that you will use often in real text cleaning and validation.
^ start of string, $ end of string, | OR, [^abc] one character that is NOT in the set (a, b, c), \b word boundary, *? and +? lazy matching, \. literal dot.
- ^: start of text. Example pattern r"^A" matches names that start with A, like "Alex".
- $: end of text. Example pattern r"ing$" matches words ending in ing, like "learning".
- |: OR between options. Example pattern r"cat|dog" matches either "cat" or "dog".
- [^...]: one character not in a set. Example pattern r"[^0-9]+" matches the non-numeric parts of "Room 205B", namely "Room " and "B".
- \b: whole-word boundary. Example pattern r"\bcat\b" matches "cat" but not the cat inside "scatter".
- *? / +?: lazy matching (shortest possible). Example pattern r"<.*?>" matches each tag in "<b>bold</b><i>it</i>" one by one.
- \.: escaped dot for a real period. Example pattern r"python\.org" matches "python.org" exactly.
Use anchors (^ and $) for full validation, and use \b when you need whole words only.
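The advanced pieces above can be tried in a few lines. This is a sketch with made-up sample texts; is_four_digit_year is a hypothetical helper name used only to illustrate anchored validation.

```python
import re

starts_with_a = bool(re.search(r"^A", "Alex"))          # True
ends_with_ing = bool(re.search(r"ing$", "learning"))    # True
pets = re.findall(r"cat|dog", "cat, dog, bird")         # ['cat', 'dog']
non_digits = re.findall(r"[^0-9]+", "Room 205B")        # ['Room ', 'B']
whole_cats = re.findall(r"\bcat\b", "cat scatter")      # ['cat']

# Lazy vs greedy: the lazy form stops at the first ">" it can.
html = "<b>bold</b><i>it</i>"
lazy_tags = re.findall(r"<.*?>", html)    # ['<b>', '</b>', '<i>', '</i>']
greedy_tags = re.findall(r"<.*>", html)   # ['<b>bold</b><i>it</i>']

# An unescaped dot matches any character; the escaped dot only matches ".".
escaped = bool(re.search(r"python\.org", "pythonXorg"))   # False
unescaped = bool(re.search(r"python.org", "pythonXorg"))  # True


def is_four_digit_year(value):
    """Hypothetical validator: anchors force the pattern to cover the text."""
    return re.search(r"^\d{4}$", value) is not None


print(is_four_digit_year("2026"))       # True
print(is_four_digit_year("year 2026"))  # False
```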