Golden Codes - armanexplorer planet

Practical code snippets for Django, Python, Bash, Git and All!

capturing and non-capturing groups

In Python's regular expressions, you can define non-capturing groups using the syntax (?:pattern). This syntax allows you to group a part of your regex pattern without capturing the matched text within that group.
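As a minimal sketch of grouping without capturing, the snippet below applies a quantifier to a two-character unit. The sample string and variable names are assumptions chosen only for illustration.

```python
import re

# Hypothetical sample string, chosen only for illustration.
text = "ababab!"

# (?:ab)+ repeats the two-character unit "ab" as a whole;
# without the group, "ab+" would repeat only the "b".
grouped = re.search(r"(?:ab)+", text)
ungrouped = re.search(r"ab+", text)

print(grouped.group())    # ababab
print(grouped.groups())   # () - the group matched, but nothing was captured
print(ungrouped.group())  # ab
```

The group still controls what the + applies to, but match.groups() stays empty because (?:...) never stores its text.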

Here's a breakdown of how non-capturing groups work:

Benefits of Non-Capturing Groups:

Here's an example to illustrate the difference:

import re

text = "This is a test string with an email example@example.com"

# Capturing group - captures the email address
pattern1 = r"(\w+@\w+\.\w+)"

# Non-capturing group - groups the local part and "@" without capturing it;
# only the domain part is captured (as group 1)
pattern2 = r"This is a test string with an email (?:\w+@)(\w+\.\w+)"

match1 = re.search(pattern1, text)
match2 = re.search(pattern2, text)

# Accessing captured text (pattern1)
if match1:
  email = match1.group(1)
  print(f"Email (captured): {email}")

# pattern2: the non-capturing group gets no group number, so group(1) is the domain
if match2:
  print(f"Full Match (pattern2): {match2.group()}")
  print(f"Domain (captured): {match2.group(1)}")

This code demonstrates that pattern1 captures the whole email address in group 1, while pattern2 uses a non-capturing group for the local part and the "@" sign, so its only captured group is the domain (example.com).
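The effect on results is easiest to see with re.findall, which returns the captured groups when a pattern has any. The following is a small sketch; the sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Contact alice@example.com or bob@example.org"

# Capturing group around the local part: findall returns a tuple per match.
with_capture = re.findall(r"(\w+)@(\w+\.\w+)", text)

# Non-capturing group around the local part: only the domain is captured,
# so findall returns a flat list of domains.
without_capture = re.findall(r"(?:\w+)@(\w+\.\w+)", text)

print(with_capture)     # [('alice', 'example.com'), ('bob', 'example.org')]
print(without_capture)  # ['example.com', 'example.org']
```

Switching a group to (?:...) leaves what the pattern matches unchanged and only alters which pieces are reported back.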

greedy and non-greedy

By default, quantifiers in Python's regular expressions (*, +, ?, {m,n}) are greedy. This means they try to match as many characters as possible in the string to fulfill the quantifier's requirement.

Here's how greedy matching works with common quantifiers:
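As a quick sketch, each quantifier can be applied to the same input to see how much it consumes. The sample string "aaaaa" and the variable names are assumptions for illustration.

```python
import re

# Hypothetical sample string, chosen only for illustration.
text = "aaaaa"

star = re.match(r"a*", text).group()         # zero or more: takes all five
plus = re.match(r"a+", text).group()         # one or more: takes all five
optional = re.match(r"a?", text).group()     # zero or one: takes one
bounded = re.match(r"a{2,4}", text).group()  # two to four: takes four

print(star, plus, optional, bounded)  # aaaaa aaaaa a aaaa
```

In every case the quantifier settles on the largest repetition count it is allowed.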

Here's an example to illustrate greedy matching:

import re

text = "abcbcacdefb"

# Greedy match - matches the entire string, up to the last 'b'
pattern = r"a.*b"  # Matches 'a' followed by zero or more characters (greedy) then 'b'

match = re.search(pattern, text)

if match:
  print(f"Greedy Match: {match.group()}")

In this example, .* (zero or more characters) greedily extends the match to the last 'b', so the result is the entire string "abcbcacdefb", even though the shorter "ab" would also satisfy the pattern.

How to Control Greediness:

If you want to perform non-greedy matching (matching the fewest characters possible), you can add a question mark (?) after the quantifier. This makes the quantifier lazy, giving the forms *?, +?, ?? and {m,n}?.

Here's the same example modified for non-greedy matching:

pattern = r"a.*?b"  # Matches 'a' followed by zero or more characters (non-greedy) then 'b'

match = re.search(pattern, text)

if match:
  print(f"Non-Greedy Match: {match.group()}")

Now, the match will be "ab": the non-greedy .*? tries to match as few characters as possible (here, none), and the 'b' right after the first 'a' already allows the entire pattern to match.
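Lazy matching works the same way for the other quantifiers, and it matters most when a delimiter appears more than once. Below is a minimal sketch; the sample strings and variable names are assumptions chosen for illustration.

```python
import re

# Hypothetical sample strings, chosen only for illustration.
text = "aaaaa"
html = "<b>bold</b> and <i>italic</i>"

# Lazy quantifiers settle on the smallest repetition count that still matches.
lazy_star = re.match(r"a*?", text).group()         # '' (zero repetitions)
lazy_plus = re.match(r"a+?", text).group()         # 'a'
lazy_optional = re.match(r"a??", text).group()     # ''
lazy_bounded = re.match(r"a{2,4}?", text).group()  # 'aa'

# Greedy .* runs to the last '>', lazy .*? stops at the nearest one.
greedy_tags = re.findall(r"<.*>", html)
lazy_tags = re.findall(r"<.*?>", html)

print(greedy_tags)  # ['<b>bold</b> and <i>italic</i>']
print(lazy_tags)    # ['<b>', '</b>', '<i>', '</i>']
```

The greedy pattern swallows everything between the first "<" and the last ">", while the lazy one yields each tag separately.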

Remember: