Powerful Regex Features

Text Searching

Text searching with regex allows you to locate specific patterns within a body of text. This can be useful for finding keywords, email addresses, phone numbers, or any other identifiable patterns.

Example 1: Find all email addresses in a text

import re

text = "Contact us at support@example.com or sales@example.org."
# Local part, "@", domain, ".", then a top-level domain of 2+ letters
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
print(emails)  # Output: ['support@example.com', 'sales@example.org']

Explanation: This pattern matches a local part made of letters, digits, and the characters ._%+-, followed by "@", a domain name, a dot, and a top-level domain of at least two letters. The \b word boundaries keep a match from starting or ending in the middle of a word.
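When you need to know where each address appears, not just what it is, re.finditer returns Match objects instead of strings. The sketch below compiles the Example 1 pattern once, writing the top-level-domain class as [A-Za-z] (inside square brackets a "|" is a literal character, not alternation). The sample text is hypothetical.

```python
import re

# Example 1's pattern, compiled once for reuse.
EMAIL_RE = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

# Hypothetical sample text.
text = "Write to support@example.com or sales@example.org today."

# finditer yields Match objects, which carry the start/end offset of each hit.
hits = [(m.group(), m.start(), m.end()) for m in EMAIL_RE.finditer(text)]
for address, start, end in hits:
    print(f"{address} at [{start}:{end}]")
```

Compiling the pattern is optional (the re module caches recently used patterns), but a module-level constant keeps the pattern in one place.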

Example 2: Search for specific keywords in a document

import re

document = "Python is a great programming language. Java is also popular."
keywords = re.findall(r'\bPython\b|\bJava\b', document)
print(keywords)  # Output: ['Python', 'Java']

Explanation: This pattern matches the words "Python" and "Java" as whole words, ensuring that partial matches are not included.
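If the keywords come from a list, the alternation can be built at runtime. The sketch below assumes a hypothetical keyword list; re.escape keeps characters such as the "." in "Node.js" literal (unescaped, "." would also match "NodeXjs"), and a non-capturing group (?:...) lets one pair of \b anchors cover every alternative.

```python
import re

# Hypothetical keyword list; "Node.js" contains the metacharacter '.'.
keywords = ["Python", "Java", "Node.js"]

# Escape each keyword, join with '|', and wrap in a non-capturing group
# so findall still returns the whole matched word.
alternation = "|".join(re.escape(word) for word in keywords)
keyword_re = re.compile(rf"\b(?:{alternation})\b")

document = "Python and Node.js are popular; Java is too. NodeXjs is not a keyword."
found = keyword_re.findall(document)
print(found)  # ['Python', 'Node.js', 'Java']
```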

Example 3: Find all occurrences of a date pattern

import re

text = "The event is on 2024-07-21. Another event is on 2023-11-15."
dates = re.findall(r'\d{4}-\d{2}-\d{2}', text)
print(dates)  # Output: ['2024-07-21', '2023-11-15']

Explanation: This regex pattern matches dates in the format YYYY-MM-DD by looking for four digits, a hyphen, two digits, another hyphen, and two more digits. It checks only the shape of the text, so an impossible date such as 2023-13-45 would match too.
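One way to reject dates that only look valid is to capture the year, month, and day with groups and let Python's datetime module check the calendar. This is a minimal sketch that combines regex with datetime.date rather than encoding month and day ranges in the pattern; the sample text is hypothetical.

```python
import re
from datetime import date

# Three capturing groups, so findall returns (year, month, day) tuples.
DATE_RE = re.compile(r'\b(\d{4})-(\d{2})-(\d{2})\b')

text = "Valid: 2024-07-21. Invalid: 2023-13-45."


def extract_dates(s):
    """Return real calendar dates, skipping strings that merely look like dates."""
    found = []
    for year, month, day in DATE_RE.findall(s):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            # The regex checks shape only; date() rejects month 13, day 45, etc.
            continue
    return found


print(DATE_RE.findall(text))
print(extract_dates(text))
```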

Text Validation

Regex can check that input matches an expected format, such as an email address, a phone number, or a date.

Example 4: Validate an email address

import re

email = "user@example.com"
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
# ^ and $ anchor the pattern at both ends of the input
is_valid = re.match(pattern, email)
print(is_valid is not None)  # Output: True

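Explanation: The ^ and $ anchors make the pattern apply to the whole input rather than to a substring, and the character classes limit the local part, domain, and top-level domain to commonly used characters. This is a practical sanity check, not a full implementation of the address syntax defined in RFC 5322.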
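A subtle point about the anchored re.match approach: in Python, $ also matches just before a trailing newline, so "user@example.com\n" passes. A minimal sketch using re.fullmatch, which requires the entire string to match and needs no anchors, closes that gap:

```python
import re

EMAIL_PATTERN = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'


def is_valid_email(candidate: str) -> bool:
    # fullmatch succeeds only if the pattern consumes the whole string,
    # so a trailing newline causes a rejection.
    return re.fullmatch(EMAIL_PATTERN, candidate) is not None


# With re.match plus ^...$, the trailing newline slips through:
print(re.match(r'^' + EMAIL_PATTERN + r'$', "user@example.com\n") is not None)  # True
print(is_valid_email("user@example.com\n"))  # False
print(is_valid_email("user@example.com"))    # True
```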

Example 5: Validate a phone number (US format)

import re

phone_number = "+1 (123) 456-7890"
# Optional country code, area code with or without parentheses, then 3 + 4 digits
pattern = r'^(\+\d{1,2}\s?)?(\(\d{3}\)|\d{3})[-.\s]?\d{3}[-.\s]?\d{4}$'
is_valid = re.match(pattern, phone_number)
print(is_valid is not None)  # Output: True

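Explanation: This pattern accepts common US formats: an optional country code such as +1, an area code written either as (123) or 123, and an optional hyphen, dot, or whitespace between digit groups. The parentheses around the area code are escaped as \( and \) because unescaped parentheses would create a group instead of matching literal characters.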
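Validation often goes hand in hand with normalization. The sketch below mirrors the structure of the Example 5 pattern but uses re.VERBOSE (so the pattern can be laid out with comments) and named groups to pull out each part. The function name normalize_us_phone and the default country code of 1 are illustrative assumptions.

```python
import re
from typing import Optional

# re.VERBOSE ignores layout whitespace and allows '#' comments.
# Python forbids reusing a group name, so the two area-code spellings
# get separate names.
PHONE_RE = re.compile(r"""
    ^
    (?:\+(?P<country>\d{1,2})\s?)?                   # optional country code
    (?:\((?P<area>\d{3})\)|(?P<area_plain>\d{3}))    # (123) or 123
    [-.\s]?
    (?P<exchange>\d{3})
    [-.\s]?
    (?P<line>\d{4})
    $
""", re.VERBOSE)


def normalize_us_phone(number: str) -> Optional[str]:
    """Return the number as +C-AAA-EEE-LLLL, or None if it does not match."""
    m = PHONE_RE.match(number)
    if m is None:
        return None
    area = m.group("area") or m.group("area_plain")
    country = m.group("country") or "1"  # assumption: default to +1 (US)
    return f"+{country}-{area}-{m.group('exchange')}-{m.group('line')}"


for raw in ["+1 (123) 456-7890", "123.456.7890", "12-3456-7890"]:
    print(raw, "->", normalize_us_phone(raw))
```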

Text Extraction

Extracting specific data from text using regex is powerful for parsing logs, web scraping, and processing structured documents.

Example 6: Extract all URLs from a document

import re

document = "Check out https://example.com and http://domain.org for more info."
urls = re.findall(r'https?://[^\s]+', document)
print(urls)  # Output: ['https://example.com', 'http://domain.org']

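Explanation: This pattern matches URLs that start with "http" or "https" (s? makes the "s" optional), followed by "://" and every character up to the next whitespace. As a consequence, punctuation directly after a URL, such as a sentence-ending period, becomes part of the match.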
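A common heuristic for the trailing-punctuation problem is to require that the last character of the match is neither whitespace nor typical sentence punctuation. The refined pattern below is a sketch of that idea, not a complete URL grammar; it will also trim a closing parenthesis that legitimately ends a URL. The sample sentence is hypothetical.

```python
import re

# \S* is greedy, then backtracks until the final character is not
# whitespace or one of . , ; : ! ? )
URL_RE = re.compile(r'https?://\S*[^\s.,;:!?)]')

sentence = "See https://example.com. Or (http://domain.org)!"

naive = re.findall(r'https?://[^\s]+', sentence)
refined = URL_RE.findall(sentence)
print(naive)    # ['https://example.com.', 'http://domain.org)!']
print(refined)  # ['https://example.com', 'http://domain.org']
```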

Example 7: Extract hashtags from a tweet

import re

tweet = "Loving the new features in #Python3 and #Django!"
hashtags = re.findall(r'#\w+', tweet)
print(hashtags)  # Output: ['#Python3', '#Django']

Explanation: This pattern matches a "#" followed by one or more word characters (\w covers letters, digits, and the underscore), so each match stops at punctuation such as "!".
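If you want the tag text without the "#", wrap the word part in a capturing group: when a pattern contains exactly one group, re.findall returns only that group's text. The de-duplication step below is an illustrative addition.

```python
import re

tweet = "Loving the new features in #Python3 and #Django!"

# One capturing group, so findall returns just the group: the '#' is dropped.
tags = re.findall(r'#(\w+)', tweet)
print(tags)  # ['Python3', 'Django']

# Case-insensitive de-duplication, e.g. before counting topics.
unique_tags = sorted({tag.lower() for tag in tags})
print(unique_tags)  # ['django', 'python3']
```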

Text Replacement

With re.sub, regex can replace every match of a pattern, which is useful for formatting text and cleaning data.

Example 8: Replace multiple spaces with a single space

import re

text = "This   is    an  example."
normalized_text = re.sub(r'\s+', ' ', text)
print(normalized_text)  # Output: This is an example.

Explanation: This pattern matches any run of one or more whitespace characters (spaces, tabs, and newlines) and replaces each run with a single space.
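Because \s includes newlines, the pattern above also joins separate lines into one. When line breaks should survive, a narrower character class works. A minimal sketch with hypothetical input:

```python
import re

text = "First   line\t\twith tabs\nSecond    line"

# \s matches newlines too, so everything ends up on one line.
flattened = re.sub(r'\s+', ' ', text)

# Limiting the class to spaces and tabs keeps the line break.
per_line = re.sub(r'[ \t]+', ' ', text)

print(repr(flattened))  # 'First line with tabs Second line'
print(repr(per_line))   # 'First line with tabs\nSecond line'
```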

Example 9: Anonymize email addresses in a document

import re

text = "Contact support@example.com or sales@example.org."
anonymized_text = re.sub(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', '[REDACTED]', text)
print(anonymized_text)  # Output: Contact [REDACTED] or [REDACTED].

Explanation: This pattern finds all email addresses and replaces them with "[REDACTED]".
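re.sub also accepts a function in place of a replacement string; it is called with each Match object and returns the text to substitute. The sketch below uses that to mask only part of each address. The masking rule (keep the first character and the domain) and the name mask_local_part are hypothetical choices for illustration.

```python
import re

# Two groups: the local part and the domain.
EMAIL_RE = re.compile(r'([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})')


def mask_local_part(match: re.Match) -> str:
    local, domain = match.group(1), match.group(2)
    # Keep the first character of the local part and the full domain.
    return f"{local[0]}***@{domain}"


text = "Contact support@example.com or sales@example.org."
masked = EMAIL_RE.sub(mask_local_part, text)
print(masked)  # Contact s***@example.com or s***@example.org.
```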

Text Splitting

Splitting text into parts based on patterns is useful for tokenizing, parsing simple delimited data such as CSV lines, and more.

Example 10: Split a CSV line into fields

import re

csv_line = "name,age,location"
fields = re.split(r',', csv_line)
print(fields)  # Output: ['name', 'age', 'location']

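Explanation: This pattern splits the text at each comma, returning a list of fields. For a single fixed delimiter, csv_line.split(',') does the same job; regex splitting pays off when the delimiter varies, for example re.split(r'\s*,\s*', csv_line) also absorbs spaces around each comma. Neither approach handles quoted fields that contain commas.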
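For real CSV data, Python's standard csv module is a better tool than a regex split, because it applies CSV quoting rules. This sketch swaps in csv.reader for comparison; the input line is hypothetical.

```python
import csv
import re

# Hypothetical line with a quoted field that contains a comma.
line = 'Alice,"Portland, OR",42'

# Splitting on every comma breaks the quoted field apart.
naive = re.split(r',', line)
print(naive)  # ['Alice', '"Portland', ' OR"', '42']

# csv.reader accepts any iterable of lines and honors the quotes.
parsed = next(csv.reader([line]))
print(parsed)  # ['Alice', 'Portland, OR', '42']
```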

Example 11: Split a paragraph into sentences

import re

paragraph = "This is sentence one. This is sentence two! Is this sentence three?"
sentences = re.split(r'[.!?]\s', paragraph)
print(sentences)  # Output: ['This is sentence one', 'This is sentence two', 'Is this sentence three?']

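Explanation: This pattern splits the text wherever a period, exclamation mark, or question mark is followed by a whitespace character. The punctuation is consumed as part of the delimiter, so it disappears from every sentence except the last, which has no whitespace after it.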
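To keep each sentence's ending mark, split on the whitespace alone and use a lookbehind (?<=...) to require that punctuation precedes it; a lookbehind checks the preceding text without consuming it. This is still a heuristic: abbreviations such as "e.g." followed by a space would also trigger a split.

```python
import re

paragraph = "This is sentence one. This is sentence two! Is this sentence three?"

# Split only on whitespace that directly follows ., ! or ?.
sentences = re.split(r'(?<=[.!?])\s+', paragraph)
print(sentences)
# ['This is sentence one.', 'This is sentence two!', 'Is this sentence three?']
```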

With these patterns and the re functions findall, match, sub, and split, you can search, validate, extract, replace, and split text across a wide range of text-processing tasks.