Regular expressions (regex or regexp) are a powerful tool for searching, replacing, and validating text using patterns. They're used in all programming languages, text editors, and command-line tools. While regex can look intimidating at first, the core concepts are straightforward. In this guide, we'll go from basics to advanced techniques.
What Are Regular Expressions
A regular expression is a sequence of characters that defines a search pattern. For example, \d{3}-\d{2}-\d{2} describes a phone number in the format "123-45-67". Instead of searching for specific text, regex lets you search by pattern β matching any text that fits the given format.
Regex is supported in all major programming languages: JavaScript, Python, PHP, Java, C#, Go, Ruby, and more. The syntax is mostly the same, though there are minor differences between implementations.
Basic Syntax
Literal Characters
The simplest regular expression is just text. The pattern hello will find the word "hello" in a string. Most characters match themselves directly.
Metacharacters
Special characters have unique meanings in regex:
.β any single character (except newline)^β start of string$β end of string\dβ any digit (0-9)\wβ any letter, digit, or underscore\sβ whitespace character (space, tab, newline)\bβ word boundary
Uppercase versions mean the opposite: \D β not a digit, \W β not a word character, \S β not whitespace.
Character Classes
Square brackets define a set of characters:
[abc]β character a, b, or c[a-z]β any lowercase Latin letter[A-Z0-9]β uppercase letter or digit[^abc]β any character except a, b, c
Quantifiers
Quantifiers specify how many times a pattern should repeat:
*β 0 or more times+β 1 or more times?β 0 or 1 time{3}β exactly 3 times{2,5}β 2 to 5 times{3,}β 3 or more times
Examples: \d+ β one or more digits, [a-z]{2,4} β 2 to 4 lowercase letters.
Groups and Alternation
Grouping
Parentheses () group parts of a pattern and capture the matched text:
(\d{2})-(\d{2})-(\d{4})β a date with three capture groups (day, month, year)(?:abc)β non-capturing group (for performance)
Alternation
The pipe | means "or":
cat|dogβ matches "cat" or "dog"(Mon|Tue|Wed|Thu|Fri)β any weekday
Common Patterns
Email Address
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
URL
https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(/[^\s]*)?
Phone Number (US)
\+?1?[\s-]?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}
IPv4 Address
\b(?:\d{1,3}\.){3}\d{1,3}\b
Date (MM/DD/YYYY)
(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/\d{4}
Lookahead and Lookbehind
Advanced constructs for checking context without including it in the match:
(?=...)β positive lookahead: text after must match the pattern(?!...)β negative lookahead: text after must NOT match(?<=...)β positive lookbehind: text before must match(? β negative lookbehind: text before must NOT match
Example: \d+(?= USD) β finds numbers before " USD" (100 USD β captures "100").
Flags (Modifiers)
iβ case-insensitive matchinggβ global search (find all matches, not just the first)mβ multiline mode (^and$work per line)sβ dot.also matches newline characters
Tips for Effective Usage
- Start simple β write a basic pattern first, then refine it
- Test with diverse data β check both positive and negative cases
- Avoid greediness β use
*?and+?for lazy quantifiers when needed - Comment complex patterns β use the
xflag for readable regex - Don't parse HTML with regex β use a DOM parser instead
Practice: Test Regex Online
The best way to learn regex is through practice. Try our regex tester tool β it highlights matches in real time and helps you understand how your pattern works against actual text.
Conclusion
Regular expressions are an indispensable tool for every developer. They enable efficient text searching, validation, and transformation. Start with simple patterns, gradually learn new constructs, and over time regex will become a powerful ally in your text processing tasks.