[Reg]ular [Ex]pressions in 45 minutes or less
Gabriel Barbu
16th of November 2019
1. What is a regular expression anyway?
2. Basics
a. Metacharacters
b. Character classes
c. Quantifiers
d. Negation
e. Alternations
f. Grouping
g. Flags
3. Uses and flavors
4. Tools & cheat sheet
5. Q&A
I am sure you are familiar with the term “wildcard” and have used one at least once.
If you work with Windows, maybe you have used the search in Windows Explorer
to find files of a specific type, such as:
- Documents: *.doc
- Text files: *.txt
- mp3 files: *.mp3
If you work with Linux (or Mac – not judging), maybe you have searched
for files in a specific folder, or for some content inside a file:
- ls *.doc
- grep "CC*" CreditCardInformation.txt
If you work with databases, you have definitely used wildcards in your queries.
/^[2-9]\d{2}-\d{3}-\d{4}$/
or
/^(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}$/g
or
Let's put that scary thing away and start understanding regular
expressions.
/search pattern/flags
- search pattern: what to search for
- flags: search flags
/search pattern/replacement/flags
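As a minimal sketch of the two forms above, assuming Python's `re` module (the talk does not name a language, and the sample text is made up): Python has no `/pattern/flags` literal, so the pattern, the replacement, and the flags are passed as separate arguments.

```python
import re

text = "regex rocks. Regex rules."  # hypothetical sample text

# Equivalent of /regex/i : a search pattern plus the case-insensitive flag
first = re.search(r"regex", text, flags=re.IGNORECASE)
print(first.group(0))  # regex

# Equivalent of /regex/RegEx/i : search pattern, replacement, flags
# (re.sub replaces every occurrence unless count= is given)
replaced = re.sub(r"regex", "RegEx", text, flags=re.IGNORECASE)
print(replaced)  # RegEx rocks. RegEx rules.
```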
- Meta characters
- Character classes
- Alternations
- Capturing groups
- Flags
Literal characters are any printable character from the ASCII table.
- [a-z\d] – will match any lowercase character between a and z and any
numeric character
- [a-fz] – will match any lowercase character between a and f and the
character z
- [-a-z\\.] – will match the character “-”, any lowercase character
between a and z, the character “\” and the character “.” (notice the
double backslash: the first backslash escapes the second one)
A particularity of character classes is that we can write
metacharacters as character classes.
The benefit of character classes is that we gain more control over which
characters are matched than the predefined metacharacters allow.
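The three classes above can be tried out with a short sketch. It assumes Python's `re` module and a made-up sample string; the last line shows a class that is narrower than the predefined `\d`.

```python
import re

# Hypothetical sample; its characters are: c a b - 7 z \ . Q
sample = "cab-7z\\.Q"

lower_or_digit = re.findall(r"[a-z\d]", sample)
a_to_f_or_z = re.findall(r"[a-fz]", sample)
dash_lower_backslash_dot = re.findall(r"[-a-z\\.]", sample)

print(lower_or_digit)            # ['c', 'a', 'b', '7', 'z']
print(a_to_f_or_z)               # ['c', 'a', 'b', 'z']
print(dash_lower_backslash_dot)  # ['c', 'a', 'b', '-', 'z', '\\', '.']

# More control than \d: match only the digits 0 to 5
low_digits = re.findall(r"[0-5]", "2019")
print(low_digits)                # ['2', '0', '1']
```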
In this case we can use quantifiers to indicate how many things we want
to match.
The simple quantifiers defined in regular expressions are:
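The standard quantifiers are `?` (zero or one), `*` (zero or more), `+` (one or more), `{n}` (exactly n) and `{n,m}` (between n and m). The sketch below assumes Python's `re` module and uses made-up sample strings to show each of them.

```python
import re

# ? : zero or one of the preceding item
optional_u = re.findall(r"colou?r", "color colour")

# * : zero or more of the preceding item
zero_or_more = re.findall(r"go*gle", "ggle gogle google")

# + : one or more of the preceding item
one_or_more = re.findall(r"go+gle", "ggle gogle google")

# {n} : exactly n repetitions
exactly_four = re.findall(r"\d{4}", "16 November 2019")

# {n,m} : between n and m repetitions (greedy: takes as many as it can)
two_to_three = re.findall(r"a{2,3}", "a aa aaa aaaa")

print(optional_u)    # ['color', 'colour']
print(zero_or_more)  # ['ggle', 'gogle', 'google']
print(one_or_more)   # ['gogle', 'google']
print(exactly_four)  # ['2019']
print(two_to_three)  # ['aa', 'aaa', 'aaa']
```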
Each capture group gets an index starting from 1 and can be referenced
using backslash (\) and the index.
As we can see, we have a capture group for \d+ and the \1 (Group 1) has
captured “10” from our test string, and the \0 (Full match) has captured
“10 types” from the same test string.
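The same capture can be reproduced in code. This is a sketch under assumptions: Python's `re` module, a made-up test string, and a guessed pattern `(\d+) types`, since neither the original pattern nor the test string is given in the text. In Python the full match is read with `group(0)` and the first group with `group(1)`.

```python
import re

test_string = "There are 10 types of people"  # hypothetical test string
match = re.search(r"(\d+) types", test_string)  # assumed pattern

full_match = match.group(0)  # the whole matched text
group_1 = match.group(1)     # what the first pair of parentheses captured
print(full_match)  # 10 types
print(group_1)     # 10

# Inside a pattern, \1 refers back to whatever group 1 captured,
# so (\w)\1 finds a character that is immediately repeated.
doubled = re.search(r"(\w)\1", "hello").group(0)
print(doubled)     # ll
```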
In the following example we will see how to replace using capture groups:
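A minimal sketch of replacing with capture groups, assuming Python's `re.sub` and made-up inputs. In the replacement string `\1`, `\2`, ... insert the captured text; note that Python writes the full match as `\g<0>` rather than `\0`.

```python
import re

test_string = "There are 10 types of people"  # hypothetical test string

# Keep the captured number, change the word that follows it
renamed = re.sub(r"(\d+) types", r"\1 kinds", test_string)
print(renamed)    # There are 10 kinds of people

# Reorder three captured groups: day/month/year -> year-month-day
iso_date = re.sub(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\2-\1", "16/11/2019")
print(iso_date)   # 2019-11-16

# Reference the full match in the replacement with \g<0>
bracketed = re.sub(r"\d+ types", r"[\g<0>]", test_string)
print(bracketed)  # There are [10 types] of people
```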
Regular expressions support some flags (or modifiers) to change how the
match is done.
The flag(s) are added at the end of the regular expression, after the
closing slash /.
The most common flags in regular expressions are:
- g : (global) don’t return after the first match, return all the
matches
- m : (multi line) changes the behavior of ^ to match the start of the
line and of $ to match the end of the line
- i : (insensitive) makes the search case insensitive
- s : (single line) changes the behavior of “.” to also match newline
- U : (ungreedy) makes all the quantifiers lazy (in PCRE this flag is an
uppercase U; a lowercase u enables Unicode matching instead)
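How these flags map to code depends on the flavor. As a sketch assuming Python's `re` module and made-up sample text: `i`, `m` and `s` correspond to `re.IGNORECASE`, `re.MULTILINE` and `re.DOTALL`. Python has no global flag (`re.findall` returns all matches) and no ungreedy flag (a `?` after a quantifier makes that quantifier lazy).

```python
import re

text = "Hello\nhello world"  # hypothetical two-line sample

# i : case-insensitive search
insensitive = re.findall(r"hello", text, flags=re.IGNORECASE)
print(insensitive)  # ['Hello', 'hello']

# m : ^ matches at the start of every line, not only the start of the text
without_m = re.findall(r"^hello", text)
with_m = re.findall(r"^hello", text, flags=re.MULTILINE)
print(without_m, with_m)  # [] ['hello']

# s : "." also matches the newline character
without_s = re.search(r"Hello.hello", text)
with_s = re.search(r"Hello.hello", text, flags=re.DOTALL)
print(without_s is None, with_s is not None)  # True True

# Lazy matching per quantifier: .+? stops at the first possible ">"
greedy = re.findall(r"<.+>", "<a><b>")
lazy = re.findall(r"<.+?>", "<a><b>")
print(greedy, lazy)  # ['<a><b>'] ['<a>', '<b>']
```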
The following examples show how the g flag changes how regular
expressions return matches:
Without the g flag:
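Python's `re` module has no `g` flag, but the same contrast can be sketched with its functions (an assumption about the mapping, with a made-up sample string): `re.search` stops at the first match, like a pattern without `g`, while `re.findall` returns every match, like a pattern with `g`. For replacement, `count=1` limits `re.sub` to the first match.

```python
import re

text = "10 types, 2 kinds, 3 sorts"  # hypothetical sample text

# Without g: stop after the first match
first_only = re.search(r"\d+", text).group(0)
print(first_only)     # 10

# With g: return all the matches
all_matches = re.findall(r"\d+", text)
print(all_matches)    # ['10', '2', '3']

# The same difference when replacing
replace_first = re.sub(r"\d+", "N", text, count=1)
replace_all = re.sub(r"\d+", "N", text)
print(replace_first)  # N types, 2 kinds, 3 sorts
print(replace_all)    # N types, N kinds, N sorts
```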
- File renaming
- Text search (and replace)
- Web directives (Apache .htaccess)
- Database queries (MySQL)
- Input validation (and sanitization)
- Parsing log files
There are many flavors of regular expressions, and the flavor determines
what is supported from the generic syntax (and how it is supported). Most of
the time, all the simple stuff presented here works in all flavors.
- https://regex101.com/
- https://regexr.com/
- https://www.regextester.com/
Thank You