Regular old search/replace not
doing it for you? Starting out with Perl? Bored? Here, try this...
Quick and Dirty Regular Expression Guide
Basic
The most basic Regular Expression contains only
the text you are looking for.
Example:
Bag
will match all occurrences of "Bag" in your document, including those inside
longer words such as "Bagpipe". Regular Expressions are case sensitive by
default, so this example will not match "bag", "bAg" or "baG".
^ -- Beginning of Line
The "^" character represents the beginning of a line (unless used
in a Character Class, see below).
Example:
^Bag
will only match occurrences of "Bag" if they are located at the
beginning of a line.
$ -- End of Line
The "$" character represents the end of a line.
Example:
Bag$
will only match occurrences of "Bag" if they are located at the
end of a line.
You can combine Regular Expressions to make a larger Regular Expression.
Example:
^Bag$
will match everywhere "Bag" is the only thing on the line.
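The two anchors can be tried out with a short sketch like the one below. It assumes Python's standard re module, which is only one of many tools that understand these patterns, and the sample lines are invented:

```python
import re

# Made-up sample lines for illustration.
lines = ["Bag of tricks", "A big Bag", "Bag", "HandBag"]

at_start = [line for line in lines if re.search(r"^Bag", line)]
at_end = [line for line in lines if re.search(r"Bag$", line)]
alone = [line for line in lines if re.search(r"^Bag$", line)]

print(at_start)  # ['Bag of tricks', 'Bag']
print(at_end)    # ['A big Bag', 'Bag', 'HandBag']
print(alone)     # ['Bag']

# Searching one multi-line string instead needs re.MULTILINE; without it,
# Python anchors ^ and $ at the start and end of the whole string.
document = "\n".join(lines)
alone_in_document = re.findall(r"^Bag$", document, flags=re.MULTILINE)
print(alone_in_document)  # ['Bag']
```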
. -- Any Single Character
The "." character represents any single character.
Example:
B.g
will match things like "Bag", "Bog", "Bxg", "B:g",
"B g", etc. It will not match "Baag" because there are two
"a" characters between the "B" and the "g"; but:
B..g
will match "Baag".
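Here is a quick way to check the "." examples, sketched with Python's standard re module (an assumption, since any regex-aware tool would do). re.fullmatch is used so that each made-up candidate has to match from its first character to its last:

```python
import re

# Made-up candidates, including the ones mentioned above.
words = ["Bag", "Bog", "Bxg", "B:g", "B g", "Baag", "Bg"]

one_between = [w for w in words if re.fullmatch(r"B.g", w)]
two_between = [w for w in words if re.fullmatch(r"B..g", w)]

print(one_between)  # ['Bag', 'Bog', 'Bxg', 'B:g', 'B g']
print(two_between)  # ['Baag']
```

Note that "Bg" matches neither pattern, because each "." has to consume exactly one character.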
[ ] -- Character Class
A character class is used to define what the one character at that location
can be by supplying a list of acceptable characters.
Example:
B[aiu]g
will only match "Bag", "Big", and "Bug". It will
not match "Baug" because "au" takes up two character
locations. A range written with a "-" can be used inside a character class as
shorthand for a list. The three most common ranges are:
a-z    All lower case letters
A-Z    All upper case letters
0-9    All numerals
Example:
B[a-z0-9]g
will match "Big" and "B5g", but not "BAg", because of the upper case "A".
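The list and range forms can be compared side by side. This is a small sketch that assumes Python's standard re module; the candidate strings are invented for the test:

```python
import re

# Made-up candidates; fullmatch tests each whole string against the pattern.
candidates = ["Bag", "Big", "Bug", "Baug", "B5g", "BAg"]

listed = [c for c in candidates if re.fullmatch(r"B[aiu]g", c)]
ranged = [c for c in candidates if re.fullmatch(r"B[a-z0-9]g", c)]

print(listed)  # ['Bag', 'Big', 'Bug']
print(ranged)  # ['Bag', 'Big', 'Bug', 'B5g']
```

"Baug" fails both patterns because a character class always stands for exactly one character.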
Another feature of the character class is the "^" complement
operator; if "^" is the first character in the list, the character
class will match all characters NOT in the list.
Example:
B[^a-z]g
will not match "Bag", "Big", or any other three character string that starts
with "B", ends with "g", and has a lower case letter in the middle. It will,
however, match "BAg", "B9g", "B g", and "B:g".
\ -- The "Escape" Character
The "\" character has two uses. The first is to mean
"take the next character literally".
Example:
To illustrate, let's say you're editing an *.ini file with
a "Bug" section in it and need to match the "Bug"
section header. If we just made a regular expression of:
[Bug]
the "[" and "]" would be interpreted as a character class
causing the regular expression to look for any single character that is a
"B", "u", or "g". Placing a "\" before
the "[" and "]":
\[Bug\]
causes the regular expression to be interpreted as intended: it looks for a
"[" followed by "B", "u", "g", and "]".
When followed by certain characters, the "\" and that character together have
a special interpretation:
\\    The "\" character
\n    End of Line character
\t    Tab character
\b    Backspace character (Control-H)
\r    Carriage return
\f    Form feed
Also, "\x" followed by a hexadecimal number represents the character with that
character code.
Example:
\x0A
is a line feed character.
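Both uses of "\" can be checked with a sketch like this one. It assumes Python's standard re module, and the .ini fragment and the path are made up. Raw strings (the r"..." form) are used so that Python hands the backslashes to the regular expression untouched:

```python
import re

# Made-up fragment of an .ini file; \n and \t here are Python string escapes.
ini_text = "[Bug]\nid\t42\nBug=open\n"

# Unescaped brackets form a character class: one "B", "u", or "g" at a time.
as_class = re.findall(r"[Bug]", ini_text)

# Escaped brackets are taken literally, so only the section header matches.
as_literal = re.findall(r"\[Bug\]", ini_text)

print(as_class)    # ['B', 'u', 'g', 'B', 'u', 'g']
print(as_literal)  # ['[Bug]']

# Special pairs: tab, line feed by hex code, line feed by name, backslash.
tabs = re.findall(r"\t", ini_text)
line_feeds_hex = re.findall(r"\x0A", ini_text)
line_feeds = re.findall(r"\n", ini_text)
backslashes = re.findall(r"\\", r"C:\temp")

print(len(tabs), len(line_feeds_hex), len(backslashes))  # 1 3 1
```

One caveat when carrying the table over to other tools: in Python's re, "\b" outside a character class means a word boundary rather than a backspace, so check your own tool's documentation for that pair.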
+, *, ? -- Iteration
These three operators ("+", "*", "?") are used
to define the number of occurrences of the preceding expression. If an
expression is followed by a "+", it will match one or more occurrences
of that expression.
Example:
^.+$
will match any line containing at least one character. Likewise, an
expression followed by a "*" will match zero or more occurrences
of that expression. Therefore,
^.*$
will match any line whether it contains a character or not. Also, an
expression followed by a "?" will match zero or one occurrences of
that expression. So,
^.?$
will match any line that either contains one character or doesn't contain any
characters.
These operators are most commonly used after Character Classes.
Example:
B[ai]?g
will only match "Bg", "Bag", and "Big".
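The three iteration operators are easy to compare on a few short lines. The sketch below assumes Python's standard re module, and the sample lines and words are made up ("" stands in for a blank line):

```python
import re

# Made-up lines; "" stands for a blank line.
lines = ["", "x", "xy"]

one_or_more = [line for line in lines if re.search(r"^.+$", line)]
zero_or_more = [line for line in lines if re.search(r"^.*$", line)]
zero_or_one = [line for line in lines if re.search(r"^.?$", line)]

print(one_or_more)   # ['x', 'xy']
print(zero_or_more)  # ['', 'x', 'xy']
print(zero_or_one)   # ['', 'x']

# After a character class, ? makes that one character optional.
words = ["Bg", "Bag", "Big", "Baig", "Bug"]
optional_vowel = [w for w in words if re.fullmatch(r"B[ai]?g", w)]
print(optional_vowel)  # ['Bg', 'Bag', 'Big']
```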
( ) -- Grouping
Any portion of a regular expression surrounded by parentheses ("(" and ")")
is treated as a group. This allows you to use operators like "*", "+", "?",
and "|" (discussed later in this document) on more than a single expression.
Example:
^(B[ai]g)?$
will match any blank line or any line containing only the word "Bag"
or "Big". Grouping also lets you reuse the matched group later on in a
Search & Replace setting (described in the next section).
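To see what the parentheses change, compare the grouped pattern with an ungrouped variant. This sketch assumes Python's standard re module; the sample lines and the ungrouped pattern are made up for contrast:

```python
import re

# Made-up lines; "" stands for a blank line.
lines = ["", "Bag", "Big", "BagBig", "Bug"]

# The ? applies to the whole group, so the entire word may be absent.
blank_or_word = [line for line in lines if re.search(r"^(B[ai]g)?$", line)]

# Without the parentheses, ? applies only to the final "g".
without_group = [line for line in lines if re.search(r"^B[ai]g?$", line)]

print(blank_or_word)  # ['', 'Bag', 'Big']
print(without_group)  # ['Bag', 'Big']

# The text matched by the group is kept and can be read back afterwards.
captured = re.search(r"^(B[ai]g)?$", "Big").group(1)
print(captured)  # Big
```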
\n -- Group Reuse
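Occasionally you might need to use matched text from the search in your
replacement string; "\n", where n is a group number, lets you do just that. A
"\" followed by a number inserts the text matched by that group at that
location. Groups are numbered from 1, in the order of their opening
parentheses. (This is not the "\n" End of Line character described earlier.)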
Example:
A "Find what:" statement of:
([a-zA-Z]+):([a-zA-Z]+)([^a-zA-Z])
and a "Replace with:" statement of:
\2:\1\3
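will swap the location of any two words separated only by a colon. The third
group captures the non-letter character that follows the second word, and
"\3" puts it back unchanged.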
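The same Find/Replace pair can be run from a script. The sketch below assumes Python's standard re module, whose re.sub happens to accept the same "\1"-style references in the replacement; the input strings are made up:

```python
import re

find_what = r"([a-zA-Z]+):([a-zA-Z]+)([^a-zA-Z])"
replace_with = r"\2:\1\3"

# Made-up input with two colon-separated word pairs.
swapped = re.sub(find_what, replace_with, "key:value, left:right.")
print(swapped)  # value:key, right:left.

# With nothing after the second word, the third group cannot match,
# so the text comes back unchanged.
unchanged = re.sub(find_what, replace_with, "key:value")
print(unchanged)  # key:value
```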
| -- "Or" Operator
Two expressions separated by a "|" form a choice: the pattern matches if
either one of the two expressions matches.
Example:
This prison serves ((bread)|(water))\.
will match lines describing very cruel prisons that only serve bread or only
serve water! It will match the following two lines:
This prison serves bread.
This prison serves water.
But it will not match this line:
This prison serves breadwater.
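The prison example can be checked with a few lines of script. This sketch assumes Python's standard re module; the ungrouped pattern at the end is a made-up variant, added only to show why the outer parentheses matter:

```python
import re

pattern = r"This prison serves ((bread)|(water))\."

lines = [
    "This prison serves bread.",
    "This prison serves water.",
    "This prison serves breadwater.",
]

matched = [line for line in lines if re.search(pattern, line)]
print(matched)  # ['This prison serves bread.', 'This prison serves water.']

# Without grouping, | splits the WHOLE pattern into two alternatives:
# "This prison serves bread" or "water\.", so the third line matches too.
ungrouped = r"This prison serves bread|water\."
loose = [line for line in lines if re.search(ungrouped, line)]
print(len(loose))  # 3
```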
Complete Example
As noted previously, Regular Expressions can be combined to make one big
Regular Expression. Here is a complete example of a Regular Expression used to
find all "#define" statements in *.c files:
^[ \t]*#[ \t]*define[ \t]*[a-zA-Z_][a-zA-Z0-9_]*[^a-zA-Z0-9_]
This expression looks for, at the beginning of the line:
zero or more tabs and/or spaces followed by a "#" character,
followed by zero or more tabs and/or spaces,
followed by the word "define",
followed by zero or more spaces and/or tabs,
followed by one character that can be any alphabetic character or an
"_",
followed by zero or more characters that are alphanumeric or "_",
followed by a character that is not alphanumeric or "_". (whew)
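To try the complete expression on real text, something like the following would do. It is a sketch that assumes Python's standard re module, and the C source lines are made up:

```python
import re

define_re = re.compile(
    r"^[ \t]*#[ \t]*define[ \t]*[a-zA-Z_][a-zA-Z0-9_]*[^a-zA-Z0-9_]"
)

# Made-up lines from a hypothetical C file.
source_lines = [
    "#define MAX_LEN 80",
    "  #  define DEBUG 1",
    "#include <stdio.h>",
    "int define_me = 0;",
    "// #define OLD 1",
    "#defineFOO 1",
]

hits = [line for line in source_lines if define_re.search(line)]
for line in hits:
    print(line)
```

This prints the first, second, and last lines. The last one is accepted because "[ \t]*" after "define" allows zero spaces; if that is not what you want, using "[ \t]+" at that spot would require at least one space or tab.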