regular expression (thing) by Glowin_Orb

Regular old search/replace not doing it for you? Starting out with Perl? Bored? Here, try this...

Quick and Dirty Regular Expression Guide

Basic

The most basic Regular Expression contains only the text you are looking for.

Example:

Bag

will match all occurrences of "Bag" in your document. Regular Expressions are case sensitive by default, so this example will not match "bag", "bAg" or "baG" (most tools offer an option to ignore case).
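If you want to try this outside an editor, here is a minimal sketch using Python's re module. Python is just my choice of engine for illustration, and the sample text is made up. The re.IGNORECASE flag is the kind of "ignore case" switch most tools provide:

```python
import re

# Made-up sample text for illustration.
text = "Bag bag bAg baG Bag"

# A plain literal pattern matches only the exact, case-sensitive text.
exact = re.findall(r"Bag", text)

# re.IGNORECASE lets the same pattern match every capitalization.
any_case = re.findall(r"Bag", text, re.IGNORECASE)

print(exact)     # ['Bag', 'Bag']
print(any_case)  # ['Bag', 'bag', 'bAg', 'baG', 'Bag']
```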

^ -- Beginning of Line

The "^" character represents the beginning of a line (except when it is the first character inside a Character Class; see below).

Example:

^Bag

will only match occurrences of "Bag" if they are located at the beginning of a line.

$ -- End of Line

The "$" character represents the end of a line.

Example:

Bag$

will only match occurrences of "Bag" if they are located at the end of a line.

You can combine Regular Expressions to make a larger Regular Expression.

Example:

^Bag$

will match everywhere "Bag" is the only thing on the line.
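A small Python sketch of the three anchored patterns, testing each made-up line separately. One Python-specific wrinkle (an assumption about your engine, not something every editor shares): when you search a single multi-line string, ^ and $ only match at every line boundary if you pass re.MULTILINE.

```python
import re

# Made-up lines for illustration.
lines = ["Bag", "My Bag", "Bag of tricks", "BagBag"]

def matching(pattern):
    """Return the lines in which `pattern` finds a match."""
    return [line for line in lines if re.search(pattern, line)]

at_start = matching(r"^Bag")  # ['Bag', 'Bag of tricks', 'BagBag']
at_end = matching(r"Bag$")    # ['Bag', 'My Bag', 'BagBag']
alone = matching(r"^Bag$")    # ['Bag']

# On one multi-line string, re.MULTILINE makes ^ and $ match at every
# line boundary instead of only at the start and end of the string.
whole = "\n".join(lines)
per_line = re.findall(r"^Bag$", whole, re.MULTILINE)  # ['Bag']
string_only = re.findall(r"^Bag$", whole)             # []
print(at_start, at_end, alone, per_line, string_only)
```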

. -- Any Single Character

The "." character represents any single character.

Example:

B.g

will match things like "Bag", "Bog", "Bxg", "B:g", "B g", etc. It will not match "Baag" because there are two "a" characters between the "B" and the "g"; but:

B..g

will match "Baag".
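The same two patterns in a quick Python sketch. re.fullmatch is used so that "match" means the whole word, not just part of it. Note that in Python (as in Perl) "." does not match a newline unless you ask for it (re.DOTALL in Python, /s in Perl).

```python
import re

words = ["Bag", "Bog", "Bxg", "B:g", "B g", "Baag"]

# re.fullmatch requires the pattern to cover the entire word.
one_between = [w for w in words if re.fullmatch(r"B.g", w)]
two_between = [w for w in words if re.fullmatch(r"B..g", w)]
print(one_between)  # ['Bag', 'Bog', 'Bxg', 'B:g', 'B g']
print(two_between)  # ['Baag']
```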

[ ] -- Character Class

A character class defines what the single character at that position may be, by listing the acceptable characters.

Example:

B[aiu]g

will only match "Bag", "Big", and "Bug". It will not match "Baug" because "au" takes up two character positions. Inside a character class, a hyphen between two characters is shorthand for the whole range between them; the three most common ranges are:

a-z All lower case letters
A-Z All upper case letters
0-9 All numerals

Example:

B[a-z0-9]g

will match "Big", "B5g", but not "BAg" because of the upper case "A".
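Both character-class examples above, checked in a short Python sketch against a made-up word list (again using re.fullmatch so each pattern must cover the whole word):

```python
import re

words = ["Bag", "Big", "Bug", "Baug", "B5g", "BAg"]

# [aiu] allows exactly one of the listed characters.
vowels = [w for w in words if re.fullmatch(r"B[aiu]g", w)]
# [a-z0-9] allows one lower-case letter or one digit.
lower_or_digit = [w for w in words if re.fullmatch(r"B[a-z0-9]g", w)]
print(vowels)          # ['Bag', 'Big', 'Bug']
print(lower_or_digit)  # ['Bag', 'Big', 'Bug', 'B5g']
```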

Another feature of the character class is the "^" complement operator; if "^" is the first character in the list, the character class will match all characters NOT in the list.

Example:

B[^a-z]g

will not match any three-character string that starts with "B", ends with "g", and has a lower case middle letter. It will, however, match "BAg", "B9g", "B g", and "B:g".
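A Python sketch of the complemented class, run over a made-up list that includes one lower case middle letter:

```python
import re

words = ["Bag", "BAg", "B9g", "B g", "B:g"]

# [^a-z] accepts any single character that is NOT a lower case letter.
not_lower = [w for w in words if re.fullmatch(r"B[^a-z]g", w)]
print(not_lower)  # ['BAg', 'B9g', 'B g', 'B:g']
```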

\ -- The "Escape" Character

The "\" character has a couple of uses. The first is to mean "take the next character literally".

Example:

To illustrate, let's say you're editing an *.ini file with a "Bug" section in it and need to match the "Bug" section header. If we just made a regular expression of:

[Bug]

the "[" and "]" would be interpreted as a character class causing the regular expression to look for any single character that is a "B", "u", or "g". Placing a "\" before the "[" and "]":

\[Bug\]

causes the regular expression to be interpreted as we would like: look for a "[" followed by "B", "u", "g", and "]".
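Here is the *.ini case as a Python sketch; the file contents are made up. It also shows re.escape, Python's helper that adds the backslashes for you when you want to search for text literally:

```python
import re

# Made-up .ini contents for illustration.
ini = "[Bug]\nseverity=high\n"

# Unescaped, [Bug] is a character class: the first hit is the single "B".
as_class = re.search(r"[Bug]", ini).group()      # 'B'
# Escaped brackets match the literal section header.
as_literal = re.search(r"\[Bug\]", ini).group()  # '[Bug]'
# re.escape inserts the backslashes for you.
escaped = re.escape("[Bug]")
print(as_class, as_literal, escaped)  # B [Bug] \[Bug\]
```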

When followed by certain characters, the "\" and character pair have special interpretation:

\\ The "\" character
\n Newline (line feed) character
\t Tab character
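\b Backspace character (Control-H) in some editors; in Perl, \b outside a character class is a word boundary, and backspace is written [\b]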
\r Carriage return
\f Form feed

Also, "\x" followed by a two-digit hexadecimal number can be used to represent any character by its code.

Example:

\x0A

is a line feed character.
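A Python sketch of these escapes, assuming Python's re as the engine (its rules match Perl's here). Raw strings (r"...") hand the backslash sequences to the regex engine untouched. The last two lines show the \b caveat: outside a class it is a word boundary, and only [\b] means backspace.

```python
import re

# Made-up text: a tab at index 4 and a line feed at index 10.
text = "name\tvalue\nnext line"

# The regex engine itself turns \t, \x0A and \n into the real characters.
tab_at = re.search(r"\t", text).start()       # 4
lf_hex_at = re.search(r"\x0A", text).start()  # 10
lf_at = re.search(r"\n", text).start()        # 10

# Outside a class, \b is a word boundary, so "Bags" is skipped.
whole_words = re.findall(r"\bBag\b", "Bag Bags Bag")  # ['Bag', 'Bag']
# Inside a class, [\b] is the backspace character (\x08).
backspace_at = re.search(r"[\b]", "a\x08b").start()   # 1
print(tab_at, lf_hex_at, lf_at, whole_words, backspace_at)
```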

+, *, ? -- Iteration

These three operators ("+", "*", "?") are used to define the number of occurrences of the preceding expression. If an expression is followed by a "+", it will match one or more occurrences of that expression.

Example:

^.+$

will match any line containing at least one character. Likewise, an expression followed by a "*" will match zero or more occurrences of that expression. Therefore,

^.*$

will match any line whether it contains a character or not. Also, an expression followed by a "?" will match zero or one occurrence of that expression. So,

^.?$

will match any line that either contains one character or doesn't contain any characters.
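The three anchored patterns in a Python sketch, run against an empty line, a one-character line and a two-character line (all made up):

```python
import re

lines = ["", "x", "xy"]

def matching(pattern):
    """Return the lines in which `pattern` finds a match."""
    return [line for line in lines if re.search(pattern, line)]

one_or_more = matching(r"^.+$")   # ['x', 'xy']
zero_or_more = matching(r"^.*$")  # ['', 'x', 'xy']
zero_or_one = matching(r"^.?$")   # ['', 'x']
print(one_or_more, zero_or_more, zero_or_one)
```

Note that re.search returns a match object even for an empty match, so the blank line counts as matching "^.*$".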

These operators are frequently used after Character Classes.

Example:

B[ai]?g

will match only "Bg", "Bag", and "Big".
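A one-pattern Python check of the optional class, against a made-up word list:

```python
import re

words = ["Bg", "Bag", "Big", "Bug", "Baig"]

# The ? makes the one-character class optional (zero or one of "a"/"i").
hits = [w for w in words if re.fullmatch(r"B[ai]?g", w)]
print(hits)  # ['Bg', 'Bag', 'Big']
```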

( ) -- Grouping

Any portion of a regular expression surrounded by parentheses ("(" and ")") will be considered a group. This lets you apply "*", "+", "?", and "|" (discussed later in this document) to a whole sub-expression instead of a single item.

Example:

^(B[ai]g)?$

will match any blank line or any line containing only the word "Bag" or "Big". Grouping also lets you reuse the text a group matched in a Search & Replace (described in the next section).
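A Python sketch of the grouped pattern, plus a "+" variant to show that the operator now repeats the whole three-letter word (the lines are made up):

```python
import re

lines = ["", "Bag", "Big", "BagBag", "Bug"]

def matching(pattern):
    """Return the lines in which `pattern` finds a match."""
    return [line for line in lines if re.search(pattern, line)]

# ? applies to the whole group: the line is empty or exactly "Bag"/"Big".
optional_word = matching(r"^(B[ai]g)?$")  # ['', 'Bag', 'Big']
# + on the group repeats the whole word, so "BagBag" now qualifies.
repeated_word = matching(r"^(B[ai]g)+$")  # ['Bag', 'Big', 'BagBag']
print(optional_word, repeated_word)
```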

\n -- Group Reuse

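Occasionally you might need to reuse the text matched by the search in your replacement string, and \n allows you to do just that: a "\" followed by a number n (a digit, not the letter n of the newline escape) is replaced by the text matched by group n. Groups are numbered from 1, in the order of their opening parentheses. (Perl's s/// operator usually writes these as $1, $2, and so on.)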

Example:

A "Find what:" statement of:

([a-zA-Z]+):([a-zA-Z]+)([^a-zA-Z])

and a "Replace with:" statement of:

\2:\1\3

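will swap the location of any two words separated only by a colon, provided the second word is followed by a non-letter character; the third group captures that character and \3 puts it back after the swap.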
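Python's re.sub accepts the same \1-style references in its replacement string, so the dialog example can be sketched like this (the sample strings are made up):

```python
import re

FIND = r"([a-zA-Z]+):([a-zA-Z]+)([^a-zA-Z])"
REPLACE = r"\2:\1\3"

swapped = re.sub(FIND, REPLACE, "first:second, left:right.")
print(swapped)  # second:first, right:left.

# The third group must consume a non-letter, so a pair at the very end
# of the text (with nothing after it) is left unchanged.
unchanged = re.sub(FIND, REPLACE, "a:b")
print(unchanged)  # a:b
```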

| -- "Or" Operator

Two expressions separated by a "|" form an alternation: the whole expression matches if either of the two expressions matches.

Example:

This prison serves ((bread)|(water))\.

will match lines describing very cruel prisons that only serve bread or only serve water! It will match the following two lines:

This prison serves bread.
This prison serves water.

But it will not match this line:

This prison serves breadwater.

Complete Example

As noted previously, Regular Expressions can be combined to make one big Regular Expression. Here is a complete example of a Regular Expression used to find all "#define" statements in *.c files:

^[ \t]*#[ \t]*define[ \t]+[a-zA-Z_][a-zA-Z0-9_]*[^a-zA-Z0-9_]

This expression looks for, at the beginning of the line:

zero or more tabs and/or spaces followed by a "#" character,
followed by zero or more tabs and/or spaces,
followed by the word "define",
followed by one or more spaces and/or tabs (so "#definefoo" is not mistaken for a definition),
followed by one character that can be any alphabetic character or an "_",
followed by zero or more characters that are alphanumeric or "_",
followed by a character that is not alphanumeric or "_". (whew)
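To see the #define search at work, here is a Python sketch run over a made-up C fragment. It requires at least one space or tab after "define" ([ \t]+), so a token like "#definefoo" is not mistaken for a definition. re.MULTILINE is needed so "^" matches at the start of every line.

```python
import re

# #define search: at least one space/tab is required after "define".
DEFINE = re.compile(
    r"^[ \t]*#[ \t]*define[ \t]+[a-zA-Z_][a-zA-Z0-9_]*[^a-zA-Z0-9_]",
    re.MULTILINE,
)

# Made-up C source for illustration.
source = """#include <stdio.h>
#define MAX 10
  #  define DEBUG_LEVEL 2
#definefoo 1
int x;
"""

# The final [^a-zA-Z0-9_] consumes the character after the macro name,
# so each match here ends with the space that follows the name.
matches = DEFINE.findall(source)
print(matches)  # ['#define MAX ', '  #  define DEBUG_LEVEL ']
```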
