The dot . matches any single character.
Example: b.g matches big, beg, and bag, but not bp or baag.
If you use the "multi-line" version of the regular expression syntax, then the dot (.) character also matches new-lines. For example .* matches the whole file.
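As a rough illustration only, the dot can be tried out in Python's re module, which is a different regular expression dialect from the one described here. In this sketch, Python's re.DOTALL flag is assumed to play a role similar to the "multi-line" syntax mentioned above; the variable names are made up for the example.

```python
import re

# "b.g": the dot stands for exactly one character.
candidates = ["big", "beg", "bag", "bp", "baag"]
dot_matches = [w for w in candidates if re.fullmatch(r"b.g", w)]

# In Python the dot stops at a newline unless re.DOTALL is given.
text = "line one\nline two"
default_span = re.match(r".*", text).group(0)
dotall_span = re.match(r".*", text, re.DOTALL).group(0)

print(dot_matches)
print(repr(default_span))
print(repr(dotall_span))
```

With re.DOTALL, .* runs across the line break and covers the whole text; without it, the match ends at the first line.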
^ and $
The caret ^ matches the beginning of a line when the caret appears as the first character in the search pattern.
Example: ^Hello matches only if Hello appears at the beginning of a line.
The $ matches the end of a line.
Example: TRUE$ matches only if TRUE appears at the very end of a line.
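The two anchors can be sketched in Python as follows. This is an approximation in a different dialect: Python needs the re.MULTILINE flag before ^ and $ apply to each line rather than to the whole text. The sample text is invented for the example.

```python
import re

text = "Hello world\nsay Hello\nflag = TRUE\nTRUE story"

# ^Hello: only lines that begin with Hello.
line_starts = re.findall(r"^Hello.*", text, re.MULTILINE)

# TRUE$: only lines that end with TRUE.
line_ends = re.findall(r".*TRUE$", text, re.MULTILINE)

print(line_starts)
print(line_ends)
```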
\t matches a single tab character.
Example: \tint abc; matches a tab character followed by int abc;.
\s matches a single space character.
Example: \sif matches a space character followed by if.
\w matches a single white space character. In other words, \w matches either a tab or space character.
Example: \wwhile matches either a tab or space character, followed by while.
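Note that other dialects assign different meanings to these escapes. In Python's re module, for instance, \s matches any white space character and \w matches a letter, digit, or underscore. The sketch below shows one possible way to express the three escapes described above in Python; the names TAB, SPACE, and BLANK are hypothetical and exist only for this example.

```python
import re

TAB = r"\t"       # same meaning as \t above
SPACE = r" "      # stands in for this document's \s (a single space)
BLANK = r"[ \t]"  # stands in for this document's \w (a tab or a space)

tab_hit = re.search(TAB + r"int abc;", "\tint abc;") is not None
space_hit = re.search(SPACE + r"if", "else if") is not None
blank_hits = [s for s in ["\twhile", " while", "xwhile"]
              if re.search(BLANK + r"while", s)]

print(tab_hit, space_hit, blank_hits)
```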
* and +
* matches zero or more occurrences of the preceding character. The fewest possible occurrences of a pattern will satisfy the match.
Example: a*b will match b, ab, aab, aaab, aaaab, and so on.
+ matches one or more occurrences of the preceding character.
Example: a+b will match ab, aab, aaab, aaaab, and so on, but not just b.
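The difference between * and + can be checked with a short Python sketch. Python's re module is used here only as a convenient stand-in; its quantifiers are greedy by default, so how much text a match consumes may differ from the dialect described above. The sample strings are taken from the examples.

```python
import re

samples = ["b", "ab", "aab", "aaab"]

# a*b: zero or more a characters, then b.
star = [s for s in samples if re.fullmatch(r"a*b", s)]

# a+b: at least one a character, then b.
plus = [s for s in samples if re.fullmatch(r"a+b", s)]

print(star)
print(plus)
```

Only the bare b separates the two results: it satisfies a*b but not a+b.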
[ .. ]
When a list of characters is enclosed in square brackets [..], any single character in that set will be matched.
Example: [abc] matches a, b, or c, but not d.
When a caret ^ appears at the beginning of the set, the match succeeds only if the character is not in the set.
Example: [^abc] matches d, e, or f, but not a, b, or c.
Sets can conveniently be described with a range. A range is specified by two characters separated by a dash, such as [a-z]. The beginning character must have a lower ASCII value than the ending character.
Example: [a-z] matches any character in the range a through z, but not A or 1 or 2.
Sets can contain multiple ranges.
Example 1: [a-zA-Z] matches any alphabetic character.
Example 2: [^a-zA-Z0-9] matches any non-alphanumeric character.
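Sets, negated sets, and ranges are written the same way in Python's re module, so the examples above can be tried there directly. The following is a minimal sketch; the input strings are invented for illustration.

```python
import re

# Each call collects every single character that the set accepts.
in_set = re.findall(r"[abc]", "abcd")
not_in_set = re.findall(r"[^abc]", "abcdef")
lower_only = re.findall(r"[a-z]", "aZ1b2")
non_alnum = re.findall(r"[^a-zA-Z0-9]", "a-b_c 1!")

print(in_set, not_in_set, lower_only, non_alnum)
```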
\n matches a new-line, or line break. Use this when you want to match an end-of-line within a larger pattern.
Example: dog\ncat matches dog, followed by a line break, followed by cat.
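A pattern that spans a line break can be sketched in Python like this. It is an illustration in a different dialect, and the sample text is made up.

```python
import re

text = "hot dog\ncat nap"

# The pattern only matches when a line break sits between dog and cat.
m = re.search(r"dog\ncat", text)
matched = m.group(0) if m else None
same_line = re.search(r"dog\ncat", "dog cat") is not None

print(repr(matched), same_line)
```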
\( and \)
Parts of a regular expression can be isolated by enclosing them with \( and \), thereby forming a group. Groups are useful for extracting part of a match to be used in a replacement pattern. Each group in a pattern is assigned a number, starting with 1, from left to right.
Example: abc\(xyz\) matches abcxyz. xyz is considered group #1.
This is not all that useful, unless we are using the Replace command. The replace string can contain group characters in the form of \<number>. Each time a group character is encountered in the replacement pattern, it means "substitute the group value from the matched pattern".
Example 1: replace \(abc\)\(xyz\) with \2\1. This replaces the matched string abcxyz with the contents of group #2 xyz, followed by the contents of group #1 abc. So abcxyz is replaced with xyzabc. This is still not too amazing. See the next example.
Example 2: replace \(\w+\)\(.*\)ing with \1\2ed. Here \w+ matches the white space in front of a word. This changes a word ending in ing to the same word ending in ed. Your English teacher would not be too happy.
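The two replacements above can be approximated in Python, with two differences that are assumptions of this sketch rather than part of the syntax described here: Python writes groups as plain ( and ) instead of \( and \), and its \w does not mean white space, so [ \t] is used in its place. The name BLANK is hypothetical.

```python
import re

# Example 1: swap the two groups.
swapped = re.sub(r"(abc)(xyz)", r"\2\1", "abcxyz")

# Example 2: white space, then any text ending in "ing"; keep both groups
# and put "ed" where "ing" was.
BLANK = r"[ \t]"  # stands in for this document's \w
past = re.sub(r"(" + BLANK + r"+)(.*)ing", r"\1\2ed", "I was walking")

print(swapped)
print(past)
```

The replacement strings \2\1 and \1\2ed are the same as in the examples; only the pattern side needed adapting.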
A backslash character \ preceding a meta-character overrides its special meaning. The backslash itself is not treated as part of the string to match.
Example: a\*b matches a*b literally. The * character does not mean "match 0 or more occurrences".
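Escaping works the same way in Python's re module, which makes it easy to try. The sketch below also uses re.escape, a Python helper (not part of the syntax described here) that adds the backslashes for you.

```python
import re

# The escaped star is an ordinary character, not a quantifier.
literal = re.fullmatch(r"a\*b", "a*b") is not None
as_quantifier = re.fullmatch(r"a\*b", "aab") is not None

# re.escape builds a pattern that matches the given text literally.
escaped = re.escape("a*b")
escaped_hit = re.fullmatch(escaped, "a*b") is not None

print(literal, as_quantifier, escaped_hit)
```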