Regular Expression Examples
Regular Expression Metacharacters
Metacharacters:
{}[]()^$.|*+? : Characters with a special meaning. To match one of these characters literally,
precede it with a backslash \ (e.g. \$ matches a literal dollar sign). A literal backslash is written \\.
^ : Matches the starting position within the string. In line-based tools, it matches the
starting position of any line.
. : Matches any single character (many applications exclude newlines, and exactly which
characters are considered newlines is flavor-, character-encoding-, and platform-specific,
but it is safe to assume that the line feed character is included). Within POSIX bracket
expressions, the dot character matches a literal dot. For example, a.c matches "abc", etc.,
but [a.c] matches only "a", ".", or "c".
[ ] : A bracket expression. Matches a single character that is contained within the brackets.
For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any
lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c",
"x", "y", or "z", as does [a-cx-z].
The - character is treated as a literal character if it is the last or the first (after the ^,
if present) character within the brackets: [abc-], [-abc]. Note that POSIX bracket expressions do not
support backslash escapes (a backslash inside the brackets is literal). The ] character can be included
in a bracket expression if it is the first (after the ^, if present) character: []abc].
[^ ]: Matches a single character that is not contained within the brackets. For example, [^abc]
matches any character other than "a", "b", or "c". [^a-z] matches any single character that
is not a lowercase letter from "a" to "z". Likewise, literal characters and ranges can be mixed.
$ : Matches the ending position of the string or the position just before a string-ending newline.
In line-based tools, it matches the ending position of any line.
( ) : Defines a marked subexpression. The string matched within the parentheses can be recalled
later (see the next entry, \n). A marked subexpression is also called a block or capturing group. In basic
regular expression (BRE) mode, e.g. grep or sed without -E, the parentheses must be written \( \).
\n : Matches what the nth marked subexpression matched, where n is a digit from 1 to 9. This
construct is vaguely defined in the POSIX.2 standard. Some tools allow referencing more than
nine capturing groups. Also known as a backreference.
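The entries above can be tried without any data file by piping a few lines into the tools. This is a minimal sketch assuming GNU grep and GNU sed (the \b in the backreference line is a GNU extension); the sample strings are invented for illustration.

```shell
# Escaping: \$ matches a literal dollar sign instead of the end of the line.
printf 'price: $5\nprice: 5\n' | grep -E '\$[0-9]'        # prints: price: $5
# . matches any character, but inside brackets it is a literal dot.
printf 'abc\na.c\n' | grep -E 'a.c'                       # prints 2 lines: abc, a.c
printf 'abc\na.c\n' | grep -E 'a[.]c'                     # prints: a.c
# Bracket expressions: ']' first and '-' last are literal; [^...] negates.
printf 'a-b\na]b\naxb\n' | grep -E 'a[]-]b'               # prints 2 lines: a-b, a]b
printf 'c4t\ncat\n' | grep -E 'c[^0-9]t'                  # prints: cat
# Anchors: ^ is the start of the line, $ the end.
printf 'Ahab speaks\nhe said Ahab\n' | grep -E '^Ahab'    # prints: Ahab speaks
printf 'Ahab speaks\nhe said Ahab\n' | grep -E 'Ahab$'    # prints: he said Ahab
# Groups and backreferences: \1 repeats whatever group 1 matched.
printf 'the the whale\nthe whale\n' | grep -E '\b([a-z]+) \1\b'   # prints: the the whale
echo 'Moby Dick' | sed -E 's/([A-Za-z]+) ([A-Za-z]+)/\2 \1/'      # prints: Dick Moby
# The same swap in BRE mode (no -E), where groups are written \( \).
echo 'Moby Dick' | sed 's/\([A-Za-z]*\) \([A-Za-z]*\)/\2 \1/'    # prints: Dick Moby
```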
Quantification:
? : The question mark indicates zero or one occurrences of the preceding element. For example,
colou?r matches both "color" and "colour".
* : The asterisk indicates zero or more occurrences of the preceding element. For example, ab*c
matches "ac", "abc", "abbc", "abbbc", and so on.
+ : The plus sign indicates one or more occurrences of the preceding element. For example, ab+c
matches "abc", "abbc", "abbbc", and so on, but not "ac".
{n} : The preceding item is matched exactly n times.
{min,} : The preceding item is matched min or more times.
{,max} : The preceding item is matched at most max times (a GNU extension, not part of POSIX).
{min,max} : The preceding item is matched at least min times, but not more than max times.
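A quick check of the quantifiers on invented sample lines, assuming GNU grep. Each pattern is anchored with ^ and $ so that the repetition count applies to the whole line:

```shell
# ? : zero or one "u"
printf 'color\ncolour\ncolouur\n' | grep -E '^colou?r$'   # prints 2 lines: color, colour
# * : zero or more "b";  + : one or more "b"
printf 'ac\nabc\nabbc\n' | grep -E '^ab*c$'               # prints 3 lines: ac, abc, abbc
printf 'ac\nabc\nabbc\n' | grep -E '^ab+c$'               # prints 2 lines: abc, abbc
# {n}, {min,} and {min,max} applied to runs of digits
printf '7\n42\n123\n2022\n' | grep -E '^[0-9]{4}$'        # prints: 2022
printf '7\n42\n123\n2022\n' | grep -E '^[0-9]{2,}$'       # prints 3 lines: 42, 123, 2022
printf '7\n42\n123\n2022\n' | grep -E '^[0-9]{2,3}$'      # prints 2 lines: 42, 123
```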
Character classes (the POSIX [:name:] classes are only valid inside a bracket expression, e.g. [[:digit:]]):
[:digit:] : 0-9
[:alnum:] : A-Z,a-z,0-9
[:alpha:] : A-Z,a-z
[:blank:] : <space>, <tab>
[:punct:] : [][!"#$%&'()*+,./:;<=>?@\^_`{|}~-]
[:space:] : whitespace characters [ \t\r\n\v\f]
[:upper:] : A-Z
[:lower:] : a-z
[:print:] : visible characters and <space>
\b : word boundary (a zero-width position between a word character, see \w, and a non-word character
or the start or end of the string)
\w : word character: alphanumeric characters plus _
\s : whitespace character
\d : 0-9 (not supported by GNU grep -E; use [0-9] there, or grep -P)
\W : inverse of \w
\S : inverse of \s
\D : inverse of \d
For further information, see the wiki page "Regular expression".
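A short sketch of these classes in use, assuming GNU grep (the shorthands \w and \s are GNU extensions, not POSIX). Note the double brackets around the POSIX classes; the sample text is invented:

```shell
# POSIX classes go inside a bracket expression: [[:upper:]], not [:upper:].
printf 'abc\nABC\nA1\n' | grep -E '^[[:upper:]]+$'            # prints: ABC
printf 'abc\nABC\nA1\n' | grep -E '[[:digit:]]'               # prints: A1
# Several classes can share one bracket expression.
printf 'abc\nABC\nA1\n' | grep -E '^[[:alpha:][:digit:]]+$'   # prints 3 lines: abc, ABC, A1
# \w: word characters (letters, digits, _); -o prints each match on its own line.
printf 'foo_bar, baz!\n' | grep -oE '\w+'                     # prints 2 lines: foo_bar, baz
# \s: a single whitespace character.
printf 'a b\nab\n' | grep -E 'a\sb'                           # prints: a b
# \d is not available in grep -E; [0-9] is the portable spelling.
printf 'x1\nxy\n' | grep -E 'x[0-9]'                          # prints: x1
```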
- print all lines which contain at least one number.
grep -E '[0-9]' ../data/moby-dick.txt
- print all lines that do not contain a number.
grep -v -E '[0-9]' ../data/moby-dick.txt
- print all lines which contain at least one 4-digit number.
grep -E '\b[0-9]{4}\b' ../data/moby-dick.txt
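The \b anchors are what restrict the match to exactly four digits; without them any longer run of digits also qualifies. A sketch on two made-up lines, assuming GNU grep:

```shell
# Only the standalone four-digit number satisfies \b[0-9]{4}\b.
printf 'In 1851 it was published.\nID 123456\n' | grep -E '\b[0-9]{4}\b'   # prints: In 1851 it was published.
# Without \b, any run of four digits counts, so both lines match (-c counts lines).
printf 'In 1851 it was published.\nID 123456\n' | grep -cE '[0-9]{4}'      # prints: 2
```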
- print all lines that start with the string "Ahab" and also contain "whale" (case-insensitive)
grep -iE '^Ahab.*whale' ../data/moby-dick.txt
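On invented input, ^ ties "Ahab" to the start of the line, while -i makes both "Ahab" and "whale" case-insensitive. A sketch assuming GNU grep:

```shell
# Line 1 starts with Ahab and contains "Whale"; line 2 does not start with Ahab.
printf 'Ahab hunts the White Whale\nThe whale fled from Ahab\n' | grep -iE '^Ahab.*whale'
# prints: Ahab hunts the White Whale
```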
- print all non-empty lines that do not end with a period or exclamation mark.
grep -E '[^.!]$' ../data/bbcsport/football/069.txt
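Because [^.!] must consume one character, empty lines are never printed by this pattern. A sketch on made-up lines, assuming GNU grep; inverting the test with -v and [.!]$ is one way to keep empty lines as well:

```shell
# Empty lines and lines ending in . or ! are skipped.
printf 'Goal!\nHalf time\n\nFull time.\n' | grep -E '[^.!]$'    # prints: Half time
# -v with [.!]$ also keeps the empty line (-c counts the selected lines).
printf 'Goal!\nHalf time\n\nFull time.\n' | grep -vcE '[.!]$'  # prints: 2
```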
- print all lines that contain the same capitalized word (an uppercase letter followed by lowercase letters) at least twice
grep -E '\b([A-Z][a-z]+)\b.*\b\1\b' ../data/moby-dick.txt
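The capture group needs a quantifier, as in ([A-Z][a-z]+), to capture a whole capitalized word rather than exactly two letters. A check on invented lines, assuming GNU grep (which accepts backreferences in -E mode):

```shell
# Only the first line repeats a capitalized word ("Ahab").
printf 'Ahab told Starbuck that Ahab would stay\nAhab told Starbuck\n' |
  grep -E '\b([A-Z][a-z]+)\b.*\b\1\b'
# prints: Ahab told Starbuck that Ahab would stay
```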
- replace every standalone letter D (a whole word) with GER
sed -E 's/\bD\b/GER/g' ../data/city.sample.csv
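The \b boundaries leave a D inside a word such as "Dresden" untouched. A sketch with hypothetical CSV rows (the actual layout of city.sample.csv is not shown here), assuming GNU sed:

```shell
# Only the D that stands alone between word boundaries is replaced.
printf 'Berlin,D\nDresden,D\n' | sed -E 's/\bD\b/GER/g'
# prints: Berlin,GER
#         Dresden,GER
```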
- replace all occurrences of "captain ahab" (case-insensitive) with "Ahab"
sed -E 's/captain ahab/Ahab/ig' ../data/moby-dick.txt
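The i flag of the s command (a GNU sed extension) makes the match case-insensitive, and g replaces every occurrence on the line. On a made-up line:

```shell
# Both capitalizations of the phrase are replaced.
echo 'Captain Ahab and CAPTAIN AHAB' | sed -E 's/captain ahab/Ahab/ig'   # prints: Ahab and Ahab
```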
- find dates in D.M.YYYY or DD.MM.YYYY format and rewrite them as YYYY-MM-DD
echo 'My birthdate is the 09.09.1965. I was born, here in Karlsruhe' | sed -E 's/\b([0-9]{1,2})\.([0-9]{1,2})\.([0-9]{4})\b/\3-\2-\1/'
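The same substitution also accepts one-digit days and months, but it only reorders the three groups and does not zero-pad them. A sketch on an invented date, assuming GNU sed:

```shell
# Groups: \1 = day, \2 = month, \3 = year; they are emitted as \3-\2-\1.
echo 'born on 9.3.1965' | sed -E 's/\b([0-9]{1,2})\.([0-9]{1,2})\.([0-9]{4})\b/\3-\2-\1/'
# prints: born on 1965-3-9
```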
- print all numerals (incl. scientific notation like -.4e-23) in all text files of the current directory
grep -oE '[+-]?([0-9]+(\.[0-9]+)?|\.[0-9]+)([eE][+-]?[0-9]+)?' *.txt
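GNU grep -E does not support \d, so this sketch spells digits as [0-9]; -o prints each numeral on its own line instead of the whole matching line. The input line is invented:

```shell
# Optional sign, then digits with an optional fraction or a bare fraction,
# then an optional exponent.
printf 'x = -.4e-23, y = 42\n' |
  grep -oE '[+-]?([0-9]+(\.[0-9]+)?|\.[0-9]+)([eE][+-]?[0-9]+)?'
# prints 2 lines: -.4e-23, 42
```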
All examples assembled by Andreas Schmidt for the IC3K 2022 conference.