Stefan's Tools

Useful open source tools that make your life easier

Regular expression help and examples for grepWin

grepWin uses the boost regex engine to do its work, with the Perl Regular Expression Syntax.

Introduction

I'll only explain the very basics of how to use regular expressions, plus some special variables you can use in grepWin that aren't part of the official regular expression syntax.

For a much more detailed tutorial on regular expressions, please go to this site - it also explains a lot on how regex engines work internally.

search basics

. (dot)
a dot matches any single character. Searching for t.t will match tat as well as tut.
+
matches the previous expression one or more times. Searching for spel+ing will find all words like speling or spelling but not speing, since the l must be matched at least once.
*
matches the previous expression zero or more times. Searching for spel*ing will find all words like speling or spelling and also speing since the l can be matched zero times, which means it doesn't have to be there.
\
the backslash escapes characters that would otherwise be treated specially. Searching for a double dot in your text with .. would not work, since each dot matches any character and so any two chars would match. To search for a double dot you have to escape both dots: the pattern is \.\. with a backslash in front of each dot.
\Q..\E
in case you need to search for a literal string that has a lot of special characters in it, you can use the \Q..\E sequence. Searching for *.* would not find the literal text *.* unless you escape every single char like this: \*\.\*. For such search strings it's easier to just put them inside the \Q..\E sequence like this: \Q*.*\E.
[]
With square brackets you can specify so-called character classes. Such a class matches any one of the chars specified between the brackets. Searching for [-+0-9]+ will find any run of chars that consists only of '-', '+' and the digits 0 to 9. It will match -123, +123 or 123, but not testword. There are a few predefined character classes so you don't have to create them yourself. You can find a list of those classes here. The most used ones are \d which matches all digits, \w which matches all word chars and \s which matches all whitespace chars.
^, $
the caret ^ matches the beginning of a line, and the dollar sign $ matches the end of a line. Searching for ^title$ will only find lines that consist solely of the word title, but not places where the word title is inside a longer line. Searching for ^// will find all lines that start with two slashes, but not lines where the two slashes appear later in the line. Searching for goodbye\.$ will find lines that end with goodbye., but not lines where goodbye. is somewhere in the middle.
\b
\b matches word boundaries. Searching for \bword\b finds word, but not subwords or words.
()
parenthesis pairs define a group. Grouping is useful for more advanced regex searching, but also for use when replacing text. Each group that matches part of the full matching string can be referenced later in the replace string.
|
The | char is used as an OR operator. Searching for cat|dog will match either cat or dog. Note that the OR operator applies to everything to its left and right. If you want to limit the reach of the operator, you have to use parentheses to group the alternatives. Searching for (cat|dog)food finds catfood and dogfood.
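
The search patterns above can also be tried outside grepWin. The following is a minimal sketch that uses Python's built-in re module as a stand-in, which is an assumption made purely for illustration: it is not the Boost engine grepWin uses. The basic syntax shown here behaves the same way in both, with one exception: Python's re has no \Q..\E sequence, so re.escape is used in its place. The helper name found is hypothetical.

```python
import re

# Python's re module stands in for grepWin's Boost engine (an assumption
# made for illustration; the two agree on the basics shown here).

def found(pattern, text):
    """Return every non-overlapping match of pattern in text."""
    # MULTILINE makes ^ and $ match at line starts and ends.
    return [m.group(0) for m in re.finditer(pattern, text, flags=re.MULTILINE)]

words = "speing speling spelling"

dot_hits = found(r"t.t", "tat tut")            # . is any single character
plus_hits = found(r"spel+ing", words)          # l one or more times
star_hits = found(r"spel*ing", words)          # l zero or more times
escaped_hits = found(r"\.\.", "a..b a.b")      # two literal dots
# re.escape is Python's substitute for \Q*.*\E
literal_hits = found(re.escape("*.*"), "del *.* now")
class_hits = found(r"[-+0-9]+", "-123 +123 123 testword")
line_hits = found(r"^title$", "title\nsubtitle\ntitle page")
boundary_hits = found(r"\bword\b", "word subword words")
group_hits = found(r"(cat|dog)food", "catfood dogfood birdfood")

for name, hits in [("dot", dot_hits), ("plus", plus_hits), ("star", star_hits),
                   ("escaped", escaped_hits), ("literal", literal_hits),
                   ("class", class_hits), ("line", line_hits),
                   ("boundary", boundary_hits), ("group", group_hits)]:
    print(name, hits)
```

Each variable holds the list of matches for one of the constructs described above, so changing a pattern or a sample text and re-running the script is a quick way to check what a search string would find.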

replacing

replacing strings is no more complicated than searching. Whatever the search finds is replaced with the replace string. Searching for cat and replacing it with dog is the most basic example and works just like you'd expect.

in replace strings, you can also use references.

$1..$9
groups matched by the search string are referred to with $1..$9. For example, if you search for (cats) and (dogs) and replace it with $2 and $1, the string cats and dogs gets replaced with dogs and cats. $1 refers to the first matching group, which is cats, and $2 refers to the second matching group, which is dogs.
${filepath}, ${filename}, ${fileext}
the ${filepath} reference gets replaced with the full path of the current file. ${filename} gets replaced with the filename without the file extension, and ${fileext} gets replaced with the file extension of the current file. This is special to grepWin.
${count0N}, ${count0N(AA)}, ${count0N(AA,BB)}
grepWin also offers a special replace reference for counting. ${count0N} is replaced with numbers starting from 1 and incremented by 1. The 0 and N are optional and used for formatting the number. The N is a number that specifies how many chars wide the number should be; shorter numbers are padded with spaces to that width. If 0 is specified, the number is padded with leading zeros instead. You can also specify the start count using the AA number, and the increment using the BB number.
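
The group references described above can be sketched in Python as well, with the caveat that Python's re module (used here only as a stand-in for grepWin's engine) writes group references in replace strings as \1 or \g<1> rather than $1. The helper dollar_to_python below is hypothetical and only translates the simple $1..$9 form; it does not handle the grepWin-specific ${...} references.

```python
import re

def dollar_to_python(replace_string):
    """Translate $1..$9 references into Python's \\g<1>..\\g<9> form."""
    return re.sub(r"\$(\d)", r"\\g<\1>", replace_string)

search = r"(cats) and (dogs)"
replace = dollar_to_python("$2 and $1")   # becomes \g<2> and \g<1>

swapped = re.sub(search, replace, "cats and dogs")
print(swapped)
```

The \g<N> form is chosen over \N because it stays unambiguous when a digit follows the reference in the replace string.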

replacing examples

insert line numbers at the start of each line

Search string: ^

Replace string: ${count04} (followed by a single space, so that the number is separated from the line text)

Results in:
0001 line 1
0002 line 2
0003 line 3
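
To make the counting behaviour concrete, here is a rough Python emulation of the ${count0N(AA,BB)} reference as described above. It is a sketch under assumptions, not grepWin's implementation: the names COUNT_REF and make_replacer are hypothetical, space padding is assumed to go on the left, AA and BB are assumed to be non-negative integers, and a trailing space is assumed in the replace string so that the output matches the result shown above.

```python
import re
from itertools import count

# Hypothetical pattern for ${count}, ${count0N}, ${count0N(AA)}, ${count0N(AA,BB)}
COUNT_REF = re.compile(r"\$\{count(0?)(\d*)(?:\((\d+)(?:,(\d+))?\))?\}")

def make_replacer(replace_string):
    """Return a function for re.sub that expands the counter on each match."""
    counters = {}  # one running counter per distinct reference text

    def expand(ref):
        zero, width, start, step = ref.groups()
        key = ref.group(0)
        if key not in counters:
            # Defaults follow the text: start at 1, increment by 1.
            counters[key] = count(int(start or 1), int(step or 1))
        value = next(counters[key])
        pad = "0" if zero else " "   # left padding with spaces is an assumption
        return str(value).rjust(int(width or 0), pad)

    return lambda match: COUNT_REF.sub(expand, replace_string)

text = "line 1\nline 2\nline 3"
numbered = re.sub(r"^", make_replacer("${count04} "), text, flags=re.MULTILINE)
stepped = re.sub(r"^", make_replacer("${count(10,5)}: "), "a\nb", flags=re.MULTILINE)

print(numbered)
print(stepped)
```

The first call reproduces the numbered lines from the example above; the second shows a start value of 10 and an increment of 5, giving the prefixes 10: and 15:.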