What is RegEx in Google Analytics?

Special character : Meaning
  • . : matches any element
  • ^ : matches beginning of sequence
  • $ : matches end of sequence
  • * : zero or more repetitions
  • + : one or more repetitions
  • ? : zero or one repetitions
  • {m} : exactly m repetitions
  • {m,n} : between m and n repetitions
  • | : matches either regex before or after the symbol

What are regular expressions?

The strings you use within Google Analytics don’t have to be a single static term. You can use specific symbols to build patterns, a kind of string formula, that tell Google Analytics to match a broader range of data depending on certain variables. These patterns are known as regular expressions, or regex for short. You can compare them to formulas in Microsoft Excel and the symbols you use to build those.

There are many different characters which have many different functionalities. Using these, you can make sure Google Analytics grabs data exactly the way you want it to. You can utilise this in many parts of Google Analytics including view filters, audiences, segments, goals, content groups and channel groupings. 

Let’s go over a simple example. Say you want to filter for only the pages under example.org/categories. There could be tons of pages under it, such as example.org/categories/shoes, example.org/categories/shirts, et cetera. You can create a filter that covers all these pages by using the string example.org/categories/.*. The dot stands in for any single character, and the asterisk tells Google Analytics that whatever is in front of it may be repeated zero or more times. Together, .* matches any run of characters, so Google Analytics will pick up every URL that has anything in place of the .*, which means all pages under the categories page.
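As a rough sketch of how that string behaves, the snippet below tries it on a few made-up URLs. It uses Python’s re module purely as a stand-in, on the assumption that it treats these basic symbols the same way Google Analytics does. The page list is hypothetical.

```python
import re

# Hypothetical page URLs, used only for illustration.
pages = [
    "example.org/categories/shoes",
    "example.org/categories/shirts",
    "example.org/about",
]

# Same string as in the text: everything under /categories/.
pattern = re.compile(r"example.org/categories/.*")

# search() looks for the pattern anywhere in the value (partial match).
matched = [page for page in pages if pattern.search(page)]
print(matched)
```

Only the two category pages end up in the list; the about page does not contain the fixed part of the string, so it is skipped.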

This is just one example. There are tons of expressions available, and they can be combined too. This way, you can be certain you are getting exactly the results you want.

Which regular expressions are there? What do they do?

Using regex, you can create filters basically as complex as you want. We’ll run through some of the most commonly used characters within regex here. 

Dot – .

The dot basically means ‘anything goes’. If you put a dot within your filter string, you’re telling Google Analytics that any character could be in the place of that dot, and the filter would still apply.
For example, let’s say you were to use 1. as your string. Google Analytics will then pick up anything that contains a 1 followed by another character: 11, 1a, 12, 1b, et cetera.

You can use multiple dots in a string as well. For example, if you were to type br..d, it will pick up any word that could be made by putting any characters in place of those two dots. Bread, breed, broad, brood and braid are all valid strings that will be captured with the given string.
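To see the dot at work, here is a small sketch in Python. Its re module is only an assumed stand-in for the Google Analytics matcher, and the word list is made up. fullmatch() is used so that each word is compared as a whole.

```python
import re

# Made-up sample words; "br34d" shows that a dot also accepts digits.
words = ["bread", "breed", "broad", "brood", "braid", "bird", "br34d"]

pattern = re.compile(r"br..d")

# fullmatch() requires the entire word to fit the pattern.
matched = [word for word in words if pattern.fullmatch(word)]
print(matched)
```

Every five-character word that starts with br and ends with d gets through, whatever sits on the two dots. "bird" does not, because it is too short and does not start with br.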

Backslash – \

The backslash is an essential character when you’re applying filters. If you put a backslash in your string, you will tell Google Analytics to take the next character literally. 

Let’s use an IP address as an example. Say you want to filter for 8.8.8.8. If you just input 8.8.8.8 as your string, Google Analytics will accept anything with any characters in place of the dots. 8181818, 8a8b8c8, 8q8g878: it doesn’t matter, it all goes.

Backslashes can fix this though. If you use 8\.8\.8\.8 as your string, Google Analytics will know to take the dots literally, and thus it will only filter for the exact text 8.8.8.8.

Tip: this works for backslashes as well. Want to filter for something with a backslash in it? Type \\ and Google Analytics will know to interpret the second backslash as a regular backslash.
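The difference between the unescaped and the escaped version can be sketched as follows. This is Python, used here only as an assumed approximation of how Google Analytics reads these symbols, and the candidate values are invented.

```python
import re

# Invented values to test against.
candidates = ["8.8.8.8", "8181818", "8a8b8c8"]

unescaped = re.compile(r"8.8.8.8")     # each dot accepts any character
escaped = re.compile(r"8\.8\.8\.8")    # each dot must be a literal dot

loose = [c for c in candidates if unescaped.fullmatch(c)]
strict = [c for c in candidates if escaped.fullmatch(c)]

print(loose)
print(strict)
```

The unescaped string lets all three values through, while the escaped one keeps only the real IP address.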

Character classes

So we now know that the backslash makes Google Analytics interpret the character next to it literally. However, there are a few exceptions to this. These are known as character classes. By putting certain letters after a backslash, you can create character classes with unique functionalities. These can be useful when you need Google Analytics to look for a general group of characters. The following character classes are available within Google Analytics:

  • \d : any digit should be in its place. If a digit is in its place, it counts.
  • \D : any non-digit character should be in its place. Any character that isn’t a digit will do.
  • \s : any whitespace character should be in its place. These include spaces, tabs, linebreaks, etc.
  • \S : any non-whitespace character should be in its place.
  • \w : any word character should be in its place. These include letters, digits and underscores.
  • \W : any non-word character should be in its place. This is basically any character except for letters, digits and underscores.
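The six classes can be compared side by side with a short sketch. It assumes Python’s re module behaves like Google Analytics for these classes on plain ASCII characters; the sample characters are arbitrary.

```python
import re

classes = [r"\d", r"\D", r"\s", r"\S", r"\w", r"\W"]

# Arbitrary single characters: a digit, a letter, a space,
# an underscore and a hyphen.
samples = ["7", "a", " ", "_", "-"]

# For every class, collect the sample characters it accepts.
accepted = {
    cls: [s for s in samples if re.fullmatch(cls, s)]
    for cls in classes
}

for cls, chars in accepted.items():
    print(cls, chars)
```

Note how each capital-letter class accepts exactly the characters that its lowercase twin rejects.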

Question mark – ?

The question mark makes it so that the previous character must be matched zero or one time. In simpler terms, the last character before the question mark is optional. For example, if your string is 10?, then Google Analytics will accept both 1 and 10.

This is useful in many scenarios. An example is common misspellings. Say your last name is Johnsson, but a lot of people misspell it as Johnson. By making your string johnss?on, Google Analytics will look for and accept both. 
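A quick way to check the optional character is the sketch below, written in Python as an assumed stand-in for Google Analytics. The list of spellings is invented.

```python
import re

# Invented spellings of the same last name.
names = ["johnson", "johnsson", "johnssson", "jonson"]

# The second s is optional: it may appear zero times or once.
pattern = re.compile(r"johnss?on")

matched = [name for name in names if pattern.fullmatch(name)]
print(matched)
```

The spelling with three s characters is rejected, because the question mark allows the optional character at most once.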

Plus – +

The plus is the counterpart of the question mark. If you include a plus, then the previous character must be matched one or more times. So if your string is 10+, then Google Analytics will accept 10, 100, 1000, 10000, 100000, et cetera.

An example of when you can use this is when filtering IP addresses. Imagine you want to filter out all IP addresses that begin with 192.168.0. You could do that using the string 192\.168\.0\.\d+. The backslashes in front of the dots make sure the dots are interpreted as actual dots. The \d means any digit can be there, and the plus makes it so this number can be as long as it needs to be. This way, all IP addresses starting with 192.168.0 are filtered.
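The IP filter can be tried out with a few invented addresses. The sketch uses Python’s re module, assuming it reads the escaped dots, \d and the plus the same way Google Analytics does.

```python
import re

# Invented values; the last one only looks like an IP address.
addresses = ["192.168.0.1", "192.168.0.254", "192.168.1.7", "192x168y0z1"]

# Escaped dots are literal dots; \d+ means one or more digits.
pattern = re.compile(r"192\.168\.0\.\d+")

matched = [addr for addr in addresses if pattern.search(addr)]
print(matched)
```

Keep in mind that \d+ accepts any run of digits, so a value like 192.168.0.999 would pass too, even though it is not a valid address.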

Asterisk – *

The asterisk means that the character in front of it may appear zero or more times. Basically, it combines the effects of the plus symbol and the question mark. That may sound a bit complicated, so let’s explain it with an example. Say you type in spoo*n as your string. Google Analytics will then match that with spon, spoon, spooon, spoooon, spooooon, et cetera.

This works with numbers too. 12* will accept 1, 12, 122, 1222, 12222, 122222, et cetera.
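Both asterisk examples can be checked with a small sketch. Python stands in for Google Analytics here, which is an assumption, and the sample values are made up.

```python
import re

# Made-up sample values.
words = ["spon", "spoon", "spooooon", "spn"]
numbers = ["1", "12", "1222", "2"]

# The asterisk applies to the single character right before it.
word_pattern = re.compile(r"spoo*n")
number_pattern = re.compile(r"12*")

matched_words = [w for w in words if word_pattern.fullmatch(w)]
matched_numbers = [n for n in numbers if number_pattern.fullmatch(n)]

print(matched_words)
print(matched_numbers)
```

"spn" and "2" are rejected: the asterisk only makes the character directly in front of it repeatable, so the first o and the 1 are still required.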

There is a special use for the asterisk that deserves highlighting: the combination with the dot, or the .*. When you use .* in your string, you basically tell Google Analytics ‘anything goes here’. This is very useful in many scenarios.

Say for example that you have an eCommerce site, example.org. On your site there is the page example.org/categories. Under this page are many other subpages, like example.org/categories/televisions, example.org/categories/hifi, example.org/categories/gaming, et cetera. You can capture all of these by making your string example.org/categories/.*.

Square brackets – [ ]

Square brackets tell Google Analytics to match with any of the characters inside of the brackets. For example, say your string is LINK[1234]. Google Analytics will then accept and register LINK1, LINK2, LINK3 and LINK4. This can be useful when you have a handful of different options you all want to match. 
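Here is a minimal sketch of the LINK example, using Python’s re module as an assumed stand-in for Google Analytics. The test values are invented.

```python
import re

# Invented values to test against.
links = ["LINK1", "LINK3", "LINK5", "LINK12", "LINK"]

# The brackets accept exactly one of the characters 1, 2, 3 or 4.
pattern = re.compile(r"LINK[1234]")

matched = [link for link in links if pattern.fullmatch(link)]
print(matched)
```

The brackets stand for a single character, which is why LINK12 and a bare LINK are not accepted when the whole value has to fit the pattern.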

Hyphen – -

The hyphen, also known as the dash, is used inside square brackets to indicate a range. Basically, by using a hyphen, you can tell Google Analytics to look for any character within a certain sequence. For example, say your string is [B-M]. Google Analytics will then look for any capital letter from B to M in the alphabet.

You can use this for regular letters, capital letters and digits, and you can start or end the range at any point in the sequence. Keep in mind: this works for digits, not numbers! If you use [1-20] as your string, Google Analytics will not look for the numbers 1 to 20. It reads the range 1-2 plus a separate 0, so it will only look for the single digits 1, 2 and 0. Tip: if you use the strings [a-z], [A-Z] or [0-9], Google Analytics will look for any regular letter, capital letter or digit respectively.
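Ranges are easy to get wrong, so a quick check helps. The sketch below uses Python’s re module, which is case-sensitive by default; whether Google Analytics is case-sensitive in a given setting is not assumed here. The sample characters are arbitrary.

```python
import re

# Arbitrary sample characters and values.
letters = ["A", "B", "G", "M", "N", "g"]
values = ["0", "1", "2", "3", "20"]

letter_range = re.compile(r"[B-M]")   # one capital letter from B to M
digit_set = re.compile(r"[1-20]")     # the range 1-2 plus the character 0

matched_letters = [c for c in letters if letter_range.fullmatch(c)]
matched_values = [v for v in values if digit_set.fullmatch(v)]

print(matched_letters)
print(matched_values)
```

The second result shows the pitfall: [1-20] is read as the range 1-2 plus the single character 0, so 3 and 20 are not accepted as whole values.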

Pipe – |

The pipe, also known as the vertical line or divider line, stands for or. It serves the same purpose as typing OR in your search engine: you tell Google Analytics to accept any of the options given. For example, if your string is cats|dogs, Google Analytics will accept both cats and dogs as a valid string to register.

You can add multiple of these together as well. cats|dogs|bunnies|hamsters|parakeets will make Google Analytics look for one of those five terms.
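The five-option string can be sketched like this. Python is used as an assumed approximation of Google Analytics, and the search terms are invented.

```python
import re

# Invented search terms.
queries = ["cats", "food for dogs", "goldfish", "parakeets for sale"]

# Any one of the five alternatives is enough for a match.
pattern = re.compile(r"cats|dogs|bunnies|hamsters|parakeets")

matched = [q for q in queries if pattern.search(q)]
print(matched)
```

Because search() looks anywhere in the value, a term only has to contain one of the options to be picked up.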

Parentheses – ( )

Parentheses basically serve the same purpose in Google Analytics as they do in math. By putting a bit of the string in parentheses, you tell Google Analytics to work that bit of the string out before it does everything around it. That way, you can organise how Google Analytics handles your string and make sure that it doesn’t screw up the measurements by working things out wrong.

You can also use parentheses to combine multiple characters into a single part of the formula. If you want to include ‘FM’ as one element in the formula, simply put parentheses around it.

For example, say you sell radios. Whether they are AM or FM is mentioned in the product label. Furthermore, all radios are labeled RA in their product label. After this comes a number. This means you can have tons of product labels like AMRA0001, FMRA8902, AMRA3589, FMRA0100, et cetera.

Using the pipe, you can include both the AM and FM version. You can put this into the formula properly using parentheses. This would lead to:

(AM|FM)RA\d+

The \d makes it so that any digit can be in its place, and the plus makes it so that number can be of any length. The pipe makes it so that both AM and FM are picked up. Without parentheses, the pipe would split the whole string into AM on one side and FMRA\d+ on the other. To prevent the RA from being combined with the FM, the parentheses are put around the AM and FM to keep that choice apart.
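What the parentheses change can be shown by running the string with and without them. The sketch uses Python’s re module as an assumed stand-in for Google Analytics; the product labels are hypothetical.

```python
import re

# Hypothetical product labels; the last two are not valid radio labels.
labels = ["AMRA0001", "FMRA8902", "AMXX1234", "RA0100"]

grouped = re.compile(r"(AM|FM)RA\d+")   # AM or FM, then RA, then digits
ungrouped = re.compile(r"AM|FMRA\d+")   # just AM, or FMRA plus digits

with_group = [label for label in labels if grouped.search(label)]
without_group = [label for label in labels if ungrouped.search(label)]

print(with_group)
print(without_group)
```

Without the parentheses, AMXX1234 slips through, because a bare AM anywhere in the label is already enough to satisfy the left side of the pipe.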

Caret – ^

The caret can basically be read as ‘begins with’. If you include a caret in your string, you’re telling Google Analytics that the captured data should begin with what comes after that caret.

For example, say you include ^dog. Google Analytics will then accept dog, dogs, doggies, doggos, dog food, dog toys, dogs are adorable, and anything else that begins with ‘dog’. However, it will not accept ‘cute dog’, as that does not start with dog. As you can guess, this could be useful for sorting searches being done on your site.
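A short sketch of the caret, with Python’s re module as an assumed approximation of Google Analytics and a handful of invented search terms:

```python
import re

# Invented on-site search terms.
searches = ["dog", "dog food", "dogs are adorable", "cute dog"]

# The caret pins the match to the start of the value.
pattern = re.compile(r"^dog")

matched = [s for s in searches if pattern.search(s)]
print(matched)
```

"cute dog" contains the word, but not at the start, so it is left out.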

Dollar sign – $

The dollar sign is basically the opposite of the caret. Putting a dollar sign after something in your string will tell Google Analytics that the data must end with what’s in front of that dollar sign.

Let’s stick with the dog example. If your string is dog$, Google Analytics will accept dog, cute dog, big dog, small dog, catdog, food for dog, plush dog, et cetera. However, it will not accept dogs, as that does not end with ‘dog’. 
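The dollar sign can be checked the same way. The sketch below is Python, assumed to behave like Google Analytics for these symbols, with invented values. It also combines the caret and the dollar sign, which leaves room for nothing but the exact word.

```python
import re

# Invented values to test against.
values = ["dog", "cute dog", "catdog", "dogs"]

ends_with = re.compile(r"dog$")    # value must end with dog
exact = re.compile(r"^dog$")       # value must be dog and nothing else

matched_end = [v for v in values if ends_with.search(v)]
matched_exact = [v for v in values if exact.search(v)]

print(matched_end)
print(matched_exact)
```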

How do I use regular expressions?

Using regular expressions is kind of like building a mathematical formula in math class, or like building a formula in Excel or another spreadsheet program. Basically, you follow a few basic steps. First, think about what you need to accomplish with regular expressions. If you know that, go through the list and look for a regex that can do what you want to do. None that match? See whether there’s a combination of regexes that can accomplish it. Combining regexes to create an advanced formula is where the true power of regular expressions lies. 

If you’re struggling coming up with a proper formula, don’t worry. There are a lot of people who have made formulas and posted them online. If you can’t get to the formula you need yourself, why not try to look them up? There may very well be someone out there who already made it, and shared it with the world online. That would save you the work of having to do it yourself. Be careful though: check the formula carefully. There may be an error in there that could mess up your data, or it may not be applicable to your site. Be critical before you implement it.
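One way to be critical is to run a formula against values you know should and should not match before you put it into Google Analytics. The helper below is a hypothetical sketch in Python, not a Google Analytics feature; check_pattern and the sample values are made up for illustration, and Python’s re module is only assumed to behave like Google Analytics for basic symbols.

```python
import re


def check_pattern(pattern, should_match, should_not_match):
    """Return the sample values that the pattern handles incorrectly."""
    compiled = re.compile(pattern)
    problems = []
    for value in should_match:
        if not compiled.search(value):
            problems.append(value)   # expected a match, got none
    for value in should_not_match:
        if compiled.search(value):
            problems.append(value)   # expected no match, got one
    return problems


# A formula found online, with the dots left unescaped.
sloppy = check_pattern(r"192.168.0.\d+", ["192.168.0.1"], ["192x168y0z1"])

# The same formula with escaped dots.
careful = check_pattern(r"192\.168\.0\.\d+", ["192.168.0.1"], ["192x168y0z1"])

print(sloppy)
print(careful)
```

An empty list means the formula behaved as expected on every sample; anything in the list is a value worth a second look.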

Google Analytics RegEx cheat sheet

Need a quick handy guide to check what the different regex do? Here’s a quick handy cheat sheet!

Regex : Use : Example
  • . : Any character can be in the dot’s place : 1. (valid data: 11, 12, 1a, 1b, etc.)
  • \ : Interpret the next character literally : 192\.168\.0\.0 (the dots are interpreted as actual dots, not as a regex)
  • ? : The character before the question mark is optional : johnss?on (valid data: johnson, johnsson)
  • + : The character in front of the plus may appear multiple times in a row : 10+ (valid data: 10, 100, 1000, 10000, etc.)
  • * : The character in front of the asterisk is optional, but it may also appear multiple times in a row : 10* (valid data: 1, 10, 100, 1000, 100000, etc.)
  • [ ] : One of the characters specified between these brackets must be there : [12abc] (valid data: 1, 2, a, b, c)
  • - : Defining a sequence : [b-h] (valid data: b, c, d, e, f, g, h)
  • | : One of the terms around the pipe must be there : cats|dogs|dragons (valid data: cats, dogs, dragons)
  • ( ) : Isolate a part of the string, and execute it separately first : (AM|FM)RA (Google Analytics will first look for either AM or FM, and then for whether RA is behind it; both AMRA and FMRA are registered)
  • ^ : Data must begin with : ^dog (valid data: dogs, doggies, dog toys, dog bed, etc.)
  • $ : Data must end with : dog$ (valid data: catdog, cute dog, big dog, etc.)
  • \d : Any digit may be here : valid data: any digit (0-9)
  • \D : Any non-digit may be here : valid data: any character except for a digit (0-9)
  • \s : Any whitespace character may be here : valid data: space, tab, linebreak, etc.
  • \S : Any non-whitespace character may be here : valid data: any character except for a whitespace character
  • \w : Any word character may be here : valid data: a-z, A-Z, 0-9 and underscore (_)
  • \W : Any non-word character may be here : valid data: any character except for a word character
