Named capturing groups in regex

Regular expressions, commonly known as regex, are a powerful tool for searching, filtering, manipulating and validating strings.

Even though there are some syntax variations, regex can be used in most programming languages.

This article will focus on a particularly handy part of the regex syntax that can make your code more readable. If you are not yet familiar with the basics of how to write regex, I’d recommend starting with the free, interactive tutorial from RegexOne.

An example scenario

Let’s imagine we are on the dashboard of a log management tool and we want to analyse some server logs to gather information about a recent error and trigger warning notifications if necessary.

Here is an example of what that error message may look like:

Started GET "/blog" for ::1 at 2024-01-29 19:13:08 +0000

ActionController::RoutingError (No route matches [GET] "/blog"):

From it, we could extract the following information:

  • when the error occurred: 2024-01-29 19:13:08 +0000
  • the error type: RoutingError
  • the error message: No route matches [GET] "/blog"

The basic pattern

For the sake of this exercise, we’ll focus only on extracting the error type and message. I’ll leave it to you to work out how to get the timestamp. It’ll be good practice!

So, let’s get on with it. We want to extract the error type and message, which in the example above correspond to RoutingError and No route matches [GET] "/blog" respectively.

We can extract the error type with this pattern:

\w+Error

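This matches one or more word characters followed by Error. Because the colon is not a word character, the match is RoutingError rather than the whole of ActionController::RoutingError.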

We can extract the error message with this pattern:

\(.+\)

This matches an opening round bracket, followed by one or more characters of any kind, followed by a closing round bracket.

Notice here that the opening and closing brackets are preceded by a backslash. This indicates that the brackets, despite having a special meaning in the regex language, are to be taken literally in this case.

We can put the two patterns together, separated by a space, in any online regex sandbox to see the result. This is what it looks like in regex101.com:
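If you'd rather check this from a script than from a sandbox, here is a minimal sketch in Python. The variable names (log_line, pattern, whole_match) are my own choices for illustration, and the log line is the one from the example scenario above.

```python
import re

# Second line of the example log from the scenario above.
log_line = 'ActionController::RoutingError (No route matches [GET] "/blog"):'

# The error type pattern and the message pattern, separated by a space.
pattern = re.compile(r"\w+Error \(.+\)")

match = pattern.search(log_line)

# group(0) is the whole match; there are no capturing groups yet.
whole_match = match.group(0) if match else None
print(whole_match)  # → RoutingError (No route matches [GET] "/blog")
```

Everything comes back as one string, brackets included.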

Capturing groups

We now know the pattern works, but it returns a single match and we want the error type and message returned separately.

To achieve this, all we need to do is surround the pattern we want to capture in round brackets. Any regex pattern surrounded by () becomes a capturing group.

This way, the error type pattern becomes:

(\w+Error)

And the message pattern becomes:

\((.+)\)

Notice that we have put the capturing group brackets () inside the literal brackets \( \). This is because we only want to capture the text.

Try swapping the brackets the other way around to see the difference!
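If you want to check your answer in a script, this small Python sketch runs both variants against the example log line. The variable names are only illustrative.

```python
import re

log_line = 'ActionController::RoutingError (No route matches [GET] "/blog"):'

# Capturing group INSIDE the literal brackets: only the text is captured.
inside = re.search(r"\((.+)\)", log_line)
text_only = inside.group(1) if inside else None

# Capturing group OUTSIDE the literal brackets: the brackets are captured too.
outside = re.search(r"(\(.+\))", log_line)
with_brackets = outside.group(1) if outside else None

print(text_only)      # → No route matches [GET] "/blog"
print(with_brackets)  # → (No route matches [GET] "/blog")
```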

We can join the two patterns together again, separated by a space, and test the result. We now get one match and two captured groups (seen as Group 1 and Group 2 in the screenshot below):

If we were using this pattern within a script, we could use $1 and $2 to access the corresponding captured groups.
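How you access the groups depends on the language. As a rough sketch, Python's re module uses match.group(1) and match.group(2) where other languages use $1 and $2. The names error_type and error_message below are mine, picked for this example.

```python
import re

log_line = 'ActionController::RoutingError (No route matches [GET] "/blog"):'

# Two numbered capturing groups: the error type and the message text.
pattern = re.compile(r"(\w+Error) \((.+)\)")

match = pattern.search(log_line)

# Groups are numbered from 1, in the order their opening brackets appear.
error_type = match.group(1) if match else None
error_message = match.group(2) if match else None

print(error_type)     # → RoutingError
print(error_message)  # → No route matches [GET] "/blog"
```

This works, but anyone reading group(1) and group(2) has to look back at the pattern to find out what each number stands for.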

Named capturing groups

However, our code would be more understandable if, instead of referring to the captured groups as the default $1 and $2, we gave them a name that described what they represent.

We can give the capturing groups a name by following the syntax below:

(?<name>pattern)

Straight after the group’s opening bracket we add a ? character, followed by the group’s name inside < >. The exact rules vary between regex flavours, but sticking to letters, numbers and the underscore sign, and starting the name with a letter, is a safe choice.

And so, if we wanted to name the first group as type, we would write:

(?<type>\w+Error)

To name the second group as message, we would write:

\((?<message>.+)\)

This is what the resulting regex looks like in the sandbox:

The final regex pattern matches the error type and message as two separate capturing groups named type and message, respectively.

Notice that the captured groups are no longer called 1 and 2, but type and message. We could thus reference them as such in any further code.
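Here is a minimal sketch of what that further code could look like in Python. One of the syntax variations mentioned at the start applies here: Python's re module spells named groups as (?P<name>pattern), with an extra P after the question mark. The variable names are, again, only for illustration.

```python
import re

log_line = 'ActionController::RoutingError (No route matches [GET] "/blog"):'

# Same pattern as above, written with Python's (?P<name>...) spelling.
pattern = re.compile(r"(?P<type>\w+Error) \((?P<message>.+)\)")

match = pattern.search(log_line)

# groupdict() returns every named group as a {name: captured text} dictionary.
fields = match.groupdict() if match else {}

print(fields["type"])     # → RoutingError
print(fields["message"])  # → No route matches [GET] "/blog"
```

A single group can also be read by name with match.group("type"), and the numbered access still works alongside the names.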

It’s now much easier to understand what they represent, wouldn’t you agree?

