Consider this (very simplified) example string:
1aw2,5cx7
As you can see, it is two digit/letter/letter/digit values separated by a comma.
Now, I could match this with the following:
>>> from re import match
>>> match(r"\d\w\w\d,\d\w\w\d", "1aw2,5cx7")
<_sre.SRE_Match object at 0x01749D40>
The problem, though, is that I have to write \d\w\w\d twice. With small patterns this isn’t so bad, but with more complex regexes, writing the exact same thing twice makes the final pattern enormous and cumbersome to work with. It also seems redundant.
I tried using a named capture group:
>>> from re import match
>>> match(r"(?P<id>\d\w\w\d),(?P=id)", "1aw2,5cx7")
But it didn’t work because it was looking for two occurrences of the same text, 1aw2, rather than two digit/letter/letter/digit values.
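To make the failure concrete, here is a minimal sketch using only the standard library re module. It assumes nothing beyond the pattern from the question; the second input string is a made-up example added purely for contrast.

```python
import re

# (?P=id) is a backreference: it must match the same text that the
# group "id" captured, not merely text of the same shape.
backref_pattern = r"(?P<id>\d\w\w\d),(?P=id)"

different_values = re.match(backref_pattern, "1aw2,5cx7")
repeated_value = re.match(backref_pattern, "1aw2,1aw2")  # hypothetical input for contrast

print(different_values)        # None: "5cx7" is not a repeat of "1aw2"
print(repeated_value.group())  # 1aw2,1aw2
```

So the named group reuses the captured text, which is why it cannot stand in for repeating the sub-pattern itself.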
Is there any way to save part of a pattern, such as \d\w\w\d, so it can be used later on in the same pattern? In other words, can I reuse a sub-pattern in a pattern?
No, when using the standard library
re module, regular expression patterns cannot be ‘symbolized’.
You can always do so by re-using Python variables, of course:
digit_letter_letter_digit = r'\d\w\w\d'
then use string formatting to build the larger pattern:
match(r"{0},{0}".format(digit_letter_letter_digit), "1aw2,5cx7")
or, using Python 3.6+ f-strings:
dlld = r'\d\w\w\d'
match(fr"{dlld},{dlld}", "1aw2,5cx7")
I often do use this technique to compose larger, more complex patterns from re-usable sub-patterns.
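As a rough illustration of composing a larger pattern from named pieces, here is a sketch with the standard library only. The `parts` dictionary, the `sep` sub-pattern and the test strings are hypothetical names and values invented for this example, not something from the question.

```python
import re

# Hypothetical sub-patterns, named for illustration only.
parts = {
    "dlld": r"\d\w\w\d",   # digit/letter/letter/digit
    "sep": r"\s*,\s*",     # comma with optional surrounding whitespace
}

# Literal braces in the template (the {1,3} quantifier) are doubled so that
# str.format leaves them alone; {dlld} and {sep} are filled in from `parts`.
template = r"^{dlld}(?:{sep}{dlld}){{1,3}}$"
pattern = template.format(**parts)

print(pattern)  # ^\d\w\w\d(?:\s*,\s*\d\w\w\d){1,3}$
print(bool(re.match(pattern, "1aw2,5cx7")))          # True
print(bool(re.match(pattern, "1aw2, 5cx7 ,9zz0")))   # True
print(bool(re.match(pattern, "1aw2")))               # False: needs at least two values
```

The one thing to watch with str.format and f-strings is that regex quantifiers such as {2} or {1,3} written in the template itself have to be doubled, otherwise they are treated as replacement fields.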
If you are prepared to install an external library, then the regex project can solve this problem with a regex subroutine call. The syntax (?<digit>), for example (?1), re-uses the pattern of an already used (implicitly numbered) capturing group:
(\d\w\w\d),(?1)
^........^ ^..^
|          |
|          re-use pattern of capturing group 1
capturing group 1
You can do the same with named capturing groups, where (?<groupname>...) is the named group groupname, and (?&groupname), (?P&groupname) or (?P>groupname) re-use the pattern matched by groupname (the latter two forms are alternatives for compatibility with other engines).
regex also supports the (?(DEFINE)...) block to ‘define’ subroutine patterns without them actually matching anything at that stage. You can put multiple (?<name>...) capturing groups in that construct to then later refer to them in the actual pattern:
(?(DEFINE)(?<dlld>\d\w\w\d))(?&dlld),(?&dlld)
          ^...............^ ^......^ ^......^
          |                    \       /
 creates 'dlld' pattern      uses 'dlld' pattern twice
Just to be explicit: the standard library
re module does not support subroutine patterns.
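For anyone who wants to try the subroutine syntax, the following is a small sketch that assumes the third-party regex package is installed (pip install regex). The dictionary labels and the `results` name are made up for the example, and the import is guarded so the snippet simply does nothing when the package is missing.

```python
try:
    import regex  # third-party package, not the standard library re module
except ImportError:
    regex = None  # the sketch is skipped when the package is missing

text = "1aw2,5cx7"

# Three ways of reusing the same sub-pattern via subroutine calls.
subroutine_patterns = {
    "numbered": r"(\d\w\w\d),(?1)",
    "named": r"(?<dlld>\d\w\w\d),(?&dlld)",
    "define": r"(?(DEFINE)(?<dlld>\d\w\w\d))(?&dlld),(?&dlld)",
}

results = {}
if regex is not None:
    for label, pattern in subroutine_patterns.items():
        m = regex.fullmatch(pattern, text)
        results[label] = m.group() if m else None
    print(results)
```

With the package available, each of the three patterns should match the whole example string, unlike the backreference version from the question.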
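Note: this will work with the PyPI regex module, not with the standard library re module.
You could use the notation (?group-number), in your case:
(\d\w\w\d),(?1)
it is equivalent to:
(\d\w\w\d),(\d\w\w\d)
Be aware that \w includes \d (and the underscore). If only letters should be allowed in the middle, the regex would be something like:
(\d[a-zA-Z]{2}\d),(?1)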
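The point about \w is easy to check with the standard library. This is only a sketch: the letters-only class [a-zA-Z] is an assumption that covers ASCII letters, and the candidate strings are invented for the comparison.

```python
import re

with_word_class = r"\d\w\w\d"      # \w also accepts digits and "_"
letters_only = r"\d[a-zA-Z]{2}\d"  # assumed ASCII-letters-only variant

candidates = ["1aw2", "1234", "1_a2"]  # hypothetical inputs
checks = {
    c: (bool(re.fullmatch(with_word_class, c)),
        bool(re.fullmatch(letters_only, c)))
    for c in candidates
}

for candidate, (word_ok, letters_ok) in checks.items():
    print(candidate, word_ok, letters_ok)
# 1aw2 True True
# 1234 True False
# 1_a2 True False
```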
I was troubled with the same problem and wrote this snippet
For lack of a more descriptive name, I named the partial regexes as
Accessing them is as easy as
import re
digit_letter_letter_digit = re.compile(r"\d\w\w\d")  # compile the pattern once so that we can reuse it later
all_finds = digit_letter_letter_digit.findall("1aw2,5cx7")  # finditer could be used instead of findall
for value in all_finds:
    print(value)
Since you’re already using re, why not use string processing to manage the pattern repetition as well:
pattern = "P,P".replace("P", r"\d\w\w\d")
or:
P = r"\d\w\w\d"
pattern = f"{P},{P}"
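One caveat with the placeholder approach, sketched below with the standard library: a single-letter placeholder like "P" would also rewrite the P in a named group such as (?P<first>...). The "<DLLD>" marker and the group names are made up for this example; any token that cannot occur elsewhere in the template would do.

```python
import re

sub_pattern = r"\d\w\w\d"

# A one-letter placeholder works for the simple template from the answer ...
simple = "P,P".replace("P", sub_pattern)

# ... but a more distinctive marker (here the made-up token "<DLLD>") is
# safer once the template itself contains a "P", as named groups do.
template = "(?P<first><DLLD>),(?P<second><DLLD>)"
pattern = template.replace("<DLLD>", sub_pattern)

m = re.match(pattern, "1aw2,5cx7")
print(simple)                               # \d\w\w\d,\d\w\w\d
print(m.group("first"), m.group("second"))  # 1aw2 5cx7
```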
Try using backreferences; I believe it works something like below to match
You could use
See here for reference http://www.regular-expressions.info/backref.html