
About regex: Regex lookahead for “not followed by” in grep

Question Detail

I am attempting to grep for all instances of Ui\. not followed by Line or even just the letter L

What is the proper way to write a regex for finding all instances of a particular string NOT followed by another string?

Using lookaheads

grep "Ui\.(?!L)" *
bash: !L: event not found

grep "Ui\.(?!(Line))" *

Question Answer

Negative lookahead, which is what you’re after, requires a more powerful tool than the standard grep. You need a PCRE-enabled grep.

If you have GNU grep, the current version supports options -P or --perl-regexp and you can then use the regex you wanted.
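As a quick check of the -P route, here is a minimal sketch. It assumes GNU grep built with PCRE support, and the three sample lines are made up for illustration.

```shell
# Hypothetical sample input: one line per case.
sample=$(printf '%s\n' 'Ui.Line(1)' 'Ui.Label' 'Ui.')

# -P switches grep to Perl-compatible regexes, so (?!Line) is a negative lookahead.
# Single quotes keep an interactive bash from treating "!" as history expansion.
matches=$(printf '%s\n' "$sample" | grep -P 'Ui\.(?!Line)')

printf '%s\n' "$matches"
```

Only the lines where Ui. is not immediately followed by Line are kept, including the one where Ui. ends the line.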

If you don’t have (a sufficiently recent version of) GNU grep, then consider getting ack.

The answer to part of your problem is here, and ack would behave the same way:
Ack & negative lookahead giving errors

You are using double-quotes for grep, which permits bash to “interpret ! as history expand command.”

You need to wrap your pattern in SINGLE-QUOTES:
grep 'Ui\.(?!L)' *

However, see @JonathanLeffler’s answer to address the issues with negative lookaheads in standard grep!

You probably can’t perform standard negative lookaheads using grep, but you can usually get equivalent behaviour with the inverse-match switch ‘-v’. Write a regex for the complement of what you want to match, then pipe the output through two greps.

For the regex in question you might do something like

grep 'Ui\.' * | grep -v 'Ui\.L'
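One caveat is worth checking before relying on the two-grep pipeline: grep -v drops whole lines, so a line containing both Ui.L and some other Ui.x is discarded even though it has a match you want. The sketch below uses made-up sample lines and compares the pipeline with a per-match pattern that uses only a negated character class.

```shell
# Hypothetical sample: the third line has both a wanted and an unwanted occurrence.
sample=$(printf '%s\n' 'Ui.Button only' 'Ui.Line only' 'Ui.Line and Ui.Button')

# Line-based filtering: any line mentioning Ui.L is thrown away.
two_grep=$(printf '%s\n' "$sample" | grep 'Ui\.' | grep -v 'Ui\.L')

# Per-match filtering: a line is kept if at least one Ui. is not followed by L.
per_match=$(printf '%s\n' "$sample" | grep -E 'Ui\.($|[^L])')

printf 'two greps:\n%s\n\nper match:\n%s\n' "$two_grep" "$per_match"
```

If each line holds at most one occurrence of Ui., the two approaches give the same result.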

If you need to use a regex implementation that doesn’t support negative lookaheads and you don’t mind matching extra character(s)*, then you can use negated character classes [^L], alternation |, and the end of string anchor $.

In your case grep 'Ui\.\([^L]\|$\)' * does the job.

  • Ui\. matches the string you’re interested in

  • \([^L]\|$\) matches any single character other than L or it matches the end of the line: [^L] or $.
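To see what “matching extra character(s)” means in practice, here is a small sketch using grep's -o option, which prints only the matched text. It assumes GNU grep (the \| alternation in a basic regex is a GNU extension), and the sample lines are hypothetical.

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'Ui.Text' 'Ui.Line' 'Ui.')

# -o prints just the part of each line that the pattern matched.
matched=$(printf '%s\n' "$sample" | grep -o 'Ui\.\([^L]\|$\)')

printf '%s\n' "$matched"
```

For the first line the match is Ui.T, not just Ui.: the character after the dot is consumed by [^L]. A true lookahead would leave it out of the match. This matters for -o, for highlighting, and for substitutions, but not when you only want to select lines.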

If you want to exclude more than just one character, then you just need to throw more alternation and negation at it. To find a not followed by bc:

grep 'a\(\([^b]\|$\)\|\(b\([^c]\|$\)\)\)' *

This matches either a followed by something other than b or by the end of the line (a, then [^b] or $), or a followed by b which is in turn followed by something other than c or by the end of the line (a, then b, then [^c] or $).
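A quick way to convince yourself the expression is right is to run it over a handful of short strings. The sample below is made up, and GNU grep is assumed for the \| alternation.

```shell
# Hypothetical test strings, one per line.
sample=$(printf '%s\n' 'abc' 'abd' 'ab' 'a' 'ac' 'xyz')

# Keep lines that contain an "a" which is not followed by "bc".
kept=$(printf '%s\n' "$sample" | grep 'a\(\([^b]\|$\)\|\(b\([^c]\|$\)\)\)')

printf '%s\n' "$kept"
```

Only abc (a followed by bc) and xyz (no a at all) are filtered out.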

This kind of expression gets to be pretty unwieldy and error prone with even a short string. You could write something to generate the expressions for you, but it’d probably be easier to just use a regex implementation that supports negative lookaheads.
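For what it is worth, generating such an expression takes only a few lines of shell. The function below is a hypothetical helper (the name not_followed_by is mine, not a standard tool). It emits extended-regex syntax for grep -E and assumes the excluded word contains only plain letters or digits, with no regex metacharacters.

```shell
# not_followed_by PREFIX WORD
# Prints an ERE matching PREFIX when it is not followed by WORD.
# PREFIX is used as-is (already a regex); WORD must be plain letters/digits.
not_followed_by() (
    prefix=$1
    word=$2
    alts=''
    seen=''
    while [ -n "$word" ]; do
        rest=${word#?}          # word without its first character
        ch=${word%"$rest"}      # the first character itself
        # The part of WORD seen so far, then end of line or a different character.
        alts="${alts:+$alts|}${seen}(\$|[^${ch}])"
        seen=$seen$ch
        word=$rest
    done
    printf '%s(%s)\n' "$prefix" "$alts"
)

pattern=$(not_followed_by 'Ui\.' 'Line')
printf '%s\n' "$pattern"

# Hypothetical sample input to try the generated pattern on.
sample=$(printf '%s\n' 'Ui.Line' 'Ui.Lines' 'Ui.List' 'Ui.Lin' 'Ui.' 'Ui.Button')
kept=$(printf '%s\n' "$sample" | grep -E "$pattern")
printf '%s\n' "$kept"
```

The generated pattern grows with every character of the excluded word, which illustrates why a real lookahead is the more comfortable choice when one is available.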

*If your implementation supports non-capturing groups then you can avoid capturing extra characters.

At least for the case of not wanting an ‘L’ character after the “Ui.” you don’t really need PCRE.

    grep -E 'Ui\.($|[^L])' *

Here I’ve made sure to match the special case of the “Ui.” at the end of the line.
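The end-of-line alternative is easy to forget, so here is a small sketch of what goes wrong without it. The sample lines are made up for illustration.

```shell
# Hypothetical sample: the first line ends right after "Ui."
sample=$(printf '%s\n' 'widget = Ui.' 'Ui.Line')

# [^L] alone needs an actual character after the dot, so the first line is missed.
# (|| true: grep exits non-zero when nothing matches.)
without_anchor=$(printf '%s\n' "$sample" | grep -E 'Ui\.[^L]' || true)

# Adding the $ alternative also accepts "Ui." at the end of a line.
with_anchor=$(printf '%s\n' "$sample" | grep -E 'Ui\.($|[^L])')

printf 'without $: [%s]\nwith $:    [%s]\n' "$without_anchor" "$with_anchor"
```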

If your grep doesn’t support -P or --perl-regexp and you can install a PCRE-enabled grep such as “pcregrep”, then you don’t need any command-line option (as you do with GNU grep) to get Perl-compatible regular expressions; you just run

pcregrep 'Ui\.(?!Line)'

You don’t need another nested group for “Line” as in your example “Ui\.(?!(Line))”; the outer group is sufficient, as I’ve shown above.

Let me give you another example of negative lookahead assertions: when you have a list of lines returned by “ipset”, each line showing a number of packets in the middle of the line, and you don’t need the lines with zero packets, you just run:

ipset list | pcregrep 'packets(?! 0 )'

If you like Perl-compatible regular expressions and have perl, but don’t have pcregrep or your grep doesn’t support --perl-regexp, you can use one-line perl scripts that work the same way as grep:

perl -e 'while (<>) {if (/Ui\.(?!Line)/){print;};}'

Perl accepts stdin the same way grep does, e.g.

ipset list | perl -e 'while (<>) {if (/packets(?! 0 )/){print;};}'
