About regex: filter all words containing duplicate characters

Question Detail

I need to filter out all words in a file that contain duplicate characters.

I've been stuck for a couple of days trying to figure this out.

This is what I've got so far to find 5-letter strings, but words with duplicate letters still show up.

Any help would be appreciated.

cat /file | grep -Eow '\w{5}' | grep -v '\(.\)(.\)\1' | sort -u

Question Answer

# Input file
$ cat file
aabc
123
1233

# Filter out lines with repeating (adjacent) characters
$ grep -Ev "(.)\1" file
123

# Show only lines with repeating characters
$ grep -E "(.)\1" file
aabc
1233

This previous question explains how to do this using regex: Regex to find repeating numbers

Detailed explanations:

  • grep -E: use extended regular expressions to match lines
  • grep -v: invert the match, so only non-matching lines are displayed
  • . : regex that matches any single character
  • ( ): regex capture group
  • \1: back-reference to the first group; it matches the same text that group captured, so (.)\1 matches any character immediately followed by itself.
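Note that (.)\1 only catches a character that repeats back to back. The question asks about duplicates anywhere in a word, so here is a minimal sketch that combines the asker's five-letter filter with a variant pattern, (.).*\1, which allows other characters between the two occurrences. The function name unique_letter_words is hypothetical, and the sketch assumes GNU grep, since -o, -w, \w and back-references under -E are GNU features.

```shell
# Hypothetical helper: print the unique five-character words that contain
# no character more than once. Reads the named files, or stdin if none given.
# Assumes GNU grep.
unique_letter_words() {
    grep -Eow '\w{5}' "$@" |   # extract whole words of exactly five word characters
        grep -Ev '(.).*\1' |   # drop words where any character occurs twice
        sort -u                # sort and de-duplicate
}

# Usage example
printf 'hello world crane\nabcde apple\n' | unique_letter_words
# prints:
# abcde
# crane
# world
```

The second grep runs on one word per line, so the .* in the pattern cannot span two different words.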

Also, https://regex101.com/ is an easy way to construct a regex for your purpose. Create a few test cases and check that it works as you write the regex.
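The same kind of check can be done in the shell. The sketch below runs a handful of sample words through the pattern (.).*\1, a variant of the answer's regex that also matches non-adjacent duplicates. The helper name has_duplicate and the sample words are made up for illustration, and GNU grep is assumed.

```shell
# Hypothetical helper: succeeds if the word contains any character twice.
has_duplicate() {
    printf '%s\n' "$1" | grep -Eq '(.).*\1'
}

# Classify a few test words.
for word in abcde hello level 12345 1233; do
    if has_duplicate "$word"; then
        echo "$word: duplicate"
    else
        echo "$word: unique"
    fi
done
# prints:
# abcde: unique
# hello: duplicate
# level: duplicate
# 12345: unique
# 1233: duplicate
```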
