I need to filter out of a file all words that have duplicate characters.
I've been stuck for a couple of days trying to figure this out.
This is what I've got so far to find 5-letter strings, but words with duplicate letters still show up.
Any help would be appreciated.
cat /file | grep -Eow '\w{5}' | grep -v '\(.\)(.\)\1' | sort -u
# Input file
$ cat file
aabc
123
1233
# Filter out lines with repeating characters
$ grep -Ev "(.)\1" file
123
# Show only lines with repeating characters
$ grep -E "(.)\1" file
aabc
1233
This previous question explains how to do this using regex: Regex to find repeating numbers
Detailed explanations:
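grep -E
Interpret the pattern as an extended regular expression (ERE).
grep -v
Invert the match, so only lines that do not match are displayed.
.
Regex: matches any single character.
( )
Regex: a capture group.
\1
Backreference: matches the same text that group 1 just captured. (.)\1 therefore matches any character immediately followed by itself, such as "aa" or "33"; duplicates separated by other characters need (.).*\1 instead.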
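Putting those pieces together for the original 5-letter-word case, here is a minimal sketch. It assumes GNU grep (backreferences and \w under -E are GNU extensions), and the temporary file and sample words are made up for illustration. The second grep uses (.).*\1 rather than (.)\1 so that a word like "apple" or "11234" is dropped even when the duplicated characters are not adjacent.

```shell
# Hypothetical sample input, written to a temp file for illustration only.
tmpfile=$(mktemp)
printf '%s\n' 'hello world crane' 'apple stone 12345 11234' > "$tmpfile"

# Step 1: -o prints each match on its own line, -w keeps whole words only,
#         so this extracts every 5-character word.
# Step 2: -v drops any word in which some character occurs twice,
#         whether or not the two occurrences are adjacent.
# Step 3: sort and de-duplicate (LC_ALL=C keeps the order locale-independent).
result=$(grep -Eow '\w{5}' "$tmpfile" | grep -Ev '(.).*\1' | LC_ALL=C sort -u)

echo "$result"
rm -f "$tmpfile"
```

With this sample input, "hello", "apple" and "11234" are filtered out, leaving 12345, crane, stone and world.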
Also, https://regex101.com/ is an easy way to construct a regex for your purpose. Create a few test cases and check that it works as you write the regex.