Bash: join a line ending with a backslash with the next line (awk, from "Famous awk one-liners explained")

Question Detail

This exercise is from the "AWK one-liners explained" blog post by Peteris Krumins.

Essentially this line

 awk '/\\$/ { sub(/\\$/,""); getline t; print $0 t; next }; 1'

joins every line ending with backslash with the next line:

e.g. input

12345\
6789
523435\
00000

Output

123456789
52343500000

The blog post says:
Unfortunately this one liner fails to join more than 2 lines (this is left as an exercise to the reader to come up with a one-liner that joins arbitrary number of lines that end with backslash :)).

So with an input file in which two or more consecutive lines end with a backslash (input2), the AWK one-liner above gives an incorrect answer (output2):
e.g. input2

12345\
6789\
523435\
00000

Output 2 – INCORRECT

123456789\
52343500000

I think, according to the post, the output should instead be output3:

Output 3 – CORRECT

12345678952343500000

How can one solve this problem (input as input2 and getting output3)?

Question Answer

Try the following:

awk '/\\$/ { printf "%s", substr($0, 1, length($0)-1); next } 1' <<'EOF'
12345\
6789\
523435\
00000
EOF

which yields

12345678952343500000

This demonstrates that 3 consecutive (or more) line continuations work fine, unlike with the command in the question.

Explanation of the command:

  • /\\$/ matches a \ at the end ($) of a line, signaling line continuation.
  • substr($0, 1, length($0)-1) removes that trailing \ from the input line, $0.
  • By using printf "%s", the (modified) current line is printed without a trailing newline, which means that whatever print command comes next will directly append to it, effectively joining the current and the next line.
  • next finishes processing of the current line.
  • 1 is a common awk idiom that is shorthand for { print }, i.e., for simply printing the input line (with a trailing \n).
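
One edge case the command above leaves open: if the very last input line ends with a \, printf emits it without a newline and nothing follows, so the output ends without a trailing newline. The following is a minimal sketch of a variant that collects the pieces in a variable instead. The function name join_continued and the variables buf and pending are made up for illustration and are not part of the original answer.

```shell
# Hypothetical helper: join backslash-continued lines using a buffer variable.
join_continued() {
  awk '
    { buf = buf $0; pending = 1 }          # append the current line to the buffer
    sub(/\\$/, "", buf) { next }           # a trailing backslash was stripped: keep collecting
    { print buf; buf = ""; pending = 0 }   # no continuation: flush the joined line
    END { if (pending) print buf }         # input ended on a continuation: flush the rest
  '
}

# Usage: the last line ends with a backslash, yet the output still ends with a newline.
printf '12345\\\n6789\\\n' | join_continued
```

The trade-off is that the joined line is held in memory until its last piece arrives, whereas the printf version writes each piece immediately.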

As for why the original command doesn’t work:

awk '/\\$/ { sub(/\\$/,""); getline t; print $0 t; next }; 1'

  • On encountering a line-continuation character (\ at the end of the current line), getline t reads the next line from the file into the variable t, and print $0 t then prints it as is after the current line.
  • next then finishes processing of both the current and – thanks to the getline call – the next line, so that the next script cycle processes the line after the next line (2 lines from the current one).
  • Therefore, since the line read via getline is blindly printed and not examined in any way, it is skipped with respect to line-continuation-character processing.
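
To make the skipped check concrete, here is a sketch that runs the original command next to a repaired getline variant that re-tests the combined line after every read. The function names join_pairs and join_all are hypothetical, and the loop is only one possible repair; the printf-based command above avoids getline altogether.

```shell
# The command from the question: the line read via getline is never examined.
join_pairs() {
  awk '/\\$/ { sub(/\\$/,""); getline t; print $0 t; next }; 1'
}

# Hypothetical repair: keep reading while the line still ends in a backslash.
join_all() {
  awk '{
    while (sub(/\\$/, "")) {           # strip a trailing backslash, if there is one
      if ((getline t) <= 0) break      # stop at end of input or on a read error
      $0 = $0 t                        # append the next line, then test it again
    }
    print
  }'
}

# Same data as input2 in the question.
printf '12345\\\n6789\\\n523435\\\n00000\n' | join_pairs
printf '12345\\\n6789\\\n523435\\\n00000\n' | join_all
```

The first call reproduces output2 (two lines, the first still ending in \), while the second prints the single joined line of output3.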

In general, as Ed Morton points out in a comment, use of getline is rarely the right solution and can lead to subtle bugs – see http://awk.info/?tip/getline.
