awk: Compare two files and add new column from second file to first file if match

Question Detail

I have a problem: I would like to compare two files. The first file is the reference:

ABCA4 INHR
AMT   INHR
BTK   ONKO1
PAP   ONKO2

The second file is the one to compare against it:

3  1:2 T ENG1  ABCA4 ff
3  1:2 T ENG1  ABCA4 gg
5  1:4 A ENG20 AMT   ll
6  1:5 G ENG12 BRB   ds
7  1:6 T ENG8  PAP   rg 
7  1:6 T ENG8  PAP   tt

I want to compare $1 from the first file with $5 from the second file and, if there is a match, print $2 from the first file between $5 and $6 of the second file (and a - where there is no match):

    3  1:2 T ENG1  ABCA4 INHR  ff
    3  1:2 T ENG1  ABCA4 INHR  gg
    5  1:4 A ENG20 AMT   INHR  ll
    6  1:5 G ENG12 BRB     -   ds
    7  1:6 T ENG8  PAP   ONKO2 rg 
    7  1:6 T ENG8  PAP   ONKO2 tt

All columns are tab-separated.
Thanks

Question Answer

You can do it like this:

awk 'NR==FNR{a[$1]=$2;next}{$5=$5 "\t" (a[$5]?a[$5]:"-")}1' file1 file2

Details:

NR==FNR {     # when the first file is processed
    a[$1]=$2  # store the second field in an array with the first field as key
    next      # jump to the next record
}
{
    $5=$5 "\t" (a[$5]?a[$5]:"-") # append a tab and the corresponding second
                                 # field from the first file if it exists or -
}
1 # true, display the line
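
One caveat about the test a[$5]?a[$5]:"-": it checks whether the stored value is true, so a key whose value is 0 or empty would be treated as missing, and the lookup itself creates an empty array element for every unmatched key. A minimal sketch of a variant that tests for the key with the in operator instead is shown below. The ZERO row and the temporary file names are hypothetical sample data made up for illustration, not part of the question.

```shell
#!/bin/sh
# Hypothetical sample data: the key ZERO maps to the value 0.
tmp=$(mktemp -d)
printf 'ABCA4\tINHR\nZERO\t0\n' > "$tmp/file1"
printf '3\t1:2\tT\tENG1\tABCA4\tff\n4\t1:3\tC\tENG2\tZERO\tkk\n6\t1:5\tG\tENG12\tBRB\tds\n' > "$tmp/file2"

# ($5 in a) asks whether the key exists in the array. It does not look at
# the stored value and does not create a new element for unmatched keys.
out=$(awk -v OFS='\t' '
NR == FNR { a[$1] = $2; next }                    # first file: build the lookup table
{ $5 = $5 "\t" (($5 in a) ? a[$5] : "-") }        # second file: append value or -
1                                                 # print the rebuilt line
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$out"
rm -rf "$tmp"
```

With this data the ZERO row keeps its value 0 in the new column, where the value-based test would print - instead.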

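Here is awk with the same logic, but with OFS set to a tab so that the whole output line is tab-separated (without it, assigning to $5 makes awk rebuild the record with single spaces between the fields):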

awk -v OFS="\t" 'FNR==NR{a[$1]=$2;next}{if (a[$5]) $5=$5"\t"a[$5]; else $5=$5"\t""-"}1' file1 file2

Output, tab-delimited:

3       1:2     T       ENG1    ABCA4   INHR    ff
3       1:2     T       ENG1    ABCA4   INHR    gg
5       1:4     A       ENG20   AMT     INHR    ll
6       1:5     G       ENG12   BRB     -       ds
7       1:6     T       ENG8    PAP     ONKO2   rg
7       1:6     T       ENG8    PAP     ONKO2   tt
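
Because the question states that the columns are tab-separated, it may also be worth setting the input separator with -F'\t'. By default awk splits on any run of blanks, so a column that contains a space would shift the field numbers. The sketch below reuses the program from the first answer unchanged and only varies the separator options. The row with "ENG 1" in the fourth column is a hypothetical example, not data from the question.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'ABCA4\tINHR\n' > "$tmp/file1"
# Hypothetical row whose fourth column contains a space ("ENG 1").
printf '3\t1:2\tT\tENG 1\tABCA4\tff\n' > "$tmp/file2"

# Same program as in the first answer.
prog='NR==FNR{a[$1]=$2;next}{$5=$5 "\t" (a[$5]?a[$5]:"-")}1'

# Default FS: "ENG 1" is split into two fields, so $5 is "1" and nothing matches.
default_fs=$(awk -v OFS='\t' "$prog" "$tmp/file1" "$tmp/file2")

# Tab as FS: "ENG 1" stays one field, so $5 is ABCA4 and INHR is inserted.
tab_fs=$(awk -F'\t' -v OFS='\t' "$prog" "$tmp/file1" "$tmp/file2")

printf 'default FS: %s\n' "$default_fs"
printf 'tab FS:     %s\n' "$tab_fs"
rm -rf "$tmp"
```

For the sample data in the question both forms give the same result, since none of the columns contain spaces.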
