Wednesday, May 4, 2016

Merging multiple files using first column as hash keys with awk - part 2


Continuing from the previous post: if one of the files is missing a value, even though the correct key is present in its first column, that value is skipped and the values in the output table can end up in the wrong order. Use this code to eliminate rows that do not have a value in every input file:

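# First file: remember each key and start its list of values.
NR==FNR { h1[$1] = $1; h2[$1] = $2; next }
# Remaining files: append the value for keys seen in the first file.
NF { if ($1 in h1) h2[$1] = h2[$1] OFS $2 }
# Print only keys that collected one value per input file (ARGC-1 files).
END { for (k in h2)
  if (split(h2[k], a) > ARGC-2)
    print k OFS h2[k]
}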
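
The filter works because split() with the default separator ignores a trailing blank, so a key with a missing value yields fewer fields than there are input files. A minimal sketch of that check, using the strings that the sample files below would accumulate for string and mail (the array names incomplete and complete are illustrative, not from the script):

```shell
# ARGC counts the awk command itself plus each file, so four input
# files give ARGC = 5 and a threshold of ARGC-2 = 3.
counts=$(awk 'BEGIN {
  # Key "string": the value from 4.test is empty, leaving a trailing separator.
  print split("5 4 5 ", incomplete)
  # Key "mail": one value from each of the four files.
  print split("5 4 6 89", complete)
}')
printf '%s\n' "$counts"
```

This prints 3 and then 4. Only the second count is greater than ARGC-2, so only that row would be printed.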
Example output. The row for string is dropped because its value is missing in 4.test:

DN52ei1m:~ milospjanic$ cat 1.test 
mail 5
now 7
tomorrow 7
string 5
do 6
DN52ei1m:~ milospjanic$ cat 2.test 
mail 4
now 4
tomorrow 4
string 4
do 4
DN52ei1m:~ milospjanic$ cat 3.test 
mail 6
now 6
tomorrow 7
string 5
do 67
DN52ei1m:~ milospjanic$ cat 4.test 
mail 89
now 75
tomorrow 75
string 
do 555
DN52ei1m:~ milospjanic$ awk -f script.awk 1.test 2.test 3.test 4.test 
tomorrow 7 4 7 75
do 6 4 67 555
now 7 4 6 75
mail 5 4 6 89
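
The rows above do not come out in the order of 1.test because awk leaves the iteration order of for (k in h2) unspecified. If the original order matters, one possible variant records the keys as they appear in the first file and walks that list in the END block. This is only a sketch: the names order, vals and n are illustrative, the temporary directory is there just to make the example self-contained, and it assumes keys are unique within the first file.

```shell
# Recreate the four sample files in a scratch directory.
dir=$(mktemp -d)
printf 'mail 5\nnow 7\ntomorrow 7\nstring 5\ndo 6\n'       > "$dir/1.test"
printf 'mail 4\nnow 4\ntomorrow 4\nstring 4\ndo 4\n'       > "$dir/2.test"
printf 'mail 6\nnow 6\ntomorrow 7\nstring 5\ndo 67\n'      > "$dir/3.test"
printf 'mail 89\nnow 75\ntomorrow 75\nstring \ndo 555\n'   > "$dir/4.test"

merged=$(awk '
  # First file: remember each key, its position, and its first value.
  NR==FNR {
    if (NF) { order[++n] = $1; vals[$1] = $2 }
    next
  }
  # Remaining files: append the value for keys seen in the first file.
  NF && ($1 in vals) { vals[$1] = vals[$1] OFS $2 }
  # Same completeness test as before, but in first-file order.
  END {
    for (i = 1; i <= n; i++) {
      k = order[i]
      if (split(vals[k], a) > ARGC-2)
        print k OFS vals[k]
    }
  }
' "$dir/1.test" "$dir/2.test" "$dir/3.test" "$dir/4.test")
printf '%s\n' "$merged"
rm -r "$dir"
```

With the sample files this prints mail, now, tomorrow and do, in that order, and still omits string.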
