Wednesday, May 4, 2016

Merging multiple files using first column as hash keys with awk

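Let's say you have multiple files that share the same first column and you want to merge them into one table. The awk code below does this by using the first column as the key of an associative array. Note that this only makes sense if every file contains the same set of first-column elements (their order within each file is not important): a key that is missing from the first file is ignored, and a key that is missing from a later file makes the remaining values for that row shift left. The order of the output rows is not preserved, because awk's for (k in h2) loop visits keys in an unspecified order.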


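# First file: store the second column under the first-column key.
NR==FNR { h1[$1] = $1; h2[$1] = $2; next }
# Remaining files: append the second column for keys seen in the first file.
NF { if ($1 in h1) h2[$1] = h2[$1] OFS $2 }
# Print only the keys that collected more than one value.
END { for (k in h2)
  if (split(h2[k], a) > 1)
    print k OFS h2[k]
}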
Save it as, e.g., script.awk. Here is an example of merging four files into one composite table:

DN52ei1m:~ milospjanic$ cat 1.test 
mail 5
now 7
tomorrow 7
string 5
do 6
DN52ei1m:~ milospjanic$ cat 2.test 
mail 4
now 4
tomorrow 4
string 4
do 4
DN52ei1m:~ milospjanic$ cat 3.test 
mail 6
now 6
tomorrow 7
string 5
do 67
DN52ei1m:~ milospjanic$ cat 4.test 
mail 89
now 75
tomorrow 75
string 5
do 555
DN52ei1m:~ milospjanic$ awk -f script.awk 1.test 2.test 3.test 4.test 
tomorrow 7 4 7 75
do 6 4 67 555
now 7 4 6 75
string 5 4 5 5
mail 5 4 6 89
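
The merged rows above come out in a different order than in the input files. If the row order of the first file matters, or if some files may lack a key, a variant along the following lines could help. This is only a sketch under a few assumptions: the function name merge_ordered and the "NA" placeholder are made up for illustration and are not part of the original script, and the first file is assumed to be non-empty and to define which keys are kept.

```shell
# merge_ordered: hypothetical helper (not from the original post).
# Merges the second column of every file, keyed on the first column,
# keeping the row order of the first file.
merge_ordered() {
  awk '
    # A new input file starts (files with no records are not counted).
    FNR == 1 { nfiles++ }

    NF {
      # First file: remember each key once, in order of appearance.
      if (nfiles == 1 && !($1 in seen)) { seen[$1]; order[++n] = $1 }
      # Any file: store the value for known keys, indexed by file number.
      if ($1 in seen) val[$1, nfiles] = $2
    }

    END {
      for (i = 1; i <= n; i++) {
        line = order[i]
        # One column per file; "NA" (an assumed placeholder) marks a missing key.
        for (f = 1; f <= nfiles; f++)
          line = line OFS (((order[i], f) in val) ? val[order[i], f] : "NA")
        print line
      }
    }
  ' "$@"
}
```

Calling merge_ordered 1.test 2.test 3.test 4.test on the files above should list the rows as mail, now, tomorrow, string, do, the order of 1.test, with the same values as before. Indexing the values by file number is what keeps the columns aligned when a key is absent from one of the files.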
