Thursday, August 10, 2017

Extract columns by matching column names from file

If you want to select columns using column names that are read from another file use this code. For example, input file is input.txt and column names to be extracted are in columns.txt.

First save this awk code as, it will read the first row and iterate through the fields to find those that match the first argument $1. After it will save the matched row number as col_num and print it out. For other rows, NR>1, it will print out the field content only if i=col_num.

awk -v COLUMN_ID=$1 '
        NR==1 {
                for (i=1; i<=NF; i++) {
                        if ($i==COLUMN_ID) {
                                print $i;
        NR>1 {
                if (i=col_num) {
                        print $i;
' $2
Then run this code, it will read the columns.txt file line by line, save it in each loop into variable $line, then it runs the script with $line as $1 and input.txt as $2. It then saves output for each $line as a temp file. Then use paste to connect all .temp files to create the final table.

while IFS= read -r line; 
do ./ $line path/to/file/input.txt > $line.temp; 

done < path/to/file/columns.txt 

paste *.temp > output.txt