Tuesday, February 23, 2016

Filtering and counting fields with awk

If you need to eliminate rows from a file based on a count of their fields, let's say rows in which every field is 0, use awk.

Create a variable sum, let i go from 1 to NF, and on each pass add 0 or 1 to sum using awk's ternary operator (?:). If $i is true (i.e. $i!=0), sum += adds 1; if it is false (i.e. $i==0), sum += adds 0. After the loop, sum holds the number of nonzero fields in the row.

sum ends up 0 only when every field is 0, so if (sum!=0) print drops those rows and prints all the others.


awk '{sum=0; for (i=1; i<=NF; i++){sum += $i ? 1 : 0} if (sum!=0) print}'
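
As a quick check, here is a small sketch that feeds three made-up rows to the same command. The sample data and the function name drop_zero_rows are assumptions for illustration only, not part of the original one-liner.

```shell
# drop_zero_rows is a hypothetical wrapper around the one-liner above.
drop_zero_rows() {
  awk '{sum=0; for (i=1; i<=NF; i++){sum += $i ? 1 : 0} if (sum!=0) print}'
}

# Made-up input: the second row is all zeros and should be dropped.
printf '1 0 2\n0 0 0\n0 3 0\n' | drop_zero_rows
# prints:
# 1 0 2
# 0 3 0
```
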
To drop the lines that have exactly N nonzero fields (for example 5), replace 0 with 5 in if (sum!=0) print:

awk '{sum=0; for (i=1; i<=NF; i++){sum += $i ? 1 : 0} if (sum!=5) print}'
Or, to drop the lines in which exactly 5 fields are 0:

awk '{sum=0; for (i=1; i<=NF; i++){sum += !$i ? 1 : 0} if (sum!=5) print}'
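
If the count changes often, it may be handier to pass it in with awk's -v option instead of editing the program text. This is only a sketch: the function name drop_rows_with_n_zeros, the variable n, and the sample rows are all made up for illustration.

```shell
# drop_rows_with_n_zeros is a hypothetical helper; its first argument
# is handed to awk as the variable n (also a made-up name).
drop_rows_with_n_zeros() {
  awk -v n="$1" '{sum=0; for (i=1; i<=NF; i++){sum += !$i ? 1 : 0} if (sum!=n) print}'
}

# Made-up input: the first row has five zeros, the second has three.
printf '0 0 0 0 0 7\n1 2 3 0 0 0\n' | drop_rows_with_n_zeros 5
# prints:
# 1 2 3 0 0 0
```
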

You can modify this code to count how many times 0, or any other number, appears in each row. To find out how many fields of each row are 0:

awk '{sum=0; for (i=1; i<=NF; i++){sum += $i==0 ? 1 : 0} print sum}'
To find out how many times 5 appears in each row:

awk '{sum=0; for (i=1; i<=NF; i++){sum += $i==5 ? 1 : 0} print sum}'
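
The two counting commands differ only in the number being compared, so one possible generalization is to pass that number in with -v. Again a sketch under assumptions: count_value and target are hypothetical names, and the input rows are invented. Fields that look like numbers are compared numerically, so 5.0 would also count as 5.

```shell
# count_value is a hypothetical helper; its first argument becomes
# the awk variable target (a made-up name).
count_value() {
  awk -v target="$1" '{sum=0; for (i=1; i<=NF; i++){sum += $i==target ? 1 : 0} print sum}'
}

# Made-up input: 5 appears twice in the first row and never in the second.
printf '5 0 5\n1 2 3\n' | count_value 5
# prints:
# 2
# 0
```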
