## Tuesday, February 23, 2016

### Filtering and counting fields with awk

If you need to eliminate rows from a file that have a certain sum, let's say 0, use awk. Here the sum is the number of nonzero fields in each row.

Create a variable sum, let i go from 1 to NF (the number of fields in the current line), and add 0 or 1 to sum using awk's ternary operator (?:). In `sum += $i ? 1 : 0`, if `$i` is true (i.e. `$i!=0`) 1 is added to sum; if it is false (i.e. `$i==0`) 0 is added.

sum is 0 only when every field in the line is 0. So `if (sum!=0) print` prints the lines that have at least one nonzero field and removes the all-zero lines.

`awk '{sum=0; for (i=1; i<=NF; i++){sum += $i ? 1 : 0} if (sum!=0) print}'`
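
A quick way to check the filter is to pipe a few sample rows into it. The input below is made up for illustration. Rows 1 and 3 contain only zeros, so they should disappear.

```sh
# Made-up sample input: four rows of three fields each.
# Rows 1 and 3 are all zeros, so their count of nonzero fields is 0.
printf '0 0 0\n1 0 2\n0 0 0\n3 4 5\n' |
  awk '{sum=0; for (i=1; i<=NF; i++){sum += $i ? 1 : 0} if (sum!=0) print}'
# Expected output:
# 1 0 2
# 3 4 5
```

Blank lines have no fields, so sum stays 0 and they are removed as well.
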

To remove the lines that have exactly N nonzero fields (for example 5), change the condition from if (sum!=0) print to if (sum!=5) print. Every other line is printed.

`awk '{sum=0; for (i=1; i<=NF; i++){sum += $i ? 1 : 0} if (sum!=5) print}'`
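
You can pass the count in with awk's `-v` option instead of editing the number inside the program each time. This is a sketch. The variable name n is my own choice and does not come from the commands above.

```sh
# n (hypothetical name) holds the number of nonzero fields that marks a line for removal.
# Lines with exactly n nonzero fields are dropped; every other line is printed.
printf '1 2 3 4 5\n1 0 3 4 5\n0 0 0 0 0\n' |
  awk -v n=5 '{sum=0; for (i=1; i<=NF; i++){sum += $i ? 1 : 0} if (sum!=n) print}'
# Expected output:
# 1 0 3 4 5
# 0 0 0 0 0
```

Setting n=0 gives the first command again, which removes the all-zero lines.
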

Or, to remove the lines that have exactly 5 fields equal to 0, negate `$i` so that the zero fields are counted instead:

`awk '{sum=0; for (i=1; i<=NF; i++){sum += !$i ? 1 : 0} if (sum!=5) print}'`
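
The count includes zero fields anywhere in the line, not only consecutive ones. A line is removed only when the count is exactly 5. The sample rows below are made up to show both points.

```sh
# Row 1: five zeros -> removed.
# Row 2: five zeros (not consecutive) plus a 7 -> still exactly five zero fields, so removed.
# Row 3: four zeros -> kept.
printf '0 0 0 0 0\n0 7 0 0 0 0\n1 0 0 0 0\n' |
  awk '{sum=0; for (i=1; i<=NF; i++){sum += !$i ? 1 : 0} if (sum!=5) print}'
# Expected output:
# 1 0 0 0 0
```
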

You can modify this code to count how many times 0, or any other number, appears in each row. To count the fields equal to 0 in each row:

`awk '{sum=0; for (i=1; i<=NF; i++){sum += $i==0 ? 1 : 0} print sum}'`
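
`$i==0` is a comparison, so awk compares fields that look like numbers numerically. A field such as 0.0 therefore also counts as 0. Here is a run on made-up input:

```sh
# Prints one count per input line: the number of fields equal to 0.
printf '0 1 0\n2 3 4\n0.0 0 0\n' |
  awk '{sum=0; for (i=1; i<=NF; i++){sum += $i==0 ? 1 : 0} print sum}'
# Expected output:
# 2
# 0
# 3
```
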

To count how many fields equal 5 in each row:

`awk '{sum=0; for (i=1; i<=NF; i++){sum += $i==5 ? 1 : 0} print sum}'`
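
To count any value without editing the program, you can pass the value with `-v` and print NR (the current line number) next to each count. This is a sketch. The names val and c are my own choices.

```sh
# val (hypothetical name) is the value to count; c is the per-line count.
# Output format: "<line number>: <count>".
printf '5 1 5\n2 5 4\n0 0 0\n' |
  awk -v val=5 '{c=0; for (i=1; i<=NF; i++) if ($i == val) c++; print NR ": " c}'
# Expected output:
# 1: 2
# 2: 1
# 3: 0
```
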