Sunday, December 31, 2017

Calculate average for each row using awk

Lets say you have a file, and you want to calculate average for each row with the ID located in the first column, and want to keep ID and the new value.

head mastertable 
#header
LINC00969-220 0 0 0 0 0 0 0 0 0 0 0 0
LINC00969-221 0 0 1 0 0 0 0 0 13 14 22 15
LINC00969-222 0 0 0 0 0 0 0 0 0 0 2 0
AC063926.2-201 0 0 0 0 0 0 0 0 1 0 0 0
MEIS1-AS3-201 0 0 0 0 0 0 0 0 0 0 0 0
AC026992.2-201 0 0 0 0 0 0 0 0 0 0 1 0
LINC00969-223 0 0 0 0 0 0 0 0 0 0 0 0
AC026992.2-202 0 0 0 0 0 0 0 0 0 0 1 0
LINC00969-224 0 0 0 0 0 0 0 0 0 0 0 0


Use awk, For NR ==1 just print new header, then for each row start adding every value starting from $2 to the variable sum. At the end print $1 and sum/(NF-1):


awk 'NR == 1 { print "lncRNA", "Average"; next }    # Print a heading row\
NF > 2 { sum=0; for (i=2; i<=NF; i++) sum+=$i; print $1, sum/(NF-1) }' \ 
mastertable | sort -gr -k2 | head
RMRP-201 90047.4
AC007336.1-201 49627.7
AC026691.1-201 36003.8
AC034102.6-202 23611.6
AC116535.1-201 17051.9
LINC02022-201 15407.2
HELLPAR-201 14754.3
AC005089.1-201 12971.1
AC034102.7-201 12796.6
LINC00868-203 12116.7