To calculate the average expression of each gene across all conditions in a mastertable, use awk. The first 25 lines of the mastertable look like this:
mpjanic@zoran:~$ head mastertable -n25
TQ6 TQ7 TQ8 TQ9 TQ10 TQ11
lnc-CCDC77-4:1 0 0 0 0 0 0
lnc-COX10-9:1 0 0 0 0 0 0
lnc-MAGEB2-1:1 0 0 0 0 0 0
lnc-TMEM99-2:1 0 0 0 0 0 0
lnc-COX10-9:2 0 0 0 0 0 0
DDN-AS1:2 0 0 0 0 0 0
lnc-TMEM99-2:2 1 0 0 0 0 0
lnc-SPRY4-3:1 0 0 0 0 0 0
DDN-AS1:3 0 0 0 0 0 0
lnc-TMEM99-2:3 0 0 0 0 0 0
DDN-AS1:4 28 32 10 31 2 13
DDN-AS1:5 0 0 0 0 0 0
lnc-ZNF516-4:10 0 0 0 0 0 0
DDN-AS1:6 0 0 0 0 0 0
lnc-ZNF516-4:11 0 0 0 0 0 0
GSEC:2 15 9 16 10 9 34
lnc-AATK-AS1-2:1 15 17 2 19 28 24
lnc-PLCH1-5:1 0 0 0 0 0 0
GSEC:3 0 0 0 0 0 0
lnc-MFSD9-7:1 53 27 22 41 47 41
lnc-PSMC1-1:1 29 30 54 18 23 58
GSEC:4 0 0 0 0 0 0
GSEC:5 0 0 0 0 0 0
lnc-ZNF780B-1:1 124 125 80 123 110 148
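The awk program replaces the first line (NR == 1) with a new heading row, lncRNA and Average. For every other line with more than two fields (NF > 2), it treats the first field as the gene name, sums fields 2 through NF, and divides by the number of samples: {sum=0; for (i=2; i<=NF; i++) sum+=$i; print $1, sum/(NF-1)}.
Then pipe the result to sort -gr -k2 to sort on the second column in descending order. The -g option (--general-numeric-sort) also handles values in scientific notation such as 1.5e+06, which awk can print for large non-integer averages.
To save the full list, redirect the sort output to a file such as expression_average_per_gene instead of piping it to head. Here the top 20 genes are printed: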
mpjanic@zoran:~$ awk 'NR == 1 { print "lncRNA", "Average"; next } # Print a heading row\
> NF > 2 { sum=0; for (i=2; i<=NF; i++) sum+=$i; print $1, sum/(NF-1) }' mastertable | sort -gr -k2| head -n 20
lnc-SGCE-3:1 157963
lnc-EIF2AK4-6:1 120530
lnc-SLC3A2-6:1 110467
lnc-ATIC-14:1 66120.8
lnc-TRIM69-3:1 59894.5
lnc-TRDMT1-5:2 52209.8
lnc-ANKRD55-6:1 44934.3
lnc-LRRTM4-6:1 44869.3
lnc-LYN-8:1 39859.2
lnc-VGF-4:1 37230.2
lnc-VGF-3:1 32908.2
lnc-VAT1-4:1 27177.7
lnc-SH3D19-2:1 22266.8
IGFBP7-AS1:16 21963.5
lnc-HSD17B7-1:2 21429.5
lnc-BTD-2:1 21348.2
lnc-ARID2-11:1 21063.7
lnc-CBY3-3:2 19925.3
lnc-DYNC2H1-4:1 19492.8
lnc-C6orf120-1:7 18986.2
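To check the logic on something small before running it on the real mastertable, the same awk program can be tried on a toy table. This is only a sketch: the three-gene table, the sample names S1 to S3, and the function name average_per_gene are made up for illustration, and mktemp is assumed to be available.

```shell
#!/bin/sh
# Hypothetical three-gene table in the same layout as mastertable:
# a header of sample names, then one row per gene with its counts.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
S1 S2 S3
geneA 0 0 0
geneB 10 20 30
geneC 1 2 4
EOF

# The awk program from the post, wrapped in a function
# (the function name is illustrative, not part of the original).
average_per_gene() {
    awk 'NR == 1 { print "lncRNA", "Average"; next }    # replace the header row
         NF > 2  { sum = 0
                   for (i = 2; i <= NF; i++) sum += $i  # add up all sample columns
                   print $1, sum / (NF - 1) }' "$1"     # divide by number of samples
}

# Sort on the second column, largest average first.
average_per_gene "$tmp" | sort -gr -k2
rm -f "$tmp"
```

geneB, with an average of 20, should come out on top, followed by geneC and geneA. With GNU sort, the non-numeric heading row ends up at the bottom of the reversed output, which is why it never appears in the top 20 above.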