Wednesday, March 8, 2017
Collapsing transcript expression levels into gene levels by sum, maximum and average - awk code
Suppose you have generated a file with three columns: gene, transcript (isoform), and expression level.
To collapse this file to the gene level by summing, use awk. Accumulate all $3 values in an associative array (hash) with $1 as the key, then loop through the array in an END block and print each key with its total.
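The summing step described above can be sketched as follows. This is a minimal sketch, not necessarily the original code: the function name sum_by_gene is hypothetical, and the input is assumed to be whitespace-separated with no header line.

```shell
# Hypothetical helper: sum expression (column 3) per gene (column 1).
# Assumes whitespace-separated input with no header line.
sum_by_gene() {
  awk '
    # add each transcript expression level to the running total of its gene
    { sum[$1] += $3 }
    # print every gene with its total; awk does not guarantee the loop order
    END { for (gene in sum) print gene, sum[gene] }
  ' "$1"
}
```

Because awk visits array keys in an unspecified order, pipe the result through sort if a stable order matters, for example `sum_by_gene expression.txt | sort`, where expression.txt stands in for your own file.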
To collapse the isoform file and keep only the top (most highly expressed) isoform per gene, use a hash max with $1 as the key and $3 as the value. If $3 is greater than the stored max for that gene, change max to the new $3 value and store $3 in the output array. Then loop through the array and print keys and values.
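A possible version of the maximum step is sketched below, under the same assumptions about the input (whitespace-separated, no header). The function name max_by_gene is hypothetical. As a simplification, it keeps a single array max rather than two separate arrays, and it initializes each gene from its first transcript so that negative values (for example log-transformed expression) are handled too.

```shell
# Hypothetical helper: keep the highest expression (column 3) per gene (column 1).
# Assumes whitespace-separated input with no header line.
max_by_gene() {
  awk '
    # update when the gene is new or the current value beats the stored one;
    # adding 0 forces a numeric rather than a string comparison
    !($1 in max) || $3 + 0 > max[$1] { max[$1] = $3 + 0 }
    # print every gene with its top value; the loop order is unspecified
    END { for (gene in max) print gene, max[gene] }
  ' "$1"
}
```

As with any `for (key in array)` loop in awk, the output order is not guaranteed, so `max_by_gene expression.txt | sort` (with expression.txt as a placeholder for your file) gives a reproducible listing.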
To compute the average, create a hash 'array' keyed by $1 that sums the $3 values, and a second hash 'no' keyed by $1 that increases by 1 for every transcript. Then loop through the array and print each key together with 'array' divided by 'no', which is the sum divided by the number of transcripts for that gene.
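The averaging step might look like the sketch below. It reuses the hash names array and no from the description; the function name mean_by_gene is hypothetical, and the input is again assumed to be whitespace-separated with no header line.

```shell
# Hypothetical helper: mean expression (column 3) per gene (column 1).
# Assumes whitespace-separated input with no header line.
mean_by_gene() {
  awk '
    # array holds the running sum per gene, no holds the transcript count
    { array[$1] += $3; no[$1]++ }
    # sum divided by count gives the mean; the loop order is unspecified
    END { for (gene in array) print gene, array[gene] / no[gene] }
  ' "$1"
}
```

Every key in array also exists in no, because both are updated on the same line, so the division never sees a zero count. Non-integer results are printed using awk's default output format.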