Thursday, January 5, 2017
Parsing tsv output files from Kallisto
If you have a tsv file from Kallisto (abundance.tsv) that you need to parse, you can do it in one command. For example, say you need the ENSG gene name, which is the second |-delimited field inside the first column, together with the estimated number of counts, which is the fourth tab-delimited column. Use a find/xargs pipe to process every abundance.tsv, then cut -f1,4 with the default tab separator (thus grabbing the first column and the estimated number of counts from the 4th). Then use sed "s/.*|E/E/g" to delete the text up to and including the first | (i.e. the first |-delimited field of the first column), so that each line starts with the ENSG gene name.
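Next, use sed "s/|.*|.*|.*|.*|//g" to delete the rest of the text in the first column, keeping only the ENSG gene name.
Finally, substitute .[0-9]* with nothing, i.e. delete the version suffix after the gene name. In sed the dot has to be escaped (\.[0-9]*) to match a literal dot, and the substitution should not be global, otherwise the decimal part of the counts is deleted too.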
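The steps above can be sketched as a small shell function. This is a minimal sketch, not the exact original command: the function name parse_abundance, the output file name gene_counts.tsv, the temporary directory and the sample record are all made up for illustration. A while-read loop is used in place of xargs so the function can be reused for each file, and the escaped dot in the last sed is an assumption about how the pattern was written.

```shell
# Hypothetical helper: reads a Kallisto abundance.tsv on stdin and
# writes "gene name<TAB>estimated counts" on stdout.
parse_abundance() {
  cut -f1,4 |                       # keep target_id and est_counts (tab-separated)
    sed 's/.*|E/E/' |               # drop everything before the ENSG field
    sed 's/|.*|.*|.*|.*|//' |       # drop the remaining |-delimited fields
    sed 's/\.[0-9]*//'              # drop the version suffix (first match only)
}

# Made-up sample data so the sketch can be run as-is.
workdir=$(mktemp -d)
mkdir -p "$workdir/sample1"
printf 'target_id\tlength\teff_length\test_counts\ttpm\n' \
  > "$workdir/sample1/abundance.tsv"
printf 'ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-002|DDX11L1|1657|processed_transcript|\t1657\t1478\t12.5\t0.3\n' \
  >> "$workdir/sample1/abundance.tsv"

# Process every abundance.tsv below workdir, writing the result next to it.
find "$workdir" -name abundance.tsv | while IFS= read -r f; do
  parse_abundance < "$f" > "${f%/*}/gene_counts.tsv"
done

cat "$workdir/sample1/gene_counts.tsv"
```

For the made-up record this leaves ENSG00000223972 and 12.5 separated by a tab, with the header line reduced to target_id and est_counts. One caveat: because .* is greedy, "s/.*|E/E/" cuts at the last |E on the line, so a record whose gene symbol starts with E would be truncated in the wrong place. Matching on |ENSG instead (sed 's/.*|ENSG/ENSG/') is likely safer.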