Wednesday, June 7, 2017

Awk code to subtract list of housekeeping genes from the mastertable

If you have a list of housekeeping genes that you want to remove from the mastertable (for example, to exclude housekeeping genes from differential expression (DE) analysis), use the awk code below. While awk reads the first file (NR==FNR), it stores each gene ID as a key in the hash h[$1]; while it reads the second file, if(!h[$1]) prints only the rows whose gene ID is not in the hash:


head housekeeping.genes.conversion.to.mouse.gene.names.ens
ENSMUSG00000039952
ENSMUSG00000060288
ENSMUSG00000022400
ENSMUSG00000050552
ENSMUSG00000034854
ENSMUSG00000029621
ENSMUSG00000000399
ENSMUSG00000060992
ENSMUSG00000021076
ENSMUSG00000031701

head mastertable.RPKM
  control_macs ribo_heart1 ribo_heart2 ribo_kidney1 ribo_kidney2 ribo_liver1 ribo_liver2 tcf21cre_macs
ENSMUSG00000053783 0 0 0 0 0 0 0 0
ENSMUSG00000078607 14.3315290267182 5.27148998470254 9.47657223005189 14.1782105971938 25.9570809377262 12.5352263065328 3.49215569538464 8.12296582986335
ENSMUSG00000021900 65.6250210929769 22.948792387045 60.228922019752 38.7788513986779 52.1120026703478 57.297708533105 19.665464723166 62.9006962103502
ENSMUSG00000021901 63.5325181369401 29.4644784214023 66.5475638175278 26.4470153993863 29.9656600887317 19.5248137747344 9.16367902027884 29.6891300712868
ENSMUSG00000081820 0 0 0 0 0 0 0.111648246991865 0
ENSMUSG00000021902 5.34723506150612 5.33759154872005 9.07451378908043 8.48829107065946 9.13791262030437 7.06216010001703 2.64534314243628 3.86463200913539
ENSMUSG00000021903 15.8545088541307 9.74915150330365 22.1922681668968 0.63286285488678 0.530720326421349 3.22314506289442 3.27454460225776 11.8394474075886
ENSMUSG00000081821 0 0 0 0 0 0 0 0
ENSMUSG00000019710 26.4100432213795 21.5133687933026 30.3889090935622 30.3215943263374 28.0822793064558 19.990967660423 28.4085170598329 24.6477111636308

awk 'NR==FNR {h[$1] = $1; next} {if(!h[$1]) print $0}' housekeeping.genes.conversion.to.mouse.gene.names.ens mastertable.RPKM | head
  control_macs ribo_heart1 ribo_heart2 ribo_kidney1 ribo_kidney2 ribo_liver1 ribo_liver2 tcf21cre_macs
ENSMUSG00000053783 0 0 0 0 0 0 0 0
ENSMUSG00000078607 14.3315290267182 5.27148998470254 9.47657223005189 14.1782105971938 25.9570809377262 12.5352263065328 3.49215569538464 8.12296582986335
ENSMUSG00000081820 0 0 0 0 0 0 0.111648246991865 0
ENSMUSG00000021902 5.34723506150612 5.33759154872005 9.07451378908043 8.48829107065946 9.13791262030437 7.06216010001703 2.64534314243628 3.86463200913539
ENSMUSG00000021903 15.8545088541307 9.74915150330365 22.1922681668968 0.63286285488678 0.530720326421349 3.22314506289442 3.27454460225776 11.8394474075886
ENSMUSG00000081821 0 0 0 0 0 0 0 0
ENSMUSG00000080500 0 0 0 0 0 0 0 0
ENSMUSG00000021904 3.74710427285173 1.01565097626204 69.4171627480469 13.6072031971553 38.4037682639844 0.161290198636122 0.0409393307612302 0.0734611101052428
ENSMUSG00000081822 0.173365270484494 2.52854077807611 0 0 0 0 0.107017674535981 0
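
The same filter can be wrapped in a small shell function. This is only a sketch of one possible variant, not the code used above: the function name subtract_genes is hypothetical, and it makes two changes. It tests membership with ($1 in h) instead of !h[$1], so looking up a gene does not add an empty entry to the hash, and it compares FILENAME with ARGV[1] instead of using NR==FNR, because NR==FNR stays true for the second file when the gene list is empty and nothing would be printed.

```shell
# Hypothetical helper (the name is an assumption, not from the original post).
# $1: file with one gene ID per line (the IDs to remove)
# $2: whitespace-delimited table with the gene ID in column 1
subtract_genes() {
    awk '
        # First file: remember each gene ID as a key of h, then skip to the next line.
        FILENAME == ARGV[1] { h[$1]; next }
        # Second file: print only rows whose first field is not a key of h.
        !($1 in h)
    ' "$1" "$2"
}
```

Assuming the two files shown above, it would be called as: subtract_genes housekeeping.genes.conversion.to.mouse.gene.names.ens mastertable.RPKM | head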

2 comments:

  1. Thank you for your help. Please do keep posting solutions to other common questions. Also share research projects, if you’re working on any. It would be interesting to read.

  2. Hey, this was helpful mate and I'd like to appreciate you for that! Good job with this post and thanks for being so resourceful. I'll show this to my team
