Thursday, May 12, 2016

How to remove transcript version from GENCODE/ENSEMBL annotations

If you have downloaded GENCODE/ENSEMBL annotations and used them, for example with featureCounts or HTSeq, to create read counts, and you now need to convert the ENSEMBL IDs back to gene symbols, you can use my script ensemble2genename, which pulls annotations from BioMart:

https://github.com/milospjanic/ensemble2genename

However, to use this script on GENCODE annotations you will have to remove the version suffix (the dot and the number after it) from each ID, e.g. ENSG00000223972.4 has to be converted to ENSG00000223972. To do this, use this sed one-liner:


sed 's/\.[0-9]*//g' file.txt
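Note that the one-liner above removes every dot followed by digits anywhere on a line, so a decimal value such as 12.5 in another column would be turned into 12. A more conservative sketch is shown below. It assumes the IDs follow the usual Ensembl pattern (ENS, optional species letters, a G/T/P/E type letter, then digits), and the function name strip_ensembl_version is made up for this example:

```shell
# Hypothetical helper: strip the version suffix only when it follows
# an Ensembl-style ID (e.g. ENSG..., ENST..., ENSMUSG...).
# Assumes IDs look like ENS + uppercase letters + digits + "." + version.
strip_ensembl_version() {
    # -E enables extended regular expressions (GNU and BSD sed).
    # With no file arguments, sed reads from standard input.
    sed -E 's/(ENS[A-Z]+[0-9]+)\.[0-9]+/\1/g' "$@"
}

# Example: the ID loses its ".4", the decimal value is left untouched.
printf 'ENSG00000223972.4\t12.5\n' | strip_ensembl_version
```

On a file you would call it as strip_ensembl_version file.txt > file.noversion.txt, where the output file name is just a placeholder.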
