Thursday, May 12, 2016

How to remove transcript version from GENCODE/ENSEMBL annotations

If you have downloaded GENCODE/ENSEMBL annotations and used them, for example with featureCounts or HTSeq, to create read counts, and you now need to convert the ENSEMBL IDs back to gene symbols, you can use my script ensemble2genename, which pulls annotations from BioMart:

https://github.com/milospjanic/ensemble2genename

However, to use this script on GENCODE annotations you first have to remove the version suffix from each ID, e.g. ENSG00000223972.4 has to be converted to ENSG00000223972. To do this, use this sed one-liner:


sed 's/\.[0-9]*//g' file.txt
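
Note that the one-liner above strips every dot followed by digits anywhere on a line, so it would also truncate decimal values in other columns (12.5 would become 12). If your file has more than just IDs in it, a more conservative variant is sketched below. It assumes the ID sits at the start of each line, as in featureCounts or HTSeq output, and the file names and sample rows are made up for illustration:

```shell
# Hypothetical sample input: ID in column 1, a decimal value in column 2.
printf 'ENSG00000223972.4\t12.5\nENSG00000227232.4\t3\n' > counts_example.txt

# Remove the version only from an Ensembl-style ID at the start of a line.
# ENS[A-Z]* covers gene (ENSG), transcript (ENST) and species-prefixed IDs.
# -E enables extended regular expressions (GNU and BSD sed).
sed -E 's/^(ENS[A-Z]*[0-9]+)\.[0-9]+/\1/' counts_example.txt > counts_noversion.txt

cat counts_noversion.txt
```

Header lines and any other line that does not start with an ENS-style ID are left untouched, and so are decimals in the remaining columns.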
