Tuesday, January 22, 2013

How to select specific genes from whole gene lists using grep in Unix

If you have a big list of genes and want to select specific ones, you can easily do so with the grep command, e.g.

grep "Fgf5" gene_exp.diff
will select all the lines containing the Fgf5 gene name.

Now, if your gene belongs to a gene family, this command may also pick up genes you don't want, e.g.

grep "Adamts1" gene_exp.diff
will output the Adamts1 lines, but also the Adamts10, Adamts11, Adamts12 lines, etc.
In this case you can use the following trick. If the next column in the file contains the coordinates of the gene (e.g. chr17:33661049-33690727), as it does in the gene_exp.diff file produced by Cuffdiff, do the following:

grep "Adamts1.c" gene_exp.diff
Here the . matches any single character, so the pattern requires exactly one character (in this file, the column separator) between Adamts1 and the c of the gene coordinates. This is only the case for Adamts1 lines, and not for Adamts10 lines etc., where there are two characters before the c.
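
If the next column does not start with c, two other options may be worth trying. The sketch below is only an illustration: the file sample_genes.txt and its coordinates are made up for this example, and it assumes the columns are separated by tabs. grep's -w option keeps only matches that form a whole word, and awk can compare a single column exactly.

```shell
# Made-up sample file: gene name, a tab, then placeholder coordinates.
printf 'Adamts1\tchr1:100-200\nAdamts10\tchr1:300-400\nAdamts12\tchr1:500-600\n' > sample_genes.txt

# Plain pattern: counts every line that contains "Adamts1" anywhere,
# including the Adamts10 and Adamts12 lines.
grep -c "Adamts1" sample_genes.txt

# The trick from above: exactly one character between Adamts1 and c.
grep "Adamts1.c" sample_genes.txt

# -w: the match must be a whole word, so Adamts10 and Adamts12 are skipped.
grep -w "Adamts1" sample_genes.txt

# Exact comparison on one column (column 1 in this sample file).
awk -F'\t' '$1 == "Adamts1"' sample_genes.txt
```

For a real gene_exp.diff file you would need to change $1 to whichever column holds the gene name (check the header line first); the column number is not something this example can tell you.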
