Monday, December 7, 2015

How to transform a list of GWAS-associated genes into a single-column list with the sed command

Suppose you have a list of GWAS-associated genes in which each line holds either a single gene or several genes from the same locus, separated by a comma and a space. To turn this into a single column with one gene per line, use sed to replace every comma-plus-space pair with a newline character:


mpjanic@valkyr:~/REBUTTAL$ cat genes
ZNF259, APOA1, APOC3, APOA4, APOA5
UBE3B, MVK, MMAB, MYO1H, KCTD10
LIPC
CETP
GFOD2, LCAT
LIPG
APOB
ZNF259, APOA1, APOC3, APOA4, APOA5, BUD13
PCSK9
CELSR2
APOB
HMGCR
TRIB1
ZNF259, APOA1, APOC3, APOA4, APOA5, BUD13
LDLR
SF4, CILP2
APOC2, APOE, APOC4, APOC1
DOCK7, ANGPTL3
GCKR
TBL2, MLXIPL, BAZ1B, BCL7B
LPL
TRIB1
CILP2, ZNF101
PPP1R3B
AFF1
SELP, F5
LOC653163, SURF2, SURF4, ADAMTS13, C9orf7, ABO
RGS14, PRR7, DBN1, GRK6, UIMC1, SLC34A1, F12, FGFR4, NSD1, PRELID1, MXD3, LMAN2
F11
RFC4, ADIPOQ, KNG1
mpjanic@valkyr:~/REBUTTAL$ sed 's/,\ /\n/g' genes
ZNF259
APOA1
APOC3
APOA4
APOA5
UBE3B
MVK
MMAB
MYO1H
KCTD10
LIPC
CETP
GFOD2
LCAT
LIPG
APOB
ZNF259
APOA1
APOC3
APOA4
APOA5
BUD13
PCSK9
CELSR2
APOB
HMGCR
TRIB1
ZNF259
APOA1
APOC3
APOA4
APOA5
BUD13
LDLR
SF4
CILP2
APOC2
APOE
APOC4
APOC1
DOCK7
ANGPTL3
GCKR
TBL2
MLXIPL
BAZ1B
BCL7B
LPL
TRIB1
CILP2
ZNF101
PPP1R3B
AFF1
SELP
F5
LOC653163
SURF2
SURF4
ADAMTS13
C9orf7
ABO
RGS14
PRR7
DBN1
GRK6
UIMC1
SLC34A1
F12
FGFR4
NSD1
PRELID1
MXD3
LMAN2
F11
RFC4
ADIPOQ
KNG1
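
The sed command above depends on GNU sed, which interprets \n in the replacement text as a newline. Some BSD-derived sed implementations, including the one historically shipped with macOS, do not, and will insert a literal "n" instead. A more portable alternative is to let awk do the splitting. The sketch below assumes that genes are separated by a comma followed by zero or more spaces. The function name split_genes is hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: print one gene per line from comma-separated input.
# Assumes the separator is a comma followed by zero or more spaces.
# Reads the files given as arguments, or standard input if none are given.
# Blank input lines produce no output because they have zero fields.
split_genes() {
    # -F ', *' sets the field separator to a regular expression:
    # a comma followed by any number of spaces.
    awk -F ', *' '{ for (i = 1; i <= NF; i++) print $i }' "$@"
}
```

For example, split_genes genes should produce the same single column as the sed command. Because loci such as APOB, TRIB1 and ZNF259 appear more than once in the input, you can pipe the result through sort -u (split_genes genes | sort -u) if you need each gene to appear only once.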
