Monday, January 23, 2017

Replacing Excel's erroneous conversion of gene names

If you work in Excel with gene names, it is likely that your gene names have been substituted with dates (month-day).

For example, I am looking at the table created by a collaborator:

mpjanic@valkyr:~$ head mastertable.ip 
GENE ip_270_1 ip_270_2 ip_270_3 ip_270_4 ip_270_5 ip_270_6 ip_270_7 ip_270_8
SEC24B-AS1 2 1 4 3 8 5 4 3
A1BG 0 0 0 4 3 2 3 5
A1CF 0 0 0 0 0 0 0 0
GGACT 2 1 3 8 5 11 11 6
A2M 0 0 0 0 0 0 0 0
A2ML1 0 0 0 0 0 0 0 0
A2MP1 0 0 0 0 0 0 0 0
A4GALT 318 306 115 328 259 183 272 397
A4GNT 0 0 0 0 0 0 0 0
It is often good to check if the table has been generated in Excel:

mpjanic@valkyr:~$ grep Sep mastertable.ip 
Sep-15 1070 1220 778 1675 1675 1258 1133 2200
Sep-01 3 1 0 0 0 0 2 0
Sep-10 870 1065 798 1507 1322 839 1075 1469
Sep-11 3238 5737 3216 7827 5064 4548 3512 8151
Sep-12 0 0 0 0 0 0 0 0
Sep-14 0 0 0 0 0 0 0 0
Sep-02 2374 3801 2429 5225 4467 3692 3696 5812
Sep-03 1 1 0 2 1 1 1 1
Sep-04 0 0 0 0 0 0 0 0
Sep-05 198 282 147 338 291 243 290 410
Sep-06 332 434 334 743 462 432 417 629
Sep-08 686 784 502 1094 834 777 625 973
Sep-09 4238 4609 1840 4750 4047 3414 4967 5742
Sep-07 2374 3675 2426 4685 4358 3038 4676 6592
mpjanic@valkyr:~/mastertable$ grep Mar mastertable.ip 
Mar-01 0 1 3 3 3 0 0 4
Mar-02 76 76 52 107 64 57 72 129
Mar-01 0 0 0 0 0 0 0 1
Mar-10 0 1 0 0 0 2 1 0
Mar-11 0 0 0 1 0 0 0 0
Mar-02 224 191 153 220 201 120 313 323
Mar-03 236 223 239 351 313 230 439 421
Mar-04 162 193 126 255 233 157 198 254
Mar-05 239 343 214 469 411 246 360 591
Mar-06 1122 1828 809 2053 1965 1385 1768 2897
Mar-07 106 242 136 335 260 200 137 306
Mar-08 342 383 217 438 351 299 366 384
Mar-09 16 37 27 63 48 36 32 34
mpjanic@valkyr:~/mastertable$ grep Dec mastertable.ip 
Dec-01 0 0 0 0 0 0 0 0
There are 3 genes possibly affected by Excel, SEPT, MARCH and DEC, so for example SEPT1 will be converted to Sep-01, MARCH1 to Mar-01, and DEC1 to Dec-01.

To convert back use sed with regular expressions:

mpjanic@valkyr:~$ sed -E "s/(Sep-|Sep-0)/SEPT/g" mastertable.ip | grep SEPT
SEPT15 1070 1220 778 1675 1675 1258 1133 2200
SEPT1 3 1 0 0 0 0 2 0
SEPT10 870 1065 798 1507 1322 839 1075 1469
SEPT11 3238 5737 3216 7827 5064 4548 3512 8151
SEPT12 0 0 0 0 0 0 0 0
SEPT14 0 0 0 0 0 0 0 0
SEPT2 2374 3801 2429 5225 4467 3692 3696 5812
SEPT3 1 1 0 2 1 1 1 1
SEPT4 0 0 0 0 0 0 0 0
SEPT5 198 282 147 338 291 243 290 410
SEPT6 332 434 334 743 462 432 417 629
SEPT7P2 89 121 55 124 100 74 102 119
SEPT8 686 784 502 1094 834 777 625 973
SEPT9 4238 4609 1840 4750 4047 3414 4967 5742
SEPT7 2374 3675 2426 4685 4358 3038 4676 6592
SEPT7L 0 1 0 0 0 0 0 0
mpjanic@valkyr:~$ sed -E "s/(Mar-|Mar-0)/MARCH/g" mastertable.ip | grep MARCH
MARCH1 0 1 3 3 3 0 0 4
MARCH2 76 76 52 107 64 57 72 129
MARCH1 0 0 0 0 0 0 0 1
MARCH10 0 1 0 0 0 2 1 0
MARCH11 0 0 0 1 0 0 0 0
MARCH2 224 191 153 220 201 120 313 323
MARCH3 236 223 239 351 313 230 439 421
MARCH4 162 193 126 255 233 157 198 254
MARCH5 239 343 214 469 411 246 360 591
MARCH6 1122 1828 809 2053 1965 1385 1768 2897
MARCH7 106 242 136 335 260 200 137 306
MARCH8 342 383 217 438 351 299 366 384
MARCH9 16 37 27 63 48 36 32 34
mpjanic@valkyr:~$ sed -E "s/(Dec-|Dec-0)/DEC/g" mastertable.ip | grep DEC
DEC1 0 0 0 0 0 0 0 0

No comments:

Post a Comment