Friday, January 12, 2018

Code to plot GTEx color coded expression levels

These are the instructions to plot GTEx color coded expression levels in R and ggplot2. You need to have two files, one with GTEx tissue names and expression values,

> RPKM<-read.table("RPKM.txt", head=TRUE,stringsAsFactors=FALSE,sep="\t")

> RPKM
                                 GTEx_tissue       RPKM
1                     Adipose - Subcutaneous  66.859651
2               Adipose - Visceral (Omentum)  24.028442
3                              Adrenal Gland 116.200920
4                             Artery - Aorta   2.133841
5                          Artery - Coronary  67.965125
6                            Artery - Tibial  86.771264
7                                    Bladder  52.417672
8                           Brain - Amygdala  48.820335
9   Brain - Anterior cingulate cortex (BA24)  51.514548
10           Brain - Caudate (basal ganglia)   1.032300
...
And a file with GTEx tissues and color code.

> COLOR<-read.table("COLOR.txt", head=TRUE,stringsAsFactors=FALSE,sep="\t")
> COLOR
                          tissue_site_detail tissue_color_hex
1                     Adipose - Subcutaneous           FF6600
2               Adipose - Visceral (Omentum)           FFAA00
3                              Adrenal Gland           33DD33
4                             Artery - Aorta           FF5555
5                          Artery - Coronary           FFAA99
6                            Artery - Tibial           FF0000
7                                    Bladder           AA0000
8                           Brain - Amygdala           EEEE00
9   Brain - Anterior cingulate cortex (BA24)           EEEE00
10           Brain - Caudate (basal ganglia)           EEEE00
...
First lets order the RPKM data frame.

> RPKM<-RPKM[order(RPKM$RPKM),]
> RPKM
                                 GTEx_tissue       RPKM
10           Brain - Caudate (basal ganglia)   1.032300
53                               Whole Blood   1.249288
26                           Colon - Sigmoid   1.515215
4                             Artery - Aorta   2.133841
18           Brain - Putamen (basal ganglia)   5.461759
32                  Heart - Atrial Appendage   6.369740
42                                 Pituitary   6.862820
39                            Nerve - Tibial   9.316173
35                                     Liver  11.389639
29                        Esophagus - Mucosa  12.715260
25                       Cervix - Endocervix  21.005066
...
Next, merge RPKM and COLOR data frames while preserving the original order.

> DATA<-merge(RPKM, COLOR, by.x="GTEx_tissue", by.y="tissue_site_detail", sort=F)
> DATA
                                 GTEx_tissue       RPKM tissue_color_hex
1            Brain - Caudate (basal ganglia)   1.032300           EEEE00
2                                Whole Blood   1.249288           FF00BB
3                            Colon - Sigmoid   1.515215           EEBB77
4                             Artery - Aorta   2.133841           FF5555
5            Brain - Putamen (basal ganglia)   5.461759           EEEE00
6                   Heart - Atrial Appendage   6.369740           9900FF
7                                  Pituitary   6.862820           AAFF99
8                             Nerve - Tibial   9.316173           FFD700
9                                      Liver  11.389639           AABB66
10                        Esophagus - Mucosa  12.715260           552200
11                       Cervix - Endocervix  21.005066           CCAADD
...
Next, add # sign for color codes.

> DATA$tissue_color_hex = paste("#", DATA$tissue_color_hex, sep="")
> DATA
                                 GTEx_tissue       RPKM tissue_color_hex
1            Brain - Caudate (basal ganglia)   1.032300          #EEEE00
2                                Whole Blood   1.249288          #FF00BB
3                            Colon - Sigmoid   1.515215          #EEBB77
4                             Artery - Aorta   2.133841          #FF5555
5            Brain - Putamen (basal ganglia)   5.461759          #EEEE00
6                   Heart - Atrial Appendage   6.369740          #9900FF
7                                  Pituitary   6.862820          #AAFF99
8                             Nerve - Tibial   9.316173          #FFD700
9                                      Liver  11.389639          #AABB66
10                        Esophagus - Mucosa  12.715260          #552200
11                       Cervix - Endocervix  21.005066          #CCAADD
...
It is important to reorder levels not to be alphabetically sorted as this is how they will appear on the graph. 

> factor (DATA$GTEx_tissue)
 [1] Brain - Caudate (basal ganglia)          
 [2] Whole Blood                              
 [3] Colon - Sigmoid                          
 [4] Artery - Aorta                           
 [5] Brain - Putamen (basal ganglia)          
 [6] Heart - Atrial Appendage                 
 [7] Pituitary                                
 [8] Nerve - Tibial                           
 [9] Liver                                    
[10] Esophagus - Mucosa                       
...
[53] Brain - Nucleus accumbens (basal ganglia)
[54] Esophagus - Gastroesophageal Junction    
54 Levels: Adipose - Subcutaneous ... Whole Blood
First convert the column of the data frame to character vector, then recreate factor levels. Check the levels order.

> DATA$GTEx_tissue<-as.character(DATA$GTEx_tissue)

> DATA$GTEx_tissue<-factor(DATA$GTEx_tissue, levels=unique(DATA$GTEx_tissue))

> factor (DATA$GTEx_tissue)
 [1] Brain - Caudate (basal ganglia)          
 [2] Whole Blood                              
 [3] Colon - Sigmoid                          
 [4] Artery - Aorta                           
 [5] Brain - Putamen (basal ganglia)          
 [6] Heart - Atrial Appendage                 
 [7] Pituitary                                
 [8] Nerve - Tibial                           
 [9] Liver                                    
[10] Esophagus - Mucosa                       
...        
[53] Brain - Nucleus accumbens (basal ganglia)
[54] Esophagus - Gastroesophageal Junction    
54 Levels: Brain - Caudate (basal ganglia) Whole Blood ... Esophagus - Gastroesophageal Junction
Check if DATA$RPKM levels are in the correct order.

> factor (DATA$RPKM)
 [1] 1.03230021638509 1.24928759400422 1.51521544457643 2.13384071515083
 [5] 5.46175870783362 6.36974006723701 6.86281980596968 9.31617267159933
 [9] 11.3896391114977 12.7152598395188 21.0050660365301 23.4022785915606
[13] 23.816931737888  24.0284416972293 26.7461670932036 27.055894746729 
[17] 28.3445527395585 28.472923085106  29.1229822421017 31.6109435670403
[21] 33.1257268898051 33.6736188322218 33.7810087909512 37.907376687972 
[25] 37.9136671742278 38.9443756171557 40.791872968003  48.5668687423655
[29] 48.8203345391151 49.3179500223535 51.5145476186135 52.4176718915877
[33] 53.7403789981182 53.9869116218224 56.9835507558119 58.0722270819484
[37] 63.5714835062821 65.0641288989535 66.859651074282  67.9651249298835
[41] 68.7278064101102 72.1703782792782 77.0099742646478 77.727954691935 
[45] 86.771263595058  87.8177781990037 88.3191874893887 90.0888195183517
[49] 101.738608629986 102.486500884127 103.895843148456 116.200919797288
[53] 125.497316610595 127.021903203954
54 Levels: 1.03230021638509 1.24928759400422 ... 127.021903203954
Similarly reorder factor levels for DATA$tissue_color_hex.

> factor (DATA$tissue_color_hex)
 [1] #EEEE00 #FF00BB #EEBB77 #FF5555 #EEEE00 #9900FF #AAFF99 #FFD700 #AABB66
[10] #552200 #CCAADD #555522 #778855 #FFAA00 #995522 #EEEE00 #FFDD99 #EEEE00
[19] #006600 #7777FF #EEEE00 #AAEEFF #FFAA99 #EEEE00 #660099 #BB9988 #DDDDDD
[28] #FF66FF #EEEE00 #AAAAAA #EEEE00 #AA0000 #22FFDD #0000FF #FFCCCC #99FF00
[37] #CC9955 #FFAAFF #FF6600 #FFAA99 #FFCCCC #CC66FF #EEEE00 #FF5599 #FF0000
[46] #EEEE00 #EEEE00 #33CCCC #EEEE00 #AAAAFF #99BB88 #33DD33 #EEEE00 #8B7355
40 Levels: #0000FF #006600 #22FFDD #33CCCC #33DD33 #552200 #555522 ... #FFDD99
> DATA$tissue_color_hex<-as.character(DATA$tissue_color_hex)
> DATA$tissue_color_hex<-factor(DATA$tissue_color_hex, levels=unique(DATA$tissue_color_hex))
> factor (DATA$tissue_color_hex)
 [1] #EEEE00 #FF00BB #EEBB77 #FF5555 #EEEE00 #9900FF #AAFF99 #FFD700 #AABB66
[10] #552200 #CCAADD #555522 #778855 #FFAA00 #995522 #EEEE00 #FFDD99 #EEEE00
[19] #006600 #7777FF #EEEE00 #AAEEFF #FFAA99 #EEEE00 #660099 #BB9988 #DDDDDD
[28] #FF66FF #EEEE00 #AAAAAA #EEEE00 #AA0000 #22FFDD #0000FF #FFCCCC #99FF00
[37] #CC9955 #FFAAFF #FF6600 #FFAA99 #FFCCCC #CC66FF #EEEE00 #FF5599 #FF0000
[46] #EEEE00 #EEEE00 #33CCCC #EEEE00 #AAAAFF #99BB88 #33DD33 #EEEE00 #8B7355
40 Levels: #EEEE00 #FF00BB #EEBB77 #FF5555 #9900FF #AAFF99 #FFD700 ... #8B7355
Use ggplot2 to plot the graph. Use + aes(colour=as.character(DATA$tissue_color_hex)) + scale_colour_identity() for custom colors. 

> library(ggplot2) 

> pdf("GTEx.pdf")

> p <- ggplot(DATA,aes(x=DATA$GTEx_tissue,y=DATA$RPKM, color=DATA$tissue_color_hex))+geom_point(size=3)

> p + ylab("RPKM") + xlab("GTEx_tissue") + coord_flip() + theme_bw()  + aes(colour=as.character(DATA$tissue_color_hex)) + scale_colour_identity()

> dev.off()
The graph has the expression values of interest and it is color coded using GTEx colors.


No comments:

Post a Comment