Saturday, April 7, 2018

A bit of sed and regular expression magic to clean up your tables


To keep only the gene ID (2nd column) and the fold change (6th column) and strip the quotation marks, use sed with regular expressions.



mpjanic@zoran:~/test$ cat SLC2A.txt

"6033","SLC2A1-AS1",24.8051071979286,27.7198330991446,21.8903812967126,0.789701049729193,-0.340621486785587,0.565579800713107,1
"15656","SLC2A1",21989.7607363,25370.4928449324,18609.0286276675,0.733491018144907,-0.447148795187931,0.0135793982312018,0.274890758345199
"15657","SLC2A10",116.660701272146,109.114399777756,124.207002766537,1.13831907630452,0.186905008681256,0.536100348167181,1
"15658","SLC2A11",153.584023702827,160.671774940703,146.496272464952,0.911773536571793,-0.133252558036828,0.610646274133104,1
"15659","SLC2A12",0,0,0,NA,NA,NA,NA
"15660","SLC2A13",91.7915789084436,88.0137320553238,95.5694257615633,1.08584675970211,0.118820516938777,0.718169926714343,1
"15661","SLC2A14",26.1757135597822,34.6019488604179,17.7494782591466,0.512961808328973,-0.963076678383474,0.0836580834259747,0.696210238779407
"15662","SLC2A2",0.946102725140454,0.450180696019295,1.44202475426161,3.20321321418857,1.67951983083656,0.80556985989175,1
"15663","SLC2A3",1801.59770325241,1801.22866521645,1801.96674128837,1.00040976256161,0.000591041330547107,0.91557362726645,1
"15664","SLC2A4",64.9152045965179,47.8981046043178,81.932304588718,1.71055421222935,0.774463827856209,0.0320174623595299,0.440346477750378
"15665","SLC2A4RG",2839.51257874418,2552.43962703032,3126.58553045805,1.22494005239048,0.292711146586849,0.106250785209328,0.755271734116694
"15666","SLC2A5",0.488574210205611,0.450180696019295,0.526967724391927,1.17056934926712,0.227210408076608,1,1
"15667","SLC2A6",849.67804667658,741.433588192089,957.922505161072,1.29198692966806,0.369591475141396,0.0690179423560097,0.640745754210812
"15668","SLC2A7",0,0,0,NA,NA,NA,NA
"15669","SLC2A8",306.652668694167,317.104693305773,296.20064408256,0.934078398508418,-0.0983844524549534,0.643201071003747,1
"15670","SLC2A9",312.030609161521,297.26095251567,326.800265807372,1.09937165659235,0.136679190181628,0.580291149287337,1


mpjanic@zoran:~/test$ sed -E "s/\"[0-9]*\",\"//g" SLC2A.txt | sed -E "s/\",[0-9.]*,[0-9.]*,[0-9.]*,/ /g" | sed -E "s/,.*//g"
SLC2A1-AS1 0.789701049729193
SLC2A1 0.733491018144907
SLC2A10 1.13831907630452
SLC2A11 0.911773536571793
SLC2A12 NA
SLC2A13 1.08584675970211
SLC2A14 0.512961808328973
SLC2A2 3.20321321418857
SLC2A3 1.00040976256161
SLC2A4 1.71055421222935
SLC2A4RG 1.22494005239048
SLC2A5 1.17056934926712
SLC2A6 1.29198692966806
SLC2A7 NA
SLC2A8 0.934078398508418
SLC2A9 1.09937165659235
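
The same result can also be sketched with awk, which splits each line on commas so the columns can be addressed by number. This is only an alternative, and it assumes that no field contains an embedded comma; the function name extract_gene_fc is made up for illustration.

```shell
# Hypothetical helper: print the gene name (column 2, quotes removed)
# and the fold change (column 6) from a comma-separated table.
# Assumes no field contains an embedded comma.
extract_gene_fc() {
    awk -F',' '{ gsub(/"/, "", $2); print $2, $6 }' "$@"
}

# Two rows taken from SLC2A.txt, fed through standard input
printf '%s\n' \
  '"6033","SLC2A1-AS1",24.8051071979286,27.7198330991446,21.8903812967126,0.789701049729193,-0.340621486785587,0.565579800713107,1' \
  '"15659","SLC2A12",0,0,0,NA,NA,NA,NA' | extract_gene_fc
```

On these two rows it prints SLC2A1-AS1 0.789701049729193 and SLC2A12 NA. With a file name as argument (extract_gene_fc SLC2A.txt) it reads the file instead of standard input.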

Tuesday, March 6, 2018

Code for parsing VCF files

Code to clean up VCF files so that only the genotype calls remain.

For example, in this VCF file we need to keep only the first subfield of each entry, which holds the phased genotype.

cat rs
0|0:0:1,0,0 0|1:1:0,1,0 0|0:0:1,0,0 1|0:1:0,1,0 0|1:1:0,1,0 0|0:0:1,0,0 1|0:1:0,1,0 0|1:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 0|0:0:1,0,0 1|1:2:0,0,1 1|1:2:0,0,1 0|0:0:1,0,0 0|0:0:1,0,0 0|0:0:1,0,0 0|0:0:1,0,0 0|1:1:0,1,0 1|0:1:0,1,0 1|0:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 1|0:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 0|1:1:0,1,0 1|0:1:0,1,0 1|1:2:0,0,1 0|0:0:1,0,0 0|0:0:1,0,0 1|0:1:0,1,0 0|0:0:1,0,0 1|0:1:0,1,0 0|1:1:0,1,0 1|0:1:0,1,0 0|0:0:1,0,0 1|0:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 1|0:1:0,1,0 0|0:0:1,0,0 0|1:1:0,1,0 1|0:1:0,1,0 0|1:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 0|1:1:0,1,0 1|0:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 0|0:0:1,0,0 0|0:0:1,0,0 1|1:2:0,0,1 1|0:1:0,1,0 0|1:1:0,1,0 1|0:1:0,1,0 1|0:1:0,1,0 0|0:0:1,0,0
0|0:0:1,0,0 0|1:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 0|1:1:0,1,0 0|0:0:1,0,0 1|0:1:0,1,0 0|1:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 0|0:0:1,0,0 1|1:2:0,0,1 1|1:2:0,0,1 0|0:0:1,0,0 0|0:0:1,0,0 0|0:0:1,0,0 0|0:0:1,0,0 0|1:1:0,1,0 1|0:1:0,1,0 1|0:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 1|0:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 0|1:1:0,1,0 1|0:1:0,1,0 1|1:2:0,0,1 0|0:0:1,0,0 0|0:0:1,0,0 1|0:1:0,1,0 0|0:0:1,0,0 1|0:1:0,1,0 0|1:1:0,1,0 1|0:1:0,1,0 0|0:0:1,0,0 1|0:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 1|0:1:0,1,0 1|0:1:0,1,0 0|1:1:0,1,0 1|0:1:0,1,0 0|1:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 0|1:1:0,1,0 1|0:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 0|0:0:1,0,0 0|0:0:1,0,0 0|1:1:0,1,0 1|0:1:0,1,0 0|1:1:0,1,0 1|0:1:0,1,0 1|0:1:0,1,0 0|0:0:1,0,0
0|0:0:1,0,0 0|1:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 0|0:0:1,0,0 0|0:0:1,0,0 1|0:1:0,1,0 0|1:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 0|0:0:1,0,0 1|1:2:0,0,1 1|1:2:0,0,1 0|0:0:1,0,0 0|0:0:1,0,0 1|1:2:0,0,1 0|0:0:1,0,0 0|1:1:0,1,0 1|0:1:0,1,0 1|0:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 1|0:1:0,1,0 0|1:1:0,1,0 0|0:0:1,0,0 0|1:1:0,1,0 1|0:1:0,1,0 1|1:2:0,0,1 0|1:1:0,1,0 0|0:0:1,0,0 1|0:1:0,1,0 0|0:0:1,0,0 1|0:1:0,1,0 0|1:1:0,1,0 1|0:1:0,1,0 0|0:0:1,0,0 1|0:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 1|0:1:0,1,0 1|0:1:0,1,0 0|1:1:0,1,0 1|0:1:0,1,0 0|1:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 0|1:1:0,1,0 1|0:1:0,1,0 0|0:0:1,0,0 0|0:0:1,0,0 0|0:0:1,0,0 0|0:0:1,0,0 0|1:1:0,1,0 1|0:1:0,1,0 0|1:1:0,1,0 1|0:1:0,1,0 1|0:1:0,1,0 1|0:1:0,1,0
1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|0:1:0,1,0 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 0|1:1:0,1,0 1|1:2:0,0,1 1|1:2:0,0,1 0|1:1:0,1,0 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|0:1:0,1,0 1|1:2:0,0,1 1|1:2:0,0,1 0|1:1:0,1,0 1|1:2:0,0,1 1|1:2:0,0,1 1|1:1.99:0,0.01,0.99 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|0:1:0,1,0 1|1:2:0,0,1 1|0:1:0,1,0 0|1:1:0,1,0 1|1:2:0,0,1 0|1:1:0,1,0 1|1:2:0,0,1 1|1:2:0,0,1 0|1:1:0,1,0 0|1:1:0,1,0 1|1:2:0,0,1 1|1:2:0,0,1 1|0:1.4:0,0.6,0.4 1|1:2:0,0,1 0|1:1:0,1,0 1|0:1:0,1,0 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|1:2:0,0,1 1|0:1:0,1,0

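With sed, we can remove each : followed by the character class [0-9] repeated any number of times (*), together with the comma-separated values after it. However, this does not clear out the fields that contain decimal numbers, because [0-9] does not match the decimal point.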

sed -E 's/:[0-9]*:[0-9]*,[0-9]*,[0-9]*//g' rs
0|0 0|1 0|0 1|0 0|1 0|0 1|0 0|1 0|0 0|0 0|0 1|1 1|1 0|0 0|0 0|0 0|0 0|1 1|0 1|0 0|0 0|0 1|0 0|0 0|0 0|1 1|0 1|1 0|0 0|0 1|0 0|0 1|0 0|1 1|0 0|0 1|0 0|0 0|0 1|0 0|0 0|1 1|0 0|1 0|0 0|0 0|1 1|0 0|0 0|0 0|0 0|0 1|1 1|0 0|1 1|0 1|0 0|0
0|0 0|1 0|0 0|0 0|1 0|0 1|0 0|1 0|0 0|0 0|0 1|1 1|1 0|0 0|0 0|0 0|0 0|1 1|0 1|0 0|0 0|0 1|0 0|0 0|0 0|1 1|0 1|1 0|0 0|0 1|0 0|0 1|0 0|1 1|0 0|0 1|0 0|0 0|0 1|0 1|0 0|1 1|0 0|1 0|0 0|0 0|1 1|0 0|0 0|0 0|0 0|0 0|1 1|0 0|1 1|0 1|0 0|0
0|0 0|1 0|0 0|0 0|0 0|0 1|0 0|1 0|0 0|0 0|0 1|1 1|1 0|0 0|0 1|1 0|0 0|1 1|0 1|0 0|0 0|0 1|0 0|1 0|0 0|1 1|0 1|1 0|1 0|0 1|0 0|0 1|0 0|1 1|0 0|0 1|0 0|0 0|0 1|0 1|0 0|1 1|0 0|1 0|0 0|0 0|1 1|0 0|0 0|0 0|0 0|0 0|1 1|0 0|1 1|0 1|0 1|0
1|1 1|1 1|1 1|1 1|1 1|0 1|1 1|1 1|1 1|1 1|1 1|1 1|1 0|1 1|1 1|1 0|1 1|1 1|1 1|1 1|1 1|1 1|0 1|1 1|1 0|1 1|1 1|1 1|1:1.99:0,0.01,0.99 1|1 1|1 1|1 1|0 1|1 1|0 0|1 1|1 0|1 1|1 1|1 0|1 0|1 1|1 1|1 1|0:1.4:0,0.6,0.4 1|1 0|1 1|0 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|0
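
With this many columns the leftover fields are easy to miss by eye. A quick check, sketched here under the assumption that fields are separated by whitespace and that grep supports -o (GNU and BSD grep do), is to list every field that still contains a colon; leftover_fields is a hypothetical name.

```shell
# Hypothetical helper: print each whitespace-separated field that still
# contains a colon, one per line. Prints nothing if the cleanup was complete.
leftover_fields() {
    grep -oE '[^[:space:]]*:[^[:space:]]*' "$@"
}

# The integer-only pattern cleans the first field but not the second
printf '%s\n' '0|0:0:1,0,0 1|1:1.99:0,0.01,0.99' |
    sed -E 's/:[0-9]*:[0-9]*,[0-9]*,[0-9]*//g' |
    leftover_fields
```

Here only 1|1:1.99:0,0.01,0.99 is reported, since the first field was reduced to 0|0.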

To match the decimal numbers as well, add the decimal point to the character class, [0-9.], again with any number of repetitions *.

sed -E 's/:[0-9.]*:[0-9.]*,[0-9.]*,[0-9.]*//g' rs
0|0 0|1 0|0 1|0 0|1 0|0 1|0 0|1 0|0 0|0 0|0 1|1 1|1 0|0 0|0 0|0 0|0 0|1 1|0 1|0 0|0 0|0 1|0 0|0 0|0 0|1 1|0 1|1 0|0 0|0 1|0 0|0 1|0 0|1 1|0 0|0 1|0 0|0 0|0 1|0 0|0 0|1 1|0 0|1 0|0 0|0 0|1 1|0 0|0 0|0 0|0 0|0 1|1 1|0 0|1 1|0 1|0 0|0
0|0 0|1 0|0 0|0 0|1 0|0 1|0 0|1 0|0 0|0 0|0 1|1 1|1 0|0 0|0 0|0 0|0 0|1 1|0 1|0 0|0 0|0 1|0 0|0 0|0 0|1 1|0 1|1 0|0 0|0 1|0 0|0 1|0 0|1 1|0 0|0 1|0 0|0 0|0 1|0 1|0 0|1 1|0 0|1 0|0 0|0 0|1 1|0 0|0 0|0 0|0 0|0 0|1 1|0 0|1 1|0 1|0 0|0
0|0 0|1 0|0 0|0 0|0 0|0 1|0 0|1 0|0 0|0 0|0 1|1 1|1 0|0 0|0 1|1 0|0 0|1 1|0 1|0 0|0 0|0 1|0 0|1 0|0 0|1 1|0 1|1 0|1 0|0 1|0 0|0 1|0 0|1 1|0 0|0 1|0 0|0 0|0 1|0 1|0 0|1 1|0 0|1 0|0 0|0 0|1 1|0 0|0 0|0 0|0 0|0 0|1 1|0 0|1 1|0 1|0 1|0
1|1 1|1 1|1 1|1 1|1 1|0 1|1 1|1 1|1 1|1 1|1 1|1 1|1 0|1 1|1 1|1 0|1 1|1 1|1 1|1 1|1 1|1 1|0 1|1 1|1 0|1 1|1 1|1 1|1 1|1 1|1 1|1 1|0 1|1 1|0 0|1 1|1 0|1 1|1 1|1 0|1 0|1 1|1 1|1 1|0 1|1 0|1 1|0 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|0
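
If the subfields after the genotype are never needed, a shorter pattern is possible: delete everything from the first colon up to the next whitespace, whatever characters it contains. A minimal sketch, assuming the genotype fields are separated by spaces or tabs and the genotype itself contains no colon (strip_to_gt is a hypothetical name):

```shell
# Hypothetical helper: keep only the genotype (the text before the first
# colon) of every whitespace-separated field.
strip_to_gt() {
    sed -E 's/:[^[:space:]]*//g' "$@"
}

# Works for integer and decimal subfields alike
printf '%s\n' '0|0:0:1,0,0 1|1:1.99:0,0.01,0.99 1|0:1.4:0,0.6,0.4' | strip_to_gt
```

This prints 0|0 1|1 1|0. The trade-off is that the pattern no longer checks what it removes, so the explicit [0-9.] version is the safer choice when the layout of the file is not known in advance.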


Monday, January 29, 2018

Calculate per gene average expression from mastertable using awk/bash scripting

To calculate the average expression of each gene across all conditions in a mastertable, use awk:

mpjanic@zoran:~$ head mastertable -n25
        TQ6     TQ7     TQ8     TQ9     TQ10    TQ11
lnc-CCDC77-4:1  0       0       0       0       0       0
lnc-COX10-9:1   0       0       0       0       0       0
lnc-MAGEB2-1:1  0       0       0       0       0       0
lnc-TMEM99-2:1  0       0       0       0       0       0
lnc-COX10-9:2   0       0       0       0       0       0
DDN-AS1:2       0       0       0       0       0       0
lnc-TMEM99-2:2  1       0       0       0       0       0
lnc-SPRY4-3:1   0       0       0       0       0       0
DDN-AS1:3       0       0       0       0       0       0
lnc-TMEM99-2:3  0       0       0       0       0       0
DDN-AS1:4       28      32      10      31      2       13
DDN-AS1:5       0       0       0       0       0       0
lnc-ZNF516-4:10 0       0       0       0       0       0
DDN-AS1:6       0       0       0       0       0       0
lnc-ZNF516-4:11 0       0       0       0       0       0
GSEC:2  15      9       16      10      9       34
lnc-AATK-AS1-2:1        15      17      2       19      28      24
lnc-PLCH1-5:1   0       0       0       0       0       0
GSEC:3  0       0       0       0       0       0
lnc-MFSD9-7:1   53      27      22      41      47      41
lnc-PSMC1-1:1   29      30      54      18      23      58
GSEC:4  0       0       0       0       0       0
GSEC:5  0       0       0       0       0       0
lnc-ZNF780B-1:1 124     125     80      123     110     148
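Use awk to print a heading in place of the first line. Then, for every line with more than two fields (NF > 2), and assuming the first field is the gene name, sum fields 2 through NF and divide by the number of value fields: {sum=0; for (i=2; i<=NF; i++) sum+=$i; print $1, sum/(NF-1)}.

Then pipe the result to sort -gr -k2 to sort on the second column in reverse (descending) order; the -g option (--general-numeric-sort) compares the values as floating-point numbers: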


mpjanic@zoran:~$ awk 'NR == 1 { print "lncRNA", "Average"; next }    # Print a heading row
> NF > 2 { sum=0; for (i=2; i<=NF; i++) sum+=$i; print $1, sum/(NF-1) }' mastertable | sort -gr -k2 | head -n 20
lnc-SGCE-3:1 157963
lnc-EIF2AK4-6:1 120530
lnc-SLC3A2-6:1 110467
lnc-ATIC-14:1 66120.8
lnc-TRIM69-3:1 59894.5
lnc-TRDMT1-5:2 52209.8
lnc-ANKRD55-6:1 44934.3
lnc-LRRTM4-6:1 44869.3
lnc-LYN-8:1 39859.2
lnc-VGF-4:1 37230.2
lnc-VGF-3:1 32908.2
lnc-VAT1-4:1 27177.7
lnc-SH3D19-2:1 22266.8
IGFBP7-AS1:16 21963.5
lnc-HSD17B7-1:2 21429.5
lnc-BTD-2:1 21348.2
lnc-ARID2-11:1 21063.7
lnc-CBY3-3:2 19925.3
lnc-DYNC2H1-4:1 19492.8
lnc-C6orf120-1:7 18986.2
Then save it in a file expression_average_per_gene:

mpjanic@zoran:~$ awk 'NR == 1 { print "lncRNA", "Average"; next }    # Print a heading row
NF > 2 { sum=0; for (i=2; i<=NF; i++) sum+=$i; print $1, sum/(NF-1) }' mastertable | sort -gr -k2 > expression_average_per_gene
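
To try the calculation without the full mastertable, here is a small self-contained sketch on a made-up three-column table. The function name row_means and the sample values are assumptions for illustration only. It also keeps the heading row on top instead of letting sort move it, and uses -k2,2 so that the sort key is limited to the second column.

```shell
# Hypothetical helper: per-row mean of all columns after the first.
# The first input line is treated as a header and replaced by a heading.
row_means() {
    awk 'NR == 1 { print "lncRNA", "Average"; next }
         NF > 2  { sum = 0; for (i = 2; i <= NF; i++) sum += $i
                   print $1, sum / (NF - 1) }' "$@"
}

# Made-up sample table: one header line, then gene name plus three values
printf 'TQ6\tTQ7\tTQ8\nGSEC:2\t15\t9\t18\nDDN-AS1:4\t30\t30\t30\n' |
    row_means |
    { IFS= read -r header; printf '%s\n' "$header"; sort -k2,2gr; }
```

This prints the heading, then DDN-AS1:4 30, then GSEC:2 14. The -g option is an extension found in GNU sort and recent BSD sort; where it is missing, -n handles plain decimal values like these.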

Friday, January 12, 2018

Code to plot GTEx color coded expression levels

These are the instructions for plotting GTEx color-coded expression levels in R with ggplot2. You need two files. The first contains the GTEx tissue names and expression values:

> RPKM<-read.table("RPKM.txt", head=TRUE,stringsAsFactors=FALSE,sep="\t")

> RPKM
                                 GTEx_tissue       RPKM
1                     Adipose - Subcutaneous  66.859651
2               Adipose - Visceral (Omentum)  24.028442
3                              Adrenal Gland 116.200920
4                             Artery - Aorta   2.133841
5                          Artery - Coronary  67.965125
6                            Artery - Tibial  86.771264
7                                    Bladder  52.417672
8                           Brain - Amygdala  48.820335
9   Brain - Anterior cingulate cortex (BA24)  51.514548
10           Brain - Caudate (basal ganglia)   1.032300
...
The second file contains the GTEx tissues and their color codes:

> COLOR<-read.table("COLOR.txt", head=TRUE,stringsAsFactors=FALSE,sep="\t")
> COLOR
                          tissue_site_detail tissue_color_hex
1                     Adipose - Subcutaneous           FF6600
2               Adipose - Visceral (Omentum)           FFAA00
3                              Adrenal Gland           33DD33
4                             Artery - Aorta           FF5555
5                          Artery - Coronary           FFAA99
6                            Artery - Tibial           FF0000
7                                    Bladder           AA0000
8                           Brain - Amygdala           EEEE00
9   Brain - Anterior cingulate cortex (BA24)           EEEE00
10           Brain - Caudate (basal ganglia)           EEEE00
...
First, let's order the RPKM data frame by expression value.

> RPKM<-RPKM[order(RPKM$RPKM),]
> RPKM
                                 GTEx_tissue       RPKM
10           Brain - Caudate (basal ganglia)   1.032300
53                               Whole Blood   1.249288
26                           Colon - Sigmoid   1.515215
4                             Artery - Aorta   2.133841
18           Brain - Putamen (basal ganglia)   5.461759
32                  Heart - Atrial Appendage   6.369740
42                                 Pituitary   6.862820
39                            Nerve - Tibial   9.316173
35                                     Liver  11.389639
29                        Esophagus - Mucosa  12.715260
25                       Cervix - Endocervix  21.005066
...
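Next, merge the RPKM and COLOR data frames. Setting sort=F stops merge from sorting the result by the key column; here the rows come out in the RPKM order, although the R documentation describes the row order with sort=FALSE as unspecified, so check the result.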

> DATA<-merge(RPKM, COLOR, by.x="GTEx_tissue", by.y="tissue_site_detail", sort=F)
> DATA
                                 GTEx_tissue       RPKM tissue_color_hex
1            Brain - Caudate (basal ganglia)   1.032300           EEEE00
2                                Whole Blood   1.249288           FF00BB
3                            Colon - Sigmoid   1.515215           EEBB77
4                             Artery - Aorta   2.133841           FF5555
5            Brain - Putamen (basal ganglia)   5.461759           EEEE00
6                   Heart - Atrial Appendage   6.369740           9900FF
7                                  Pituitary   6.862820           AAFF99
8                             Nerve - Tibial   9.316173           FFD700
9                                      Liver  11.389639           AABB66
10                        Esophagus - Mucosa  12.715260           552200
11                       Cervix - Endocervix  21.005066           CCAADD
...
Next, prefix the color codes with a # sign so that R recognizes them as hex colors.

> DATA$tissue_color_hex = paste("#", DATA$tissue_color_hex, sep="")
> DATA
                                 GTEx_tissue       RPKM tissue_color_hex
1            Brain - Caudate (basal ganglia)   1.032300          #EEEE00
2                                Whole Blood   1.249288          #FF00BB
3                            Colon - Sigmoid   1.515215          #EEBB77
4                             Artery - Aorta   2.133841          #FF5555
5            Brain - Putamen (basal ganglia)   5.461759          #EEEE00
6                   Heart - Atrial Appendage   6.369740          #9900FF
7                                  Pituitary   6.862820          #AAFF99
8                             Nerve - Tibial   9.316173          #FFD700
9                                      Liver  11.389639          #AABB66
10                        Esophagus - Mucosa  12.715260          #552200
11                       Cervix - Endocervix  21.005066          #CCAADD
...
Factor levels are sorted alphabetically by default, and the tissues appear on the graph in level order, so it is important to reorder the levels to match the row order.

> factor (DATA$GTEx_tissue)
 [1] Brain - Caudate (basal ganglia)          
 [2] Whole Blood                              
 [3] Colon - Sigmoid                          
 [4] Artery - Aorta                           
 [5] Brain - Putamen (basal ganglia)          
 [6] Heart - Atrial Appendage                 
 [7] Pituitary                                
 [8] Nerve - Tibial                           
 [9] Liver                                    
[10] Esophagus - Mucosa                       
...
[53] Brain - Nucleus accumbens (basal ganglia)
[54] Esophagus - Gastroesophageal Junction    
54 Levels: Adipose - Subcutaneous ... Whole Blood
First convert the data frame column to a character vector, then recreate the factor with its levels in order of appearance. Check the order of the levels.

> DATA$GTEx_tissue<-as.character(DATA$GTEx_tissue)

> DATA$GTEx_tissue<-factor(DATA$GTEx_tissue, levels=unique(DATA$GTEx_tissue))

> factor (DATA$GTEx_tissue)
 [1] Brain - Caudate (basal ganglia)          
 [2] Whole Blood                              
 [3] Colon - Sigmoid                          
 [4] Artery - Aorta                           
 [5] Brain - Putamen (basal ganglia)          
 [6] Heart - Atrial Appendage                 
 [7] Pituitary                                
 [8] Nerve - Tibial                           
 [9] Liver                                    
[10] Esophagus - Mucosa                       
...        
[53] Brain - Nucleus accumbens (basal ganglia)
[54] Esophagus - Gastroesophageal Junction    
54 Levels: Brain - Caudate (basal ganglia) Whole Blood ... Esophagus - Gastroesophageal Junction
Check if DATA$RPKM levels are in the correct order.

> factor (DATA$RPKM)
 [1] 1.03230021638509 1.24928759400422 1.51521544457643 2.13384071515083
 [5] 5.46175870783362 6.36974006723701 6.86281980596968 9.31617267159933
 [9] 11.3896391114977 12.7152598395188 21.0050660365301 23.4022785915606
[13] 23.816931737888  24.0284416972293 26.7461670932036 27.055894746729 
[17] 28.3445527395585 28.472923085106  29.1229822421017 31.6109435670403
[21] 33.1257268898051 33.6736188322218 33.7810087909512 37.907376687972 
[25] 37.9136671742278 38.9443756171557 40.791872968003  48.5668687423655
[29] 48.8203345391151 49.3179500223535 51.5145476186135 52.4176718915877
[33] 53.7403789981182 53.9869116218224 56.9835507558119 58.0722270819484
[37] 63.5714835062821 65.0641288989535 66.859651074282  67.9651249298835
[41] 68.7278064101102 72.1703782792782 77.0099742646478 77.727954691935 
[45] 86.771263595058  87.8177781990037 88.3191874893887 90.0888195183517
[49] 101.738608629986 102.486500884127 103.895843148456 116.200919797288
[53] 125.497316610595 127.021903203954
54 Levels: 1.03230021638509 1.24928759400422 ... 127.021903203954
Similarly reorder factor levels for DATA$tissue_color_hex.

> factor (DATA$tissue_color_hex)
 [1] #EEEE00 #FF00BB #EEBB77 #FF5555 #EEEE00 #9900FF #AAFF99 #FFD700 #AABB66
[10] #552200 #CCAADD #555522 #778855 #FFAA00 #995522 #EEEE00 #FFDD99 #EEEE00
[19] #006600 #7777FF #EEEE00 #AAEEFF #FFAA99 #EEEE00 #660099 #BB9988 #DDDDDD
[28] #FF66FF #EEEE00 #AAAAAA #EEEE00 #AA0000 #22FFDD #0000FF #FFCCCC #99FF00
[37] #CC9955 #FFAAFF #FF6600 #FFAA99 #FFCCCC #CC66FF #EEEE00 #FF5599 #FF0000
[46] #EEEE00 #EEEE00 #33CCCC #EEEE00 #AAAAFF #99BB88 #33DD33 #EEEE00 #8B7355
40 Levels: #0000FF #006600 #22FFDD #33CCCC #33DD33 #552200 #555522 ... #FFDD99
> DATA$tissue_color_hex<-as.character(DATA$tissue_color_hex)
> DATA$tissue_color_hex<-factor(DATA$tissue_color_hex, levels=unique(DATA$tissue_color_hex))
> factor (DATA$tissue_color_hex)
 [1] #EEEE00 #FF00BB #EEBB77 #FF5555 #EEEE00 #9900FF #AAFF99 #FFD700 #AABB66
[10] #552200 #CCAADD #555522 #778855 #FFAA00 #995522 #EEEE00 #FFDD99 #EEEE00
[19] #006600 #7777FF #EEEE00 #AAEEFF #FFAA99 #EEEE00 #660099 #BB9988 #DDDDDD
[28] #FF66FF #EEEE00 #AAAAAA #EEEE00 #AA0000 #22FFDD #0000FF #FFCCCC #99FF00
[37] #CC9955 #FFAAFF #FF6600 #FFAA99 #FFCCCC #CC66FF #EEEE00 #FF5599 #FF0000
[46] #EEEE00 #EEEE00 #33CCCC #EEEE00 #AAAAFF #99BB88 #33DD33 #EEEE00 #8B7355
40 Levels: #EEEE00 #FF00BB #EEBB77 #FF5555 #9900FF #AAFF99 #FFD700 ... #8B7355
Use ggplot2 to plot the graph. Add + aes(colour=as.character(tissue_color_hex)) + scale_colour_identity() so that the hex codes in the column are used directly as colors. Inside aes(), refer to columns by name rather than as DATA$column.

> library(ggplot2) 

> pdf("GTEx.pdf")

> p <- ggplot(DATA,aes(x=GTEx_tissue,y=RPKM, color=tissue_color_hex))+geom_point(size=3)

> p + ylab("RPKM") + xlab("GTEx_tissue") + coord_flip() + theme_bw()  + aes(colour=as.character(tissue_color_hex)) + scale_colour_identity()

> dev.off()
The graph has the expression values of interest and it is color coded using GTEx colors.