Tuesday, August 29, 2017

GTExExtractor, a script to parse and plot individual-level GTEx data

This is a script I wrote that you may find useful if you are working with GTEx data.

https://github.com/milospjanic/GTExExtractor/

GTEx currently shows only the box plot distribution of a single gene across tissues, so I wrote a script (GTExExtractor) that shows the distribution of multiple genes in a single tissue in the form of a more informative violin plot. You just type the genes you want and the tissue. You can use it to show how particular members of a gene family are more highly expressed in particular tissues.

For example, among the NFI transcription factors, NFIX is the dominant member in coronary arteries, NFIC in skeletal muscle, and in lung NFIB becomes the more highly expressed one.

You can see other examples at the link above.

Thursday, August 10, 2017

Extract columns by matching column names from file

If you want to select columns from a file using column names that are read from another file, use this code. In this example, the input file is input.txt and the names of the columns to extract are in columns.txt, one per line.

First, save this awk code as extractor.sh. It reads the first row and iterates through the fields to find the one that matches the first argument, $1. It then saves the number of the matched column as col_num and prints the column name. For all other rows (NR>1), it prints the content of that column.

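#!/bin/bash
# Usage: ./extractor.sh COLUMN_NAME INPUT_FILE
awk -v COLUMN_ID="$1" '
        # Header row: find the field whose name matches COLUMN_ID
        NR==1 {
                for (i=1; i<=NF; i++) {
                        if ($i==COLUMN_ID) {
                                col_num=i;
                                print $i;
                        }
                }
        }
        # Data rows: print the matched column, if one was found
        NR>1 {
                if (col_num) {
                        print $col_num;
                }
        }
' "$2"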

Then make extractor.sh executable (chmod +x extractor.sh) and run the code below. It reads columns.txt line by line, storing each line in the variable $line, and runs extractor.sh with $line as $1 and input.txt as $2. The output for each $line is saved as a .temp file. Finally, paste joins all the .temp files to create the final table.

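# Run in a directory with no leftover .temp files from earlier runs
while IFS= read -r line; do
        ./extractor.sh "$line" path/to/file/input.txt > "$line.temp"
done < path/to/file/columns.txt

# The shell expands *.temp in sorted order, so the output columns are
# ordered by name, not by their order in columns.txt
paste *.temp > output.txt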
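
As an alternative, the same extraction can be sketched as a single awk pass that avoids temp files and keeps the columns in the order listed in columns.txt. This is only a minimal sketch, not the method used above: the function name extract_columns is made up for illustration, and it assumes whitespace-separated input, a non-empty columns.txt, and tab-separated output.

```shell
#!/bin/bash
# extract_columns COLUMNS_FILE INPUT_FILE
# Hypothetical helper (the name is an assumption, not from the post above).
# Prints the columns of INPUT_FILE whose header names are listed, one per
# line, in COLUMNS_FILE. Assumes COLUMNS_FILE is not empty.
extract_columns() {
        awk '
                # First file: remember the wanted column names, in order
                NR==FNR { wanted[++n]=$0; next }
                # Header row of the second file: map names to field numbers
                FNR==1 {
                        for (i=1; i<=NF; i++) pos[$i]=i
                        m=0
                        for (j=1; j<=n; j++)
                                if (wanted[j] in pos) idx[++m]=pos[wanted[j]]
                }
                # Every row, header included: print the selected fields
                {
                        out=""; sep=""
                        for (k=1; k<=m; k++) {
                                out = out sep $(idx[k])
                                sep = "\t"
                        }
                        print out
                }
        ' "$1" "$2"
}
```

To use it, source the function and run: extract_columns path/to/file/columns.txt path/to/file/input.txt > output.txt. Names in columns.txt that do not appear in the header are silently skipped.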