Friday, September 22, 2017

Two scripts for clustering and coloring of single cell RNAseq data

I wrote two scripts that cluster single cell RNAseq data using principal component analysis and color the individual cells by the expression of one or two genes on a gradient scale.

The first script is SingleCellPCAplot, an R script that performs principal component analysis of single cell RNAseq data starting from an RPKM mastertable and outputs a PCA plot in which the cells are colored by the expression of a gene of interest on a color gradient scale. It is useful if you want to quickly cluster the cells and check the expression of your genes in the different clusters. When running the script, just input the gene of interest and you will get the PDF output.
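To make the idea concrete, here is a minimal Python sketch of the same kind of analysis. It is not the code of SingleCellPCAplot (which is an R script), just an illustration under my own assumptions: the RPKM table has genes in rows and cells in columns, values are log2-transformed with a pseudocount of 1 before PCA, and the gene names and toy data are made up.

```python
import numpy as np


def pca_scores(rpkm, n_components=2):
    """Project cells onto the top principal components.

    rpkm: array of shape (n_genes, n_cells), one column per cell.
    Returns an array of shape (n_cells, n_components).
    """
    # Log-transform so a few very highly expressed genes do not dominate.
    # (Assumption: the real script may normalize differently.)
    logged = np.log2(np.asarray(rpkm, dtype=float) + 1.0)
    cells = logged.T                         # rows = cells, columns = genes
    centered = cells - cells.mean(axis=0)    # center each gene at zero
    # SVD of the centered matrix; u * s gives the PC scores of each cell.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def gradient_values(expression):
    """Rescale one gene's log expression to [0, 1] for a color gradient."""
    logged = np.log2(np.asarray(expression, dtype=float) + 1.0)
    span = logged.max() - logged.min()
    if span == 0:
        return np.zeros_like(logged)         # gene is flat across all cells
    return (logged - logged.min()) / span


# Toy RPKM table: 4 hypothetical genes x 6 cells.
rng = np.random.default_rng(0)
genes = ["GeneA", "GeneB", "GeneC", "GeneD"]
rpkm = rng.gamma(shape=2.0, scale=50.0, size=(len(genes), 6))

scores = pca_scores(rpkm)                             # PC1/PC2 per cell
colors = gradient_values(rpkm[genes.index("GeneA")])  # gradient value per cell
```

The two columns of `scores` are the x and y coordinates of each cell, and `colors` can be passed to a plotting function that accepts a colormap, for example the `c` argument of matplotlib's `scatter`.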

The second script is SingleCellPCAplotMultiGene, an extension of the first script that may give a better visual separation of the clusters. It uses the ratio of expression of two genes to build a two-color gradient. In the examples, I use fibroblasts and two different fibroblast markers to show how you can separate clusters within the same cell type.
When colored with fibroblast/cardiomyocyte markers, the cells show a clear fibroblast lineage.

However, when colored with two fibroblast markers, the cells separate into two distinct fibroblast subclusters.
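The two-gene coloring can be sketched in a few lines of Python. Again, this is not the code of SingleCellPCAplotMultiGene, only an illustration of the idea. The log2 ratio, the pseudocount of 1, the symmetric scaling around equal expression, and the blue/red color pair are all my assumptions for the sketch.

```python
import numpy as np


def two_gene_gradient(gene_a, gene_b, pseudocount=1.0):
    """Map the log2 expression ratio of two genes to [0, 1] per cell.

    0.5 means equal expression, values toward 1 mean gene_a dominates,
    values toward 0 mean gene_b dominates.
    """
    # The pseudocount avoids division by zero for cells with no expression.
    a = np.asarray(gene_a, dtype=float) + pseudocount
    b = np.asarray(gene_b, dtype=float) + pseudocount
    ratio = np.log2(a / b)
    limit = np.abs(ratio).max()
    if limit == 0:
        return np.full_like(ratio, 0.5)      # both genes equal in every cell
    # Scale symmetrically so that a ratio of 1 always lands on the midpoint.
    return 0.5 + ratio / (2.0 * limit)


def blend(value, low=(0.0, 0.0, 1.0), high=(1.0, 0.0, 0.0)):
    """Linearly blend two RGB colors; value 0 gives low, value 1 gives high."""
    value = np.asarray(value, dtype=float)[..., None]
    return (1.0 - value) * np.asarray(low) + value * np.asarray(high)


# Three toy cells: marker A only, marker B only, and both equally expressed.
marker_a = [100.0, 0.0, 10.0]
marker_b = [0.0, 100.0, 10.0]

values = two_gene_gradient(marker_a, marker_b)   # one gradient value per cell
cell_colors = blend(values)                      # one RGB triple per cell
```

With this scaling, cells dominated by the first marker end up at one end of the gradient, cells dominated by the second marker at the other end, and cells expressing both equally in the middle, which is what makes subclusters within one cell type stand out.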