Wednesday, February 24, 2016

Principal component analysis using ggplot2 and wesanderson color palette in R

Let's show how to generate a principal component analysis (PCA) plot in R and make it more appealing.

This is the head of the data frame (rows are genomic regions, columns are samples):

Hcasmc  Hcasmc-pdgfdd   Hcasmc-pdgfbb   Hcasmc-sf       Hcasmc-tgfb1    Hcasmc-pdgfdd   Hcasmc-pdgfbb   Hcasmc-sf       Hcasmc-tgfb1    Athero  Normal  Normal2 Normal3 Athero3
83.7839 49.2443 52.817  58.7663 68.057  44.9314 47.9035 66.4877 62.4438 150.564 173.965 86.8707 121.371 228.689
83.7839 49.2443 52.817  58.7663 68.057  44.9314 47.9035 66.4877 62.4438 150.564 173.965 86.8707 121.371 228.689
83.7839 49.2443 52.817  58.7663 68.057  44.9314 47.9035 66.4877 62.4438 0       173.965 86.8707 121.371 228.689
0       8.37066 13.914  6.40291 11.3867 9.96751 11.6739 11.559  10.0152 0       0       86.8707 121.371 0
0       30.2485 0       0       55.8487 48.8618 0       0       49.8919 0       0       0       0       0
54.9774 30.2485 38.5183 47.3038 55.8487 48.8618 42.1996 68.139  49.8919 0       0       0       0       0
54.9774 30.2485 38.5183 47.3038 55.8487 48.8618 42.1996 68.139  49.8919 0       34.9118 33.9246 33.4813 0
54.9774 30.2485 38.5183 47.3038 55.8487 48.8618 42.1996 68.139  49.8919 0       0       0       0       0
21.3106 51.5006 48.4945 41.1112 49.1787 39.7445 41.0823 31.3953 29.9609 0       0       0       0       0
In R, load the data frame with read.delim, transpose it with t so that each sample becomes a row, and run the prcomp function:

test <- read.delim("unionbedg_with_hcasmc_serum_pdgf_tgf_FINAL_nochrXY_over100_cut_100-2000_no_encode_no0_noatherobadsample_no0", header=TRUE)

# prcomp expects observations in rows, so transpose to get one row per sample
test.tr <- t(test)
pca <- prcomp(test.tr, scale.=TRUE)

# sample names, used to label the points
pca.labels <- colnames(test)

# base graphics plot of components 2 and 3
plot(pca$x[,2], pca$x[,3], xlab="PC2", ylab="PC3", main="PCA for components 2&3", type="p", cex=2, pch=21, col=18, bg=13)
text(pca$x[,2], pca$x[,3], labels=pca.labels, cex=0.8, pos=3)
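
To make the transpose step and the choice of components easier to follow, here is a minimal sketch on a made-up table with the same layout (regions in rows, samples in columns). The names toy_txt, toy, toy_pca and var_explained, and all the numbers, are assumptions for illustration and are not from the real data set. It also shows that read.delim rewrites duplicated or non-syntactic headers by default, which affects the labels taken from colnames.

```r
# Toy table in the same layout as the post: rows are regions, columns are samples.
# All values are made up for illustration.
toy_txt <- paste(
  "ctl\ttrt-a\ttrt-a\ttissue",
  "10\t12\t11\t40",
  "5\t9\t8\t1",
  "7\t3\t4\t20",
  "2\t6\t5\t9",
  "8\t1\t2\t15",
  sep = "\n"
)
toy <- read.delim(text = toy_txt, header = TRUE)

# read.delim() makes headers syntactic and unique by default
# (check.names = TRUE), so they differ from the raw header line.
colnames(toy)

# prcomp() treats rows as observations, so transpose to get one row per sample.
toy_pca <- prcomp(t(toy), scale. = TRUE)

# One row of scores per sample; the row names are the sample labels.
dim(toy_pca$x)
rownames(toy_pca$x)

# Proportion of variance carried by each component.
var_explained <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)
round(var_explained, 3)
```

Looking at the variance shares (or at summary(toy_pca)) is one way to decide which pair of components is worth plotting.
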
The base graphics plot is generic and may not be very appealing. To draw it with ggplot2 and the wesanderson color palettes instead, use:

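library(ggplot2)
library(wesanderson)

# scores for components 2 and 3, one row per sample
PCA <- data.frame(pca$x[,2], pca$x[,3])
colnames(PCA) <- c("PC2","PC3")
# condition of each sample, in the same order as the columns of the input table
PCA$CONDITION <- c("HCASMC SERUM", "HCASMC PDGFDD", "HCASMC PDGFBB", "HCASMC SERUM FREE", "HCASMC TGFB1", "HCASMC PDGFDD", "HCASMC PDGFBB", "HCASMC SERUM FREE", "HCASMC TGFB1", "ATHERO CORONARY", "NORMAL CORONARY", "NORMAL CORONARY", "NORMAL CORONARY", "ATHERO CORONARY")

# two palettes are combined so there are enough colors for all conditions
# if wes_palette() reports that a palette is not found, newer wesanderson releases
# may list these as "Cavalcanti1" and "GrandBudapest1" (check names(wes_palettes))
d <- ggplot(PCA, aes(x=PC2, y=PC3, color=CONDITION)) + geom_point(size=6) + scale_color_manual(values = c(wes_palette("Cavalcanti"), wes_palette("GrandBudapest"))) + theme_gray()

pdf("tissue_wesanderson.pdf", width=10, height=6)
print(d)  # explicit print so the plot is also drawn when the code is run with source()
dev.off()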
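
As a variation on the plotting step, the sketch below ties each condition to a color by name instead of by position, and saves the figure with ggsave, which opens and closes the graphics device itself. The data frame toy_scores, the condition names, the hex codes and the output path are all hypothetical placeholders. They are not the real PCA scores and not colors from any wesanderson palette.

```r
library(ggplot2)

# Hypothetical scores and labels, standing in for the PCA data frame in the post.
toy_scores <- data.frame(
  PC2 = c(-1.2, 0.4, 0.9, -0.3),
  PC3 = c(0.5, -0.8, 0.2, 1.1),
  CONDITION = c("CONTROL", "TREATED", "TREATED", "TISSUE")
)

# Named vector: each condition is matched to its color by name, not by order.
# The hex codes are arbitrary placeholders.
condition_colours <- c(CONTROL = "#1B9E77", TREATED = "#D95F02", TISSUE = "#7570B3")

p <- ggplot(toy_scores, aes(x = PC2, y = PC3, color = CONDITION)) +
  geom_point(size = 6) +
  scale_color_manual(values = condition_colours) +
  theme_gray()

# Write to a temporary file; width and height are in inches by default.
out_file <- file.path(tempdir(), "toy_pca.pdf")
ggsave(out_file, plot = p, width = 10, height = 6)
```

With a named vector, adding or reordering conditions does not silently change which color each group gets.
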
