Sunday, October 30, 2016

Hierarchical clustering of correlation matrices to find co-regulated genes

If you obtained a correlation matrix for a set of genes and would like to cluster it, use the following R code.

In this example, dissimilarity is calculated as 1 - abs(cor), that is, 1 minus the absolute value of the correlation. According to the reference below, this formula gives the largest discrimination of the correlated pairs compared to other formulas, such as 1-abs(cor^2), 1-cor, or (1-cor)/2. Ref. http://research.stowers-institute.org/mcm/efg/R/Visualization/cor-cluster/index.htm
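To make the difference between these formulas concrete, here is a small sketch that applies each one to a few sample correlation values. The values in r and the column names are made up for illustration only. Note that 1 - abs(cor) and 1-abs(cor^2) treat a strong negative correlation as "close", while 1-cor and (1-cor)/2 treat it as "far".

```r
# Hypothetical correlation values, chosen only to illustrate the formulas
r <- c(-1, -0.5, 0, 0.5, 1)

# One column per dissimilarity formula mentioned in the text
comparison <- data.frame(
  cor                = r,
  one_minus_abs      = 1 - abs(r),     # formula used in this post
  one_minus_abs_sq   = 1 - abs(r^2),
  one_minus_cor      = 1 - r,
  half_one_minus_cor = (1 - r) / 2
)

print(comparison)
```
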

#load gplots, which provides heatmap.2
library(gplots)

#load data matrix
data <- read.delim("data.txt", header=TRUE, row.names=1)

#replace missing values (NA) with 1, otherwise cor returns NA for the affected columns
data[is.na(data)] <- 1

#list matrix
data

#use cor to make a correlation matrix (cor correlates the columns of data)
correlation <- cor(data)

#make dissimilarity matrix as 1 - absolute value of the correlation
dissimilarity <- 1 - abs(correlation)

#convert the dissimilarity matrix to a dist object
distance <- as.dist(dissimilarity)

#create pdf
pdf("cor.matrix.clustered.pdf")

#plot matrix using heatmap.2 from gplots
heatmap.2(as.matrix(distance))

dev.off()

After plotting, the resulting graph shows clusters of correlated genes. The clustering is done by heatmap.2, which itself uses the R function hclust.
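If you want the cluster membership itself rather than only the picture, hclust can also be called directly on the dist object. The following is a minimal sketch on synthetic data: the gene names, the noise level, and the choice of k = 2 are all assumptions made for illustration, not part of the original analysis. One thing worth checking in the gplots documentation: heatmap.2 has a distfun argument that defaults to dist, so by default it computes Euclidean distances between the rows of whatever matrix it is given. Passing distfun = as.dist, as in the commented line at the end, should make it cluster on the 1 - abs(cor) values directly.

```r
# Synthetic data: 20 samples (rows) x 6 hypothetical genes (columns)
set.seed(1)
base1 <- rnorm(20)
base2 <- rnorm(20)
expr <- data.frame(
  geneA =  base1 + rnorm(20, sd = 0.1),
  geneB =  base1 + rnorm(20, sd = 0.1),
  geneC = -base1 + rnorm(20, sd = 0.1),  # anti-correlated with geneA and geneB
  geneD =  base2 + rnorm(20, sd = 0.1),
  geneE =  base2 + rnorm(20, sd = 0.1),
  geneF = -base2 + rnorm(20, sd = 0.1)   # anti-correlated with geneD and geneE
)

# Same steps as in the script above
correlation   <- cor(expr)
dissimilarity <- 1 - abs(correlation)
distance      <- as.dist(dissimilarity)

# Hierarchical clustering directly on the dissimilarity
# (hclust uses complete linkage unless another method is given)
hc <- hclust(distance)

# Cut the tree into k groups; k = 2 is an assumption for this toy data
clusters <- cutree(hc, k = 2)
print(clusters)

# Optional, requires gplots: cluster the heatmap on the same dissimilarity
# gplots::heatmap.2(as.matrix(distance), distfun = as.dist, trace = "none")
```

Because the dissimilarity uses the absolute value of the correlation, the anti-correlated genes (geneC and geneF here) end up in the same group as the genes they mirror.
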



