Friday, April 8, 2011

Application of random sampling in genomics

Random sampling of the initial data set provides a valuable baseline for genome-wide correlation analysis: it helps distinguish true causal correlations from false positives. Here is an example of a false positive correlation between a transcription factor (TF) and a chromatin modification. The peak of the chromatin modification has the same height for TF-occupied genes as for random gene subsets of the same size.

Conversely, if the peak is above those of the randomized subsets, the correlation has a causal character for the two features tested. In the case below, the tested chromatin modification associates with transcription factor binding near the promoters of genes.

This kind of analysis is necessary to determine whether a global correlation on a genome-wide scale is meaningful, i.e. whether it implies a causal relationship between the tested parameters. Researchers may be misled by correlation peaks when doing whole-genome analysis and may draw wrong conclusions unless this kind of approach is applied in parallel. The detailed information and the source code for the random sampling can be downloaded from the BMC Genomics web page.
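The comparison described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the code published with the BMC Genomics article. It assumes the chromatin modification signal has already been reduced to one number per gene (for example, peak height near the promoter). The names `signal`, `target_genes` and `empirical_p_value` are hypothetical.

```python
import random
from statistics import mean


def empirical_p_value(signal, target_genes, n_iter=1000, seed=0):
    """Compare TF-occupied genes against random gene subsets of the same size.

    signal       -- dict mapping gene name to a per-gene modification signal
                    (assumed input format, e.g. peak height near the promoter)
    target_genes -- iterable of gene names occupied by the transcription factor
    Returns (observed mean, mean of the random-subset means, empirical p-value).
    """
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    all_genes = sorted(signal)         # background: every gene with a signal
    targets = [g for g in target_genes if g in signal]
    observed = mean(signal[g] for g in targets)

    # Draw random gene subsets of the same size and record their mean signal.
    null_means = []
    for _ in range(n_iter):
        subset = rng.sample(all_genes, len(targets))
        null_means.append(mean(signal[g] for g in subset))

    # Fraction of random subsets that reach the observed mean
    # (with a +1 correction so the estimate is never exactly zero).
    exceed = sum(m >= observed for m in null_means)
    p_value = (exceed + 1) / (n_iter + 1)
    return observed, mean(null_means), p_value
```

If the observed mean sits inside the range produced by the random subsets, the p-value is large and the apparent correlation is likely a false positive. A small p-value means the TF-occupied genes stand out from the random background.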

Link to original article:
