Final script I wrote in a series - genomic overlaps - performs binomial statistical test using two bed files with genomic coordinates. This is a combined bash/R script that will generate a binomial p-value that shows significance for the overlap of two sets of genomic regions in

To calculate binomial p-value you would need two bed files as inputs. Note that binomial test does not require background set as do Fisher and hypergeometric tests.

Note that file names are optional.

Example output:

Code:

**hg19 human genome**(for example from ChIP-Seq experiments).To calculate binomial p-value you would need two bed files as inputs. Note that binomial test does not require background set as do Fisher and hypergeometric tests.

Note that file names are optional.

Example output:

mpjanic@zoran:~/AHR-TCF21$ ./binomial.sh ARNT.chipseq.cut TCF21_sigma_not_normalized_peaks.bed background.bed Number of unique overlaps of A and B 143 Calculating coverage of ./TCF21_sigma_not_normalized_peaks.bed in human genome hg19 Base pair coverage of B in hg19 is: 0.00566281 Calculating binomial p-value with pbinom(143, 42920, 0.00566281) Binomial p-value: [1] 2.196093e-12 num 2.2e-12

#!/bin/bash # binomial-test-for-genomic-overlaps1 #intersecting with bedtools bedtools intersect -a $1 -b $2 | uniq > A_B_uniq.bed echo "Number of unique overlaps of A and B" wc -l A_B_uniq.bed | cut -f1 -d ' ' mkdir TEMP for f in $(find . -maxdepth 1 -name $2) do echo "Calculating coverage of $f in human genome hg19" cat "$f" | awk ' { FS="\t" { print $1"\t"$2"\t"$3"\t"($3-$2) } }' > TEMP/"$f".temp done for f in $(find TEMP/ -maxdepth 1 -name \*temp\*); do echo "$f" cat "$f" | awk ' {s+=$4}END{print s}'; done > out.dat awk '{if (count++%2!=0) print ($0/2630301437);print $0}' out.dat > out.dat2 declare bpcoverage=$(awk '{if (count++%2!=0) print ($0/2630301437)}' out.dat) echo "Base pair coverage of B in hg19 is: $bpcoverage" sed -i 's/TEMP\//print\(\"/' out.dat2 sed -i 's/\.temp/\"\)/' out.dat2 awk '{if (count++%3!=2) print$0}' out.dat2 > out.dat3 awk '{if (count++%2!=0) print ("dbinom("$0")");print $0}' out.dat3 > out.dat4 #writing and executing R script echo "#!/usr/bin/Rscript A_B.binomial <- pbinom($(wc -l A_B_uniq.bed | cut -f1 -d ' '), $(wc -l $2 | cut -f1 -d ' '), $bpcoverage) A_B.binomial str(A_B.binomial)" > script.r chmod 775 script.r echo "Calculating binomial p-value with pbinom($(wc -l A_B_uniq.bed | cut -f1 -d ' '), $(wc -l $2 | cut -f1 -d ' '), $bpcoverage)" echo "Binomial p-value:" ./script.r #removing intermediary files rm A_B_uniq.bed rm script.r rm -r TEMP

## No comments:

## Post a Comment