Monday, April 18, 2016

Using xargs to parallelize bedtools intersect


One of the most useful commands on the bash command line is xargs, which lets you replace simple loops with a one-liner. xargs takes the output of another command, e.g. find, and runs the command(s) you give it on every file that matches the find pattern. By default it works through the files one after another; with the -P option it can run several jobs at the same time. In this example I use xargs to run bedtools intersect on 125 ENCODE datasets against our test dataset.

find ENCODE_DNASE_FOR_PCA_PLOT/*narrowPeak.cut.merge | xargs -I % sh -c 'echo %; bedtools intersect -wa -wb -a snp.position.rs1537373 -b %;'
ENCODE_DNASE_FOR_PCA_PLOT/wgEncodeAwgDnaseDuke8988tUniPk.narrowPeak.cut.merge
ENCODE_DNASE_FOR_PCA_PLOT/wgEncodeAwgDnaseDukeAosmcUniPk.narrowPeak.cut.merge
ENCODE_DNASE_FOR_PCA_PLOT/wgEncodeAwgDnaseDukeChorionUniPk.narrowPeak.cut.merge
ENCODE_DNASE_FOR_PCA_PLOT/wgEncodeAwgDnaseDukeCllUniPk.narrowPeak.cut.merge
ENCODE_DNASE_FOR_PCA_PLOT/wgEncodeAwgDnaseDukeFibroblUniPk.narrowPeak.cut.merge
chr9 22103340 22103341 rs1537373 chr9 22103265 22103415 16
ENCODE_DNASE_FOR_PCA_PLOT/wgEncodeAwgDnaseDukeFibropUniPk.narrowPeak.cut.merge
ENCODE_DNASE_FOR_PCA_PLOT/wgEncodeAwgDnaseDukeGlioblaUniPk.narrowPeak.cut.merge
ENCODE_DNASE_FOR_PCA_PLOT/wgEncodeAwgDnaseDukeGm12891UniPk.narrowPeak.cut.merge
...
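
The command above hands the files to bedtools one at a time. Below is a minimal sketch of a version that runs several intersections at once. It assumes an xargs that supports -P and -0 (GNU and BSD xargs both do). The function name intersect_parallel, the output directory, the .hits suffix and the default of 4 jobs are made up for illustration and are not part of the original run.

```shell
# Sketch: run one bedtools intersect per peak file, several at a time.
# intersect_parallel, the output directory and the ".hits" suffix are
# hypothetical names chosen for this example.
intersect_parallel() {
    peaks_dir=$1    # directory with the *narrowPeak.cut.merge files
    snp_file=$2     # BED file passed to -a, e.g. snp.position.rs1537373
    out_dir=$3      # one result file per peak file is written here
    jobs=${4:-4}    # number of simultaneous bedtools processes (assumed default)

    mkdir -p "$out_dir"

    # -print0 / -0 keep file names with spaces intact,
    # -P runs up to $jobs commands at once,
    # -I % substitutes each file name for the % argument.
    find "$peaks_dir" -name '*narrowPeak.cut.merge' -print0 |
        xargs -0 -P "$jobs" -I % sh -c '
            # $1 = peak file, $2 = SNP file, $3 = output directory
            bedtools intersect -wa -wb -a "$2" -b "$1" > "$3/$(basename "$1").hits"
        ' sh % "$snp_file" "$out_dir"
}
```

It could be called as intersect_parallel ENCODE_DNASE_FOR_PCA_PLOT snp.position.rs1537373 hits 8. Each job writes to its own file because the output of parallel jobs arrives in no fixed order, so the echo-then-result layout shown above would no longer line up. The file name is also passed to sh as an argument instead of being pasted into the command string, which avoids trouble with unusual characters in file names.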
