Thursday, April 28, 2016

How to search barcoded files using a barcode list

Let's say you have a file of barcodes from a sequencing experiment and you want to find the files that correspond to those barcodes. Use awk to print the barcode columns joined by the separator that appears in the actual file names (in this case -), then pipe the result to xargs, which runs find once per barcode pair to list the matching files. Note that the awk call below uses -F"\t", so the barcode file is expected to be tab-separated:


mpjanic@zoran:~$ cat tmp.tmp 
ATCG GCTA
GGGG CCCC
CCCC TTTT
ATAT CGCG
mpjanic@zoran:~$ ls -ltrh | tail -n4
-rw-rw-r--  1 mpjanic mpjanic     0 Apr 28 16:57 tmpATCG-GCTAtmp
-rw-rw-r--  1 mpjanic mpjanic     0 Apr 28 16:57 tmpGGGG-CCCCtmp
-rw-rw-r--  1 mpjanic mpjanic     0 Apr 28 16:57 tmpCCCC-TTTTtmp
-rw-rw-r--  1 mpjanic mpjanic     0 Apr 28 16:57 tmpATAT-CGCGtmp
mpjanic@zoran:~$ awk -F"\t" '{print $1"-"$2}' tmp.tmp | xargs -I % sh -c 'echo %; find . -name "*"%"*" -exec ls -l {} \;;'
ATCG-GCTA
-rw-rw-r-- 1 mpjanic mpjanic 0 Apr 28 16:57 ./tmpATCG-GCTAtmp
GGGG-CCCC
-rw-rw-r-- 1 mpjanic mpjanic 0 Apr 28 16:57 ./tmpGGGG-CCCCtmp
CCCC-TTTT
-rw-rw-r-- 1 mpjanic mpjanic 0 Apr 28 16:57 ./tmpCCCC-TTTTtmp
ATAT-CGCG
-rw-rw-r-- 1 mpjanic mpjanic 0 Apr 28 16:57 ./tmpATAT-CGCGtmp
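
If the barcode list might contain spaces instead of tabs, or you would rather not substitute text into an sh -c string, the same lookup can be sketched as a plain shell loop. This is only one possible variant, not the one-liner above: the function name find_barcoded and its optional second argument (the directory to search) are made up for illustration, and it assumes two whitespace-separated barcode columns per line.

```shell
# Hypothetical helper (name and arguments are assumptions, not from the post).
# Usage: find_barcoded BARCODE_LIST [SEARCH_DIR]
find_barcoded() {
    list=$1
    dir=${2:-.}   # search the current directory unless told otherwise

    # read splits each line on whitespace (tabs or spaces) into two barcodes;
    # the "|| [ -n ... ]" part also handles a last line with no trailing newline.
    while read -r bc1 bc2 || [ -n "$bc1" ]; do
        # Skip blank or incomplete lines.
        [ -n "$bc1" ] && [ -n "$bc2" ] || continue

        # Print the joined barcode, then list every file whose name contains it.
        echo "$bc1-$bc2"
        find "$dir" -type f -name "*${bc1}-${bc2}*" -exec ls -l {} +
    done < "$list"
}
```

After pasting the function into your shell, calling find_barcoded tmp.tmp should print each joined barcode followed by the ls -l line of any matching file, much like the session above.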
