Thursday, April 28, 2016

How to search for barcoded files using the first two columns of a barcode list and change their names with the third column

Continuing from the previous post.
Let's say you have barcodes from a sequencing experiment in a file, and you want to find the files that correspond to those barcodes and change their names. Use awk to print the two barcode columns joined by the separator that appears in the actual file names (here, -), then pipe the result to xargs and find to list the matching files.

mpjanic@zoran:~$ cat tmp.tmp
ATCG GCTA first
GGGG CCCC second
CCCC TTTT third
ATAT CGCG fourth
mpjanic@zoran:~$ ls -ltrh | tail -n4
-rw-rw-r--  1 mpjanic mpjanic     0 Apr 28 16:57 tmpATCG-GCTAtmp
-rw-rw-r--  1 mpjanic mpjanic     0 Apr 28 16:57 tmpGGGG-CCCCtmp
-rw-rw-r--  1 mpjanic mpjanic     0 Apr 28 16:57 tmpCCCC-TTTTtmp
-rw-rw-r--  1 mpjanic mpjanic     0 Apr 28 16:57 tmpATAT-CGCGtmp
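Here is a minimal sketch of that listing step. It assumes the barcode list is tmp.tmp and the files sit somewhere under the current directory. The setup lines only recreate the example data in a scratch directory so the block runs on its own. find's default -print action gives the same paths as -exec ls -1 {} \;.

```shell
#!/bin/bash
# Demo setup: recreate the example barcode list and empty files in a scratch directory
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'ATCG GCTA first' 'GGGG CCCC second' 'CCCC TTTT third' 'ATAT CGCG fourth' > tmp.tmp
touch tmpATCG-GCTAtmp tmpGGGG-CCCCtmp tmpCCCC-TTTTtmp tmpATAT-CGCGtmp

# Join columns 1 and 2 with "-" (the separator used in the file names),
# then let xargs run one find per barcode pair to list the matching file
awk '{print $1"-"$2}' tmp.tmp | xargs -I % find . -name "*%*"
```

With -I, xargs runs find once per input line and substitutes the barcode pair for % inside the quoted pattern, so each line of tmp.tmp becomes a pattern such as *ATCG-GCTA*.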
Next, write a short script that runs the awk/xargs/find command for each line of the input file and captures the matching file name. A second awk call takes the third column of the same line to build the new file name, and cp copies the matching file to it:

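#!/bin/bash
# For each line "BARCODE1 BARCODE2 NEWNAME" in tmp.tmp, find the file whose
# name contains BARCODE1-BARCODE2 and copy it to NEWNAME.filename
while read -r line; do
    src=$(awk '{print $1"-"$2}' <<<"$line" | xargs -I % find . -name "*%*")
    new=$(awk '{print $3}' <<<"$line").filename
    cp "$src" "$new"
done < tmp.tmp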
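The script above assumes that every barcode pair matches exactly one file; with zero or several matches the cp call does not do what you want. The sketch below is a more defensive variant under a few assumptions: it swaps the awk calls for read splitting each line into three variables, it uses mapfile (bash 4 or later), and the demo list includes a hypothetical fifth line, AAAA TTTT missing, that matches no file so you can see it being skipped.

```shell
#!/bin/bash
# Demo setup: example files plus one barcode pair (hypothetical) with no matching file
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'ATCG GCTA first' 'GGGG CCCC second' 'CCCC TTTT third' 'ATAT CGCG fourth' 'AAAA TTTT missing' > tmp.tmp
touch tmpATCG-GCTAtmp tmpGGGG-CCCCtmp tmpCCCC-TTTTtmp tmpATAT-CGCGtmp

# Let read split each line into its three columns instead of calling awk
while read -r bc1 bc2 name; do
    # Collect every regular file whose name contains BARCODE1-BARCODE2 (needs bash 4+)
    mapfile -t matches < <(find . -type f -name "*${bc1}-${bc2}*")
    # Copy only when the match is unambiguous; otherwise report and move on
    if [ "${#matches[@]}" -ne 1 ]; then
        echo "skipping ${bc1}-${bc2}: ${#matches[@]} matching files" >&2
        continue
    fi
    cp -- "${matches[0]}" "${name}.filename"
done < tmp.tmp
```

To rename the original files instead of keeping them next to the copies, replace cp with mv.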
Check if the output files are present:

ls -ltrh | tail -n4
-rw-rw-r--  1 mpjanic mpjanic     0 Apr 28 18:06 first.filename
-rw-rw-r--  1 mpjanic mpjanic     0 Apr 28 18:06 second.filename
-rw-rw-r--  1 mpjanic mpjanic     0 Apr 28 18:06 third.filename
-rw-rw-r--  1 mpjanic mpjanic     0 Apr 28 18:06 fourth.filename
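Listing the newest files with ls | tail works for a short list, but for a longer barcode list you may prefer to check each expected name directly. A minimal sketch, assuming the list is tmp.tmp and the copies are in the current directory; check_outputs is a hypothetical helper name, and the demo setup deliberately leaves out fourth.filename to show a missing file.

```shell
#!/bin/bash
# Demo setup: barcode list plus three of the four expected copies
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'ATCG GCTA first' 'GGGG CCCC second' 'CCCC TTTT third' 'ATAT CGCG fourth' > tmp.tmp
touch first.filename second.filename third.filename   # fourth.filename left out on purpose

# Hypothetical helper: build <3rd column>.filename for every line of the
# given list and report whether that file exists
check_outputs() {
    awk '{print $3".filename"}' "$1" | while read -r f; do
        if [ -e "$f" ]; then
            echo "present: $f"
        else
            echo "missing: $f"
        fi
    done
}

check_outputs tmp.tmp
```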

