Thursday, March 31, 2016

Use ls to grab multiple paired file names, assign them to variables using nested for loops - example of fastq paired files

Let's say we have a set of files with similar names, for example a set of paired fastq files, each marked with a barcode.

Let's assume we want to process these files in pairs inside a for loop: assign the full paths of each pair to the variables Reads1 and Reads2, then execute some code for that pair (in the example below, just echo $Reads1 and echo $Reads2).

Make an array variable holding the barcodes that identify the files. Create a for loop that spans the indices of the array. Inside it, make a second for loop that goes from 1 to 2 to grab each file of the pair.

Within the second loop, use the ls command to get the full name of a file and grab the output of ls in a variable (tmp). Use the export command to assign tmp to Reads1 and, on the next pass, to Reads2. Once the inner loop is done, use Reads1 and Reads2 (here they are just echoed, but you could e.g. map the fastqs to the genome).

The advantage here is that you don't have to write out full file names; you can find and grab them by their respective barcodes.
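The least obvious step in the description above is the export line, which builds the variable name from the loop counter. Here is a minimal sketch of that step in isolation; the sample_1 and sample_2 file names are made up for illustration and are not real data files.

```bash
#!/usr/bin/env bash
# Hypothetical file names, used only to show how the variable name is built.
for j in 1 2; do
    tmp="sample_${j}.fastq.gz"
    # "Reads${j}=$tmp" expands to Reads1=... on the first pass, Reads2=... on the second
    export "Reads${j}=$tmp"
done

echo "$Reads1"
echo "$Reads2"
```

The quotes matter: the whole name=value string is expanded first and then handed to export as a single argument.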


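# Barcodes that identify each pair of fastq files
a=(TTAGGC ACTTGA ACAGTG AGTCAA ATCACG AGTTCC CAGATC ATGTCA CCGTCC CGATGT CTTGTA GATCAG GGCTAC GTCCGC GTGAAA TAGCTT TGACCA)

# Loop over every index of the barcode array
for i in "${!a[@]}"
do
    # j = 1 or 2 selects the first or second file of the pair
    for j in 1 2
    do
        # Let ls expand the pattern to the full path of the matching file
        tmp="$(ls /home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/*"${j}_${a[$i]}"*)"
        # Assign the path to Reads1 or Reads2, depending on j
        export "Reads${j}=$tmp"
    done
    echo "$Reads1"
    echo "$Reads2"
    echo done
done

Output: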

mpjanic@valkyr:/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data$ source tmp2
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1401.1_26437_merged_1_TTAGGC.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1401.1_26437_merged_2_TTAGGC.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/2315_26438_merged_1_ACTTGA.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/2315_26438_merged_2_ACTTGA.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/59386143_26427_merged_1_ACAGTG.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/59386143_26427_merged_2_ACAGTG.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/2115_26430_merged_1_AGTCAA.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/2115_26430_merged_2_AGTCAA.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/9071501.8_26436_merged_1_ATCACG.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/9071501.8_26436_merged_2_ATCACG.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1483_26431_merged_1_AGTTCC.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1483_26431_merged_2_AGTTCC.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1060602_26428_merged_1_CAGATC.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1060602_26428_merged_2_CAGATC.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1795_26432_merged_1_ATGTCA.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1795_26432_merged_2_ATGTCA.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/2108_26433_merged_1_CCGTCC.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/2108_26433_merged_2_CCGTCC.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/59885590_26425_merged_1_CGATGT.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/59885590_26425_merged_2_CGATGT.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/200212_26429_merged_1_CTTGTA.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/200212_26429_merged_2_CTTGTA.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/102901.8_26439_merged_1_GATCAG.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/102901.8_26439_merged_2_GATCAG.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/24635_26441_merged_1_GGCTAC.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/24635_26441_merged_2_GGCTAC.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/3101801.2_26434_merged_1_GTCCGC.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/3101801.2_26434_merged_2_GTCCGC.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1522_26435_merged_1_GTGAAA.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1522_26435_merged_2_GTGAAA.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1587_26440_merged_1_TAGCTT.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1587_26440_merged_2_TAGCTT.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/8072501_26426_merged_1_TGACCA.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/8072501_26426_merged_2_TGACCA.fastq.gz
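One fragile point in the loop above is that ls may find no file, or several files, for a given barcode, and tmp would then be empty or hold more than one path. A possible variant (a sketch, not the script used for the output above) lets the shell expand the pattern into an array and checks that exactly one file matched. The temporary directory, the s1/s2 file names, and the variable names barcodes, matches and pairs are all made up for this demo.

```bash
#!/usr/bin/env bash
# Sketch: pair files by barcode with a glob instead of ls.
shopt -s nullglob                      # an unmatched pattern expands to nothing

dir="$(mktemp -d)"                     # hypothetical stand-in for the data directory
touch "$dir/s1_merged_1_TTAGGC.fastq.gz" "$dir/s1_merged_2_TTAGGC.fastq.gz" \
      "$dir/s2_merged_1_ACTTGA.fastq.gz" "$dir/s2_merged_2_ACTTGA.fastq.gz"

barcodes=(TTAGGC ACTTGA)
pairs=()                               # kept only so the result can be inspected afterwards

for bc in "${barcodes[@]}"; do
    for j in 1 2; do
        # Collect every file whose name contains e.g. "1_TTAGGC"
        matches=("$dir"/*"${j}_${bc}"*)
        if [ "${#matches[@]}" -ne 1 ]; then
            echo "expected 1 file for read $j, barcode $bc; found ${#matches[@]}" >&2
            continue 2                 # skip this barcode and move on to the next one
        fi
        # Assign to Reads1 or Reads2 without exporting
        printf -v "Reads${j}" '%s' "${matches[0]}"
    done
    echo "$Reads1"
    echo "$Reads2"
    pairs+=("$(basename "$Reads1") $(basename "$Reads2")")
done

rm -r "$dir"                           # clean up the demo files
```

printf -v is used here instead of export because the variables only need to be visible inside the script; export would work just as well if a child process has to see them.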
