Friday, October 9, 2015

How to count the number of sequences in a FASTQ file

To count the number of sequences in a FASTQ file, use the fact that each record occupies four lines (header, sequence, a + separator, and the quality string), so a sequence appears on every fourth line:

@NS500418:109:H5YLYBGXX:1:11101:17977:1045 2:N:0:ATTACTCG+TAAGATTA
AGCCCTCCTACGCAGATGCTGTCACTGGCAGAGCACAGCCCACG
+
)AAAAFAFFAFFFFFAFFFFFFFAFFFFFFFAFFAFAFFF<)FF
@NS500418:109:H5YLYBGXX:1:11101:19117:1046 2:N:0:ATTACTCG+TAAGATTA
CGCTTTTGGCGACAGACTTGAAGAATTCAGATCTATAAATACTGAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
<AAAAFFFFFF.FFF.FFAFAFF<FAFFFFFFA7FAFFFFAFFFF7#########################################################################################################
@NS500418:109:H5YLYBGXX:1:11101:10072:1046 2:N:0:ATTACTCG+TAAGATTA
ATTGAAACTGTGGGAGGTGTCATGACCAAACTGATTCCAAGGAACANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
<AAAAFFFFFFFFFFFFFAFFFAF7FFFFFFFFAFFFFFFFFFAFF#########################################################################################################
@NS500418:109:H5YLYBGXX:1:11101:9820:1047 2:N:0:ATTACTCG+TAAGATTA
GGATTCCAGTTCGAGTATGGCGGCCAGGGCTCCGCCCCTGCCGAT
+
AAAAAFFFFFFFFFFFFFFFFFFFFFFFFFFFFF)FFFFFFFF.F
@NS500418:109:H5YLYBGXX:1:11101:15523:1047 2:N:0:ATTACTCG+TAAGATTA
CAGATTCTAGTGCTGAGAAGAAACACGTTTGGTTTGGAGAGTCCATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

+

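Use zcat to read the gzipped file and pipe the output to paste. paste - - - - joins every four input lines into a single line, so each record ends up on one line. Then pipe the result to wc -l to count the lines, which equals the number of sequences: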

zcat C2_S2_L001_L004_R2_001.fastq.gz | paste - - - - | wc -l
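
An equivalent approach is to count all lines and divide by four. The sketch below is only an illustration of that idea: count_fastq_reads is a made-up helper name, and it assumes the file is gzipped and that every record is exactly four lines (no wrapped sequences). It uses gzip -dc reading from standard input, which does the same job as zcat here.

```shell
# count_fastq_reads: hypothetical helper, not part of any existing tool.
# Assumes a gzipped FASTQ file with exactly four lines per record.
count_fastq_reads() {
    # Decompress to stdout and count the total number of lines
    lines=$(gzip -dc < "$1" | wc -l)
    # Warn on stderr if the file does not split evenly into 4-line records
    if [ $((lines % 4)) -ne 0 ]; then
        echo "warning: line count $((lines)) is not a multiple of 4" >&2
    fi
    # Each record is four lines, so reads = lines / 4
    echo $((lines / 4))
}
```

It could then be called as count_fastq_reads C2_S2_L001_L004_R2_001.fastq.gz. The warning is a small sanity check: a truncated file would otherwise give a silently rounded count.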


