Monday, May 25, 2015

Turning off Illumina-specific FASTQ encoding

One common error in BWA mapping scripts is leaving in the -I option (bwa aln -I). This option makes BWA expect the older Illumina 1.3+ FASTQ quality encoding (Phred+64) instead of the Sanger-style encoding (Phred+33) that current Illumina pipelines produce.
If you then use Picard to sort the SAM file and convert it to BAM, you will get this error:
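The two encodings differ only in the ASCII offset added to each Phred score (33 versus 64). Re-encoding Phred+64 as Phred+33 therefore means subtracting 31 from every quality character. The short Python sketch below illustrates that arithmetic. It is not BWA's code, and the function names are hypothetical. It also shows what happens when the shift is applied to reads that are already Phred+33: low-quality characters drop below '!' (ASCII 33) into non-printable control codes. That is one plausible way a wrong -I setting ends up producing a corrupted QUAL field.

```python
PHRED33_OFFSET = 33  # Sanger / Illumina 1.8+ encoding
PHRED64_OFFSET = 64  # Illumina 1.3+ to 1.7 encoding


def phred64_to_phred33(qual: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (shift each char down by 31)."""
    shift = PHRED64_OFFSET - PHRED33_OFFSET
    return "".join(chr(ord(c) - shift) for c in qual)


def is_printable_phred33(qual: str) -> bool:
    """True if every character is a printable Phred+33 symbol ('!' to '~')."""
    return all(33 <= ord(c) <= 126 for c in qual)


if __name__ == "__main__":
    # Correct use: Phred+64 input ('h' = Q40, 'B' = Q2)
    print(phred64_to_phred33("hhhB"))  # prints III#
    # Wrong use: input is already Phred+33, so '#' (Q2) becomes a control code
    print(is_printable_phred33(phred64_to_phred33("III#")))  # prints False
```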

java -Xmx4g -Djava.io.tmpdir=/tmp \
-jar /usr/bin/picard.jar SortSam \
SO=coordinate \
INPUT=141125_H22TJ_2305_3_L005.sam \
OUTPUT=141125_H22TJ_2305_3_L005.bam \
VALIDATION_STRINGENCY=LENIENT \
CREATE_INDEX=true

picard.sam.SortSam INPUT=141125_H22TJ_2305_3_L005.sam OUTPUT=141125_H22TJ_2305_3_L005.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true    VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
[Sat May 23 01:50:25 PDT 2015] Executing as mpjanic@valkyr on Linux 3.13.0-44-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_79-b14; Picard version: 1.129(b508b2885562a4e932d3a3a60b8ea283b7ec78e2_1424706677) IntelDeflater
Ignoring SAM validation error due to lenient parsing:
Error parsing text SAM file. length(QUAL) != length(SEQ); File 141125_H22TJ_2305_3_L005.sam; Line 96
Line: ST-E00136:51:H22TJCCXX:5:1101:2502:1783   77      *       0       0       *       *       0       0       AATAAACCTCTCCTCCCCGGCGCCGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCANNANGCGANAGGGCAGAGACAGAGTGGGGGCGGAGCAGGT     "'''","'""""'"
[Sat May 23 01:50:25 PDT 2015] picard.sam.SortSam done. Elapsed time: 0.00 minutes.
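To see how many records are affected, a small scan of the SAM file can apply the same check Picard reports: it compares the lengths of the SEQ and QUAL columns (fields 10 and 11). This is a hypothetical helper sketched in Python, not part of Picard or BWA. It assumes an uncompressed, tab-separated SAM file, and its line numbers count header lines too.

```python
import sys


def find_qual_seq_mismatches(lines):
    """Yield (line_number, read_name, len_seq, len_qual) for SAM records whose
    QUAL field is present but has a different length than SEQ."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        seq, qual = fields[9], fields[10]
        if seq == "*" or qual == "*":  # SAM allows either field to be omitted
            continue
        if len(seq) != len(qual):
            yield lineno, fields[0], len(seq), len(qual)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for lineno, name, n_seq, n_qual in find_qual_seq_mismatches(handle):
            print(f"line {lineno}: {name} SEQ={n_seq} QUAL={n_qual}")
```

Run it as, for example, python check_sam_qual.py 141125_H22TJ_2305_3_L005.sam (the script name is arbitrary).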


So remove the -I option from the bwa command and rerun the alignment. The resulting SAM file should then convert without this error.
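If you are not sure which encoding a FASTQ file uses, you can check before deciding whether -I belongs in the command. The sketch below uses a common rule-of-thumb heuristic; it is not a BWA or Illumina tool. Quality characters below ';' (ASCII 59) only occur in Phred+33 data. Illumina 1.3+ Phred+64 files, by contrast, never go below '@' (ASCII 64) and usually contain characters above 'J' (ASCII 74). The sketch assumes an uncompressed FASTQ with exactly four lines per record. The function name and the 10,000-read sample size are arbitrary choices.

```python
import sys
from itertools import islice


def guess_phred_offset(fastq_lines, max_reads=10000):
    """Guess the quality offset (33 or 64) from the quality lines of a FASTQ.

    Returns 33, 64, or None when the sampled qualities are ambiguous.
    """
    min_code, max_code = 255, 0
    # Each record is 4 lines; the quality string is the 4th line of each record.
    for i, line in enumerate(islice(fastq_lines, 4 * max_reads)):
        if i % 4 == 3:
            codes = [ord(c) for c in line.rstrip("\n")]
            if codes:
                min_code = min(min_code, min(codes))
                max_code = max(max_code, max(codes))
    if min_code < 59:  # below ';' only happens with Phred+33
        return 33
    if min_code >= 64 and max_code > 74:  # all >= '@' and some above 'J'
        return 64
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        offset = guess_phred_offset(handle)
    messages = {
        33: "Phred+33 (Sanger): do not use bwa aln -I",
        64: "Phred+64 (Illumina 1.3+): bwa aln -I is appropriate",
    }
    print(messages.get(offset, "ambiguous: inspect the quality lines manually"))
```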
