Platforms supported

Intro

The vast majority of Mycobacterium tuberculosis NGS data published so far stems from the Illumina platform. As of 25 March 2015, only 40 of the 11,027 runs associated with this taxon in the European public repository (ENA) had been produced with other techniques (see here for new numbers). We therefore started with Illumina, which already covers 99.6% of the demand. Meanwhile, PhyResSE can also process Ion Torrent data (platform selection in the top right corner of the uploaded data list).

Handling

Please select your platform (Illumina or Ion Torrent; on the Upload page, to the right of "Process files") before uploading the data. Ion Torrent data submitted as Illumina data will fail the FASTQ validation and will not upload.

  • Illumina
    This is the default. Illumina handling is detailed throughout this documentation. All deviations from this default are described here on this page:

  • Older Illumina machines
    Illumina 1.5 and older (1.3) encode base qualities in the range 66 (64) to 104. This is detected automatically, and the qualities are re-encoded to fit the newer Illumina (1.8 and 1.9) range (33..94), which resembles that of good old Sanger Phred scores (33..73).

  • Ion Torrent
    Ion Torrent data suffer from many false-positive small (one-base) indels [pub]. This is less of a problem than it may seem because, at present, PhyResSE relies entirely on SNPs. Thus, for simplicity, and until further notice, no indels are reported for Ion Torrent data at all.
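
The re-encoding of older Illumina qualities described above boils down to shifting each quality character by 31 ASCII positions (offset 64 to offset 33). The following is only a sketch of that arithmetic, not what PhyResSE runs (the pipeline uses Trimmomatic's TOPHRED33 for this, see the computational steps). The function name phred64_to_phred33 is a hypothetical helper introduced for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: shift one Phred+64 quality string to Phred+33.
# '@' (ASCII 64) .. 'h' (ASCII 104) is mapped onto '!' (33) .. 'I' (73),
# i.e. every character moves down by 31 positions.
phred64_to_phred33() {
    printf '%s\n' "$1" | LC_ALL=C tr '@-h' '!-I'
}

# Example: 'B' (66) becomes '#' (35), 'h' (104) becomes 'I' (73).
phred64_to_phred33 'BBhh@'   # prints ##II!
```

In a real FASTQ file only every fourth line holds qualities, so a conversion would have to leave the other three lines untouched.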

Computational steps


 Older Illumina
 ==============

 Only one difference from Illumina (default):

 * If the FastQC-generated file fastqc_data.txt reports any encoding other than
   "Sanger / Illumina 1.9":

   mv ${dir}${file} `echo ${dir}.${file}|sed "s/fastq/NON_PHRED33_ORIGINAL_FASTQ/"`
   java -jar trimmomatic-0.33.jar SE `echo ${dir}.${file}|sed "s/fastq/NON_PHRED33_ORIGINAL_FASTQ/"` ${dir}${file} TOPHRED33
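
The check that triggers this step can be sketched as follows. This is a minimal illustration, assuming the usual tab-separated "Encoding" line in fastqc_data.txt; the function names fastqc_encoding and needs_rephred are hypothetical and not part of PhyResSE.

```shell
#!/usr/bin/env bash
# Print the value of the "Encoding" line of a FastQC fastqc_data.txt
# (the file is tab-separated: key in column 1, value in column 2).
fastqc_encoding() {
    awk -F '\t' '$1 == "Encoding" { print $2; exit }' "$1"
}

# Succeed (exit status 0) if the report names any encoding other than
# "Sanger / Illumina 1.9". A missing Encoding line also counts as "other".
needs_rephred() {
    [ "$(fastqc_encoding "$1")" != "Sanger / Illumina 1.9" ]
}

# Usage (the report path is an assumption):
# if needs_rephred sample_fastqc/fastqc_data.txt; then
#     echo "re-encode with TOPHRED33"
# fi
```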



 Ion Torrent
 ===========

 Only three differences from Illumina (default):

 * Upload: fastQValidator needs to be run with --minReadLen 1, since
           Ion Torrent data fail the default test. Because fastQValidator is called
           during the upload procedure, the platform needs to be selected before upload.

 (* Mapping: IndelRealigner needs to be run with --defaultBaseQualities 12
             because some qualities are missing. They are conservatively assigned
             a bad quality (12). This is actually no real difference, because
             this option was added to the metafile governing all platforms, as
             displayed here. For other (Illumina) data, however, this has no
             effect because their qualities are never missing (always defined).)

 * Variants: Differing from here, the GATK UnifiedGenotyper
             is called with -glm SNP instead of -glm BOTH to exclude all indels. This is
             not written in stone; comments and better recipes are welcome.
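
The platform-dependent options listed above could be selected along these lines. This is only a sketch under stated assumptions: the function name platform_opts and the step and platform labels (validate, genotype, illumina, iontorrent) are hypothetical, and only the option strings themselves are taken from the text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the extra options for one pipeline step.
#   $1 = step     (validate | genotype)
#   $2 = platform (illumina | iontorrent)
platform_opts() {
    case "$1:$2" in
        validate:iontorrent) echo "--minReadLen 1" ;;  # fastQValidator
        validate:illumina)   : ;;                      # default test, no extra option
        genotype:iontorrent) echo "-glm SNP" ;;        # UnifiedGenotyper, SNPs only
        genotype:illumina)   echo "-glm BOTH" ;;       # UnifiedGenotyper, SNPs and indels
        *) echo "unknown step/platform: $1/$2" >&2; return 1 ;;
    esac
}

platform_opts genotype iontorrent   # prints -glm SNP
```

Keeping the differences in one place like this makes it easy to see that --defaultBaseQualities 12 needs no entry, because it is applied to all platforms.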