Exports

Intro

If your session comprises more than one sample, the 'Reports' page also provides exports. They appear in the upper right corner only after all files have been processed. Each export summarizes all samples.

Handling

VCF (VCF): All samples are merged into one VCF file (by VCFtools: vcf-merge).
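The idea behind merging can be illustrated with a minimal Python sketch. It is a simplification, not how vcf-merge is implemented: every site called in any sample becomes one row, and samples without a call at that site are marked missing, here with ".". The input layout (a dict of per-sample calls keyed by (CHROM, POS)) and the function name `merge_samples` are hypothetical.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int]  # (CHROM, POS)


def merge_samples(
    calls: Dict[str, Dict[Site, str]],
) -> Tuple[List[str], List[Tuple[Site, List[str]]]]:
    """Merge per-sample calls into one row per site (hypothetical sketch)."""
    samples = sorted(calls)
    # Union of all sites seen in any sample; tuples sort by chromosome name
    # (lexicographically, a simplification) and then by position.
    sites = sorted({site for per_sample in calls.values() for site in per_sample})
    # One genotype column per sample; "." marks a sample without a call there.
    rows = [(site, [calls[s].get(site, ".") for s in samples]) for site in sites]
    return samples, rows


samples, rows = merge_samples({
    "s1": {("chr1", 100): "A", ("chr1", 250): "T"},
    "s2": {("chr1", 100): "G"},
})
for (chrom, pos), genotypes in rows:
    print(chrom, pos, *genotypes)
```

Position 250 is only called in s1, so s2 gets "." in that row.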

VCF-stats (Stats): Summary statistics of the merged VCF file (also by VCFtools: vcf-stats).
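vcf-stats reports far more than this. As a hypothetical illustration of the kind of summary involved, the following Python sketch counts ALT alleles in VCF data lines by type: single-base substitutions (SNPs) versus everything else (indels, multi-base substitutions). The function name and the two categories are our own choices, not vcf-stats output fields.

```python
from collections import Counter
from typing import Iterable


def variant_type_counts(vcf_lines: Iterable[str]) -> Counter:
    """Count ALT alleles as SNPs or other variants (hypothetical sketch)."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information, header and empty lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")  # REF and ALT columns
        for alt in alts:
            if alt == ".":
                continue  # no alternative allele at this site
            kind = "snp" if len(ref) == 1 and len(alt) == 1 else "other"
            counts[kind] += 1
    return counts


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAT\tA\t40\tPASS\t.",
    "chr1\t300\t.\tC\tT,G\t60\tPASS\t.",
]
print(variant_type_counts(example))
```

Counting per ALT allele (rather than per record) means the multi-allelic site at position 300 contributes two SNPs.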

VCF-tab (TSV): The merged VCF is then "flattened" into tab-delimited text format (also by VCFtools: vcf-to-tab, displaying at most two alleles), and further converted into a FASTA file.

FASTA (Fasta): A file in which each sample is represented by one sequence (vcf_tab_to_fasta_alignment.pl by Christina Bergey, with only one allele at each position). Both VCF-tab and FASTA comprise only SNP genome positions. Moreover, the FASTA file is currently produced by ignoring all multi-allele positions. We are working on decomposing mixed samples into separate (component) sequences and estimating their shares (percentage of the whole sample, excluding contamination reads).
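To make the flattening and FASTA steps concrete, here is a simplified Python stand-in. It is not Bergey's script, and that script's exact rules (e.g. for `--exclude_het`) may differ. It assumes vcf-to-tab style input with the columns CHROM, POS, REF followed by one genotype column per sample (e.g. `A/G`, `./.`). It keeps only SNP positions at which every sample has a single allele, writes `N` for missing calls, and adds the reference as an extra sequence named `REF`, roughly mirroring the `--output_ref` option used in the pipeline. The function names are hypothetical.

```python
import re
from typing import Dict, List


def tab_to_sequences(tab_lines: List[str], with_ref: bool = True) -> Dict[str, str]:
    """Build one sequence per sample from vcf-to-tab style lines (hypothetical sketch)."""
    samples = tab_lines[0].rstrip("\n").split("\t")[3:]  # columns after #CHROM, POS, REF
    names = (["REF"] if with_ref else []) + samples
    seqs: Dict[str, List[str]] = {name: [] for name in names}
    for line in tab_lines[1:]:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, genotypes = fields[2], fields[3:]
        if len(ref) != 1:
            continue  # not a SNP position
        column: List[str] = []
        for gt in genotypes:
            alleles = set(re.split(r"[/|]", gt))
            allele = alleles.pop() if len(alleles) == 1 else ""
            if allele == ".":
                column.append("N")  # missing call
            elif len(allele) == 1:
                column.append(allele)  # single (homozygous or haploid) base
            else:
                break  # heterozygous, multi-allele or indel call
        else:
            # reached only if no sample triggered "break": keep this position
            if with_ref:
                seqs["REF"].append(ref)
            for name, base in zip(samples, column):
                seqs[name].append(base)
    return {name: "".join(bases) for name, bases in seqs.items()}


def to_fasta(seqs: Dict[str, str]) -> str:
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())


tab = [
    "#CHROM\tPOS\tREF\ts1\ts2",
    "chr1\t100\tA\tA/A\tG/G",
    "chr1\t200\tC\tC/T\tC/C",
    "chr1\t300\tT\t./.\tT/T",
]
print(to_fasta(tab_to_sequences(tab)), end="")
```

Position 200 is dropped because s1 is heterozygous (C/T), so every sequence keeps exactly one base per remaining position and the result stays a valid alignment.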

Tree: Finally, an approximately-maximum-likelihood tree is generated from the FASTA file by FastTree and rendered by jstree, which lets you

  • re-root
    Click on the tree figure, uncheck "Circular", click "Draw tree", pick a new root (e.g. REF) by clicking on the blue square next to it, and click "reroot" (both the Newick tree in the text window and the graph are updated). Then click on the tree and re-check "Circular". You can also reroot the circular tree, but there the nodes are harder to select (i.e. to hit precisely with a mouse click).
  • highlight
    To highlight samples, click on the tree, enter a search string and click "Search". All sample names containing the entered substring are highlighted.
  • style
    To style the figure, click on it and adjust the figure width, font size, and spacing.
  • remove
    To remove samples or whole subtrees, click on the blue box (after unchecking "Circular"). You can also collapse, ladderize, swap, multifurcate, move, or color-code any subtree. As above, all of this also works in circular mode, but the nodes (which have no blue boxes there) are harder to hit with a mouse click.
  • edit manually
    Edit the Newick tree in the text window, click on the graph and click "Draw tree" (or copy and paste the generated Newick tree into a more powerful tree viewer).
  • work offline (e.g. on a train or plane without internet)
    Save the HTML page and unpack jstree.zip into the same directory.
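The "highlight" function amounts to a substring search over the leaf names of the Newick tree. A minimal Python sketch of that idea (not jstree's code) could look like this. It assumes unquoted leaf names without spaces, tolerates the line breaks inserted after commas and before closing parentheses in the exported tree, and matches case-sensitively, which may differ from jstree's behavior. The names `leaf_names` and `highlight` are hypothetical.

```python
import re
from typing import List

# A leaf name directly follows "(" or "," (optionally after whitespace or line
# breaks) and runs until ":", ",", ")", ";" or whitespace. Labels after ")" are
# internal-node labels (e.g. support values) and are therefore not matched.
LEAF_NAME = re.compile(r"[(,]\s*([^\s(),:;]+)")


def leaf_names(newick: str) -> List[str]:
    return LEAF_NAME.findall(newick)


def highlight(newick: str, query: str) -> List[str]:
    """Return all leaf names containing the query substring (case-sensitive)."""
    return [name for name in leaf_names(newick) if query in name]


tree = "((REF:0.1,\nsampleA:0.2\n)0.9:0.05,\n(sampleB:0.3,\nsampleC:0.1\n)0.8:0.2\n);"
print(highlight(tree, "sample"))
```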

Variants: The variants of all samples in one comma-separated spreadsheet (for import into e.g. Excel). It also contains base counts, i.e. the occurrences of each nucleotide reliably observed (with base quality > 13) at the particular position. For unfiltered base counts, see the VCF.
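The base-count rule can be sketched as follows. This is a hypothetical illustration, assuming the observed bases at one position are available as (base, Phred quality) pairs; only A, C, G and T with a quality strictly greater than 13 are counted, matching the threshold stated above. The function name and input layout are our own.

```python
from collections import Counter
from typing import Iterable, Tuple

MIN_BASE_QUALITY = 13  # bases must have a Phred quality strictly above this value


def base_counts(observations: Iterable[Tuple[str, int]]) -> Counter:
    """Count reliably observed nucleotides at one position (hypothetical sketch)."""
    counts = Counter({base: 0 for base in "ACGT"})
    for base, quality in observations:
        base = base.upper()  # count upper- and lower-case bases alike
        if base in counts and quality > MIN_BASE_QUALITY:
            counts[base] += 1
    return counts


observed = [("A", 30), ("A", 13), ("G", 20), ("g", 14), ("N", 40), ("T", 2)]
counts = base_counts(observed)
print(",".join(f"{b}={counts[b]}" for b in "ACGT"))
```

The second A (quality exactly 13) and the T (quality 2) are not counted, and N is ignored entirely.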

Computational steps


   # code snippet producing all exports (embedding the Newick tree in HTML for visualization by JavaScript)
   use strict;
   use warnings;

   my $prefix = $ARGV[0] // die "usage: $0 PREFIX\n";   # path prefix of the session's single-sample VCF files
   my @gz_files;
   foreach my $file (glob("${prefix}*.bam.flt.vcf")) {   # feed all single-sample VCF files
     # compress and index each file; vcf-merge expects bgzipped, tabix-indexed input
     system("/usr/bin/bgzip -c $file > $file.gz") == 0 or die "bgzip failed for $file\n";
     system("/usr/bin/tabix -p vcf $file.gz") == 0 or die "tabix failed for $file.gz\n";
     push @gz_files, "$file.gz";
   }
   if (@gz_files > 1) {   # merge and build a tree only from at least 2 files
     print "summarizing (exports and tree)\n";
     my $out = "${prefix}export";
     system("/usr/bin/vcf-merge @gz_files > $out.vcf") == 0 or die "vcf-merge failed\n";
     system("/usr/bin/vcf-stats $out.vcf > $out.stats") == 0 or die "vcf-stats failed\n";
     system("/usr/bin/vcf-to-tab < $out.vcf > $out.tab") == 0 or die "vcf-to-tab failed\n";
     system("/usr/bin/vcf_tab_to_fasta_alignment.pl --exclude_het --output_ref -i $out.tab > $out.fa") == 0
       or die "vcf_tab_to_fasta_alignment.pl failed\n";
     my $tree = `/usr/bin/FastTreeMP -nt -quiet $out.fa`;   # Newick tree is written to stdout
     die "FastTreeMP failed\n" if $? != 0;
     $tree =~ s/,/,\n/g; $tree =~ s/\)/\n\)/g;   # break into many lines for easy copy/paste
     open(my $fout, '>', "$out.html") or die "cannot write $out.html: $!\n";
     print $fout <<~"END";   # <<~ (Perl 5.26+) allows the indented terminator below
       <html>
       [...]
       <textarea id="nhx-ex" style="display: none">
       END
     print $fout $tree;
     print $fout "</textarea></body></html>\n\n";
     close $fout;
   }

  
Versions
VCFtools v0.1.13
tabix v1.7.2
vcf_tab_to_fasta_alignment.pl by Christina Bergey (Bergey CM (2012). vcf-tab-to-fasta; http://code.google.com/p/vcf-tab-to-fasta)
FastTree v2.1.10 as multi-threaded executable (+SSE +OpenMP)
jstree (no version number available; from http://lh3lh3.users.sourceforge.net/jstree.shtml)