LotuS pipeline

October 2018

!Important information!

LotuS v1 is discontinued. Please use LotuS2 instead: LotuS2 website

LotuS offers a lightweight complete 16S/18S/ITS pipeline to

Demultiplex and filter fasta or fastq sequences
Denoise, remove chimeric sequences and cluster sequences into very high quality OTUs that perform at a similar level to mothur / dada2
Determine taxonomic origin of each OTU using >5 specialized and general purpose databases or statistical algorithms
Construct OTU, genus, family, order, class and phylum abundance tables in .txt or .biom format
Reconstruct OTU phylogenetic tree

while being the fastest pipeline currently available. This makes it possible for any researcher to analyze HiSeq amplicon data on a laptop.
LotuS is aimed at scientists and bioinformaticians who want a simple pipeline that is streamlined to the core functionality of creating OTU and taxa abundance tables at very fast speeds (e.g. processing an 8 GB 16S MiSeq run takes ~ 30 min on a laptop). LotuS does not include numerical analysis of samples; instead, we designed the LotuS output to be easy to integrate into existing workflows, e.g. in R, Matlab or QIIME/mothur.
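
As an illustration of this kind of downstream integration, here is a minimal Python sketch that reads a tab-separated abundance table and converts counts to per-sample relative abundances. The table layout (header row first, one row per OTU, one column per sample) and the names read_abundance_table, relative_abundance and EXAMPLE_TABLE are assumptions made for this example; they are not taken from the LotuS documentation, so check the layout of your own output files first.

```python
import csv
import io

# Hypothetical example data; the real layout of a LotuS table may differ.
EXAMPLE_TABLE = "OTU\tsampleA\tsampleB\nOTU_1\t30\t0\nOTU_2\t10\t5\n"


def read_abundance_table(handle):
    """Parse a tab-separated table: first column is the OTU id, the rest are samples."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    counts = {}
    for row in reader:
        if not row:  # skip empty lines
            continue
        counts[row[0]] = [float(value) for value in row[1:]]
    return samples, counts


def relative_abundance(samples, counts):
    """Divide each count by its sample (column) total; empty samples stay at 0."""
    totals = [sum(values[i] for values in counts.values()) for i in range(len(samples))]
    return {
        otu: [values[i] / totals[i] if totals[i] else 0.0 for i in range(len(samples))]
        for otu, values in counts.items()
    }


samples, counts = read_abundance_table(io.StringIO(EXAMPLE_TABLE))
rel = relative_abundance(samples, counts)
print(samples, rel)
```

To read a real file, the same function could be passed an open file handle instead of the in-memory example.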
sdm is part of LotuS, but can be used independently to demultiplex or just quality filter sequences (e.g. also for assemblies etc.). Several quality filtering tests are included, and sequences can be truncated based on accumulated error rates or on quality windows falling below a threshold. It is implemented in C++ and optimized for speed.
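
The two truncation ideas mentioned above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the sdm implementation (which is written in C++); the function names, the thresholds and the exact stopping rules are assumptions chosen for this example.

```python
from typing import List


def phred_to_error(q: int) -> float:
    """Convert a Phred quality score into an error probability."""
    return 10 ** (-q / 10)


def truncate_by_accumulated_error(quals: List[int], max_acc_error: float) -> int:
    """Length of the longest prefix whose summed error probability stays
    at or below max_acc_error (hypothetical parameter name)."""
    total = 0.0
    for i, q in enumerate(quals):
        total += phred_to_error(q)
        if total > max_acc_error:
            return i
    return len(quals)


def truncate_by_window(quals: List[int], window: int, min_avg_q: float) -> int:
    """Start position of the first window whose mean quality falls below
    min_avg_q; the full read length if no window fails."""
    for start in range(0, len(quals) - window + 1):
        chunk = quals[start:start + window]
        if sum(chunk) / window < min_avg_q:
            return start
    return len(quals)


# A read with 10 good bases (Q30) followed by 5 poor bases (Q10).
quals = [30] * 10 + [10] * 5
print(truncate_by_accumulated_error(quals, 0.25))
print(truncate_by_window(quals, window=4, min_avg_q=20))
```

Both functions return a truncation length, so a read and its quality string could be cut with a simple slice.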

LotuS workflow

Advantages of LotuS

Simple installation and updates of pipeline with installation script, no system variables need to be modified. One command executes pipeline.
Fast: ~ 2 min for 454; ~ 45 min for MiSeq paired end (one full sequencer run each).
State of the art chimera checking and denoising of OTUs, while keeping high quality full length reads for taxonomic classification and phylogenetic reconstruction.
More: Can retrieve up to 19% more reads from your sequences, compared to other pipelines.
Versatile: Works with ITS/SSU/LSU amplicons, has 3 different clustering algorithms and by default 8 different ways to assign a taxonomy to OTUs, each selected with a single command-line flag.
Standardized: Straightforward integration with common numerical ecology software.

If you want to know more details about the algorithm, please see the LotuS publication.
Also see comparative papers including ITS data.

LotuS developments since publication

highmem mode, which is ~ 100% faster for small datasets and up to 1000% faster for large datasets
direct mapping from clustering to OTU membership
sdm IO reduction and more reliable format conversions
more stringent quality filtering using probabilistic filters
more taxonomic assignment options (utax, several new databases)
various smaller improvements to existing subroutines, output and log files
Integration of alternative faster mapper: lambda
Two alternative OTU clusterings added: swarm for high-definition clusters and a standing classic in the field: cd-hit
Support for LSU and ITS amplicons, with quality controls specific to these amplicons (e.g. ITSx)
PacBio support
Multiple databases, either general (RDP, Silva, greengenes) or more specific (UNITE for Fungi, PR2 for unicellular Protists, HITdb for human gut, beeTax for bee gut) and support for custom user databases.

If you have questions about LotuS, visit the LotuS webforum.

News and updates

Follow me on twitter for LotuS update announcements: @Falk_tw

24th Jan 2020: LotuS 1.62.1/sdm 1.50: Updated autoInstall.pl to integrate SLV 138 version. Our tests show that species level is much more often reached with this database in OTU assignments.
12th April 2019: LotuS 1.62/sdm 1.50: Compatibility with usearch v11, with all its glories and fallacies. Also included a cross talk algorithm to clean up the OTU matrix.
27th March 2019: LotuS 1.611/sdm 1.49: Updated error reporting in sdm.
19th March 2019: LotuS 1.61/sdm 1.47: Updated default UNITE database (thanks to the UNITE team). Integrated PacBio support (option -p PacBio); it is basic, so use it only with CD-HIT clustering. Further improved sdm, which can now directly output gzipped files.
12th June 2018: LotuS 1.60/sdm 1.46: Updated to Silva 132; to use it, you need to reinstall the databases using ./autoinstall.pl (after updating lotus). Also added the option "-TaxOnly", which only runs a taxonomic classification on a fasta file (could be an OTU fasta). Use it together with "-i" to set the path to the fasta. sdm 1.46: Complete rewrite of the read count stats (more precise; the old system sometimes had wrong error reporting). Speed improvements and gzip output file support for sdm. Also added support for counts of failed reads per sample file.
13th April 2018: LotuS 1.59/sdm 1.41: Added the feature to automatically check ITS OTUs for correct ITS sequences via ITSx. You will need to do a full lotus reinstall AFTER you update to 1.59, to install ITSx.
Further, LCA identity cutoff thresholds changed to Konstantinidis, et al. ISME 2017.
5th Dec 2017: LotuS 1.58/sdm 1.41: Included a new default ref taxonomy database (-refDB beetax). I have to thank my collaborators Phillip Engel and Julia Jones for painstakingly handcrafting this and making it available to the community.
28th Jul 2017: LotuS 1.57/sdm 1.41: Introduction of the greengenes mapping mode (-greengenesSpecies) and the more generally applicable pseudo reference OTU calling (-pseudoRefOTUcalling). Both modes are designed to work like an open reference OTU calling, except that the OTU calling is still de novo and the novel OTUs are then assigned to references in databases, if fitting. In the end it is very similar to an open reference OTU calling, applicable to different databases.
The focus on greengenes is simply to enable the usage of picrust and similar programs; we now provide tutorials to explain how to integrate the LotuS output with these packages. Also added some small fixes related to 454 demultiplexing.
Further inserted support for usearch10; in general I recommend this version over previous usearch versions.
12th May 2017: LotuS 1.565/sdm 1.41: Fix for a bug introduced in 1.564/sdm 1.40. Please do not use sdm 1.40: if you had more than one sequencing run (paired-end runs only), the abundance of OTUs was overcounted for the last sample of the previous run.
9th May 2017: LotuS 1.564/sdm 1.4: Fixed a bug with default rdp classifications, which were using blast tax assignments instead (the slowest tax assignment option in lotus). Big update to sdm that increases speed a little, completely switches the internal memory model (reducing memory usage by 20-30 percent in my tests) and is more stable for double barcoded demultiplexing.
Also note the new tutorial for diversity analysis in R.
26th Apr 2017: LotuS 1.563/sdm 1.37: Fixes to sdm dual barcoding demultiplexing and significantly increased sdm speed for pure demultiplexing mode, if a large number of samples is being used (>250). Also rtk (a rarefaction and diversity estimator) will be installed, and better integrated in later releases.
17th Mar 2017: LotuS 1.562/sdm 1.36: Small fixes to sdm and LotuS. However, the autoinstall was updated, so Silva rel 128 is now the standard in lotus as well as updated swarm and cd-hit binaries. I recommend to update the lotus silva database by running ./autoinstall.pl after the update.
21st Feb 2017: LotuS 1.561/sdm 1.35: Added option to write all demultiplexed reads out, not just quality filtered. This is now the default for -saveDemultiplex .
20th Feb 2017: LotuS 1.56/sdm 1.34: Improved the -saveDemultiplex option: it will now create a "demultiplexed" directory within the output dir, containing a single fastq for each sample in the mapping file.
9th Dec 2016: LotuS 1.552/sdm 1.33: Critical bug fix for sdm: ca. 20% of reads were wrongly assigned in specific subcases of double barcoded demultiplexing. This mostly occurred if only a subset of sequenced samples was extracted during the demultiplexing (by e.g. leaving valid double barcodes out of the mapping file). If you have double barcoded data, please rerun your dataset, to be on the safe side.
22nd Nov 2016: LotuS 1.551/sdm 1.32: Several stability updates to sdm & some small speed improvements as well as under the hood handling of changed data formats for within pipeline processing.
13th Oct 2016: LotuS 1.55/sdm 1.29: Usearch 9 support. This includes an upgraded uparse algorithm, as well as the uchime2 de novo and reference chimera finder. Find the paper here. Additionally the integration of usearch was made easier, with the autoinstaller asking directly for the usearch location or using the syntax: ./autoInstall.pl -link_usearch [path to usearch9]
9th Sep 2016: LotuS 1.53/sdm 1.29: Fix for cases when you want to use custom databases. sdm can now handle corrupted fastq entries in files better, and has an option to just demultiplex files (-o_demultiplex). Also updated Silva, RDP classifier and vsearch to the latest version in the installer download.
1st Jul 2016: LotuS 1.52/sdm 1.28/LCA 0.16: Fix for a bug that caused sdm to break under certain circumstances, if empty lines were present in a fq file. Fix for the LCA algorithm, to read inconsistent entries from Silva databases in a consistent manner.
1st Jul 2016: LotuS 1.512/sdm 1.27: Small fix to .biom format that made the biom table unreadable, if unknown taxa were present in combination with the RDP classifier.
19th Jun 2016: LotuS 1.51/sdm 1.27/LCA 0.15: Improved LCA precision, if more than one database is used as reference by including information on %id, when conflicting hits are found.
28th Apr 2016: LotuS 1.506/sdm 1.27: Fixed a bug in biom tables that occurred if RDP taxonomy was used. This bug had been present since version 1.50.
07th Apr 2016: LotuS 1.504/sdm 1.27: Removed message for 'biom file can not be created due to combined samples', that was in some cases falsely triggered. More importantly, updated the autoinstall to re-execute if the script itself was updated (you still have to run it twice one last time with: './autoInstall.pl -forceUpdate').
04th Apr 2016: LotuS 1.502/sdm 1.27: Fixes for new LCA program, related to a) autoinstaller b) .biom and c) legacy LCA Perl code.
01st Apr 2016: LotuS 1.50/sdm 1.27: Rather big update that brings a C++ implementation of the LCA (least common ancestor) algorithm to assign a taxonomy based on similarity searches against various databases. This mainly improves speed, but has also a better selection of best tax, when several ref DBs are used. Further includes two fixes to sdm and one fix to the LotuS phiX contaminant filter and a new output format for sdm.
19th Mar 2016: LotuS 1.47/sdm 1.26: Fix for .biom files in combination with the mapping file option of combined samples and some code maintenance (prettier stats, preparation for a new C++ implementation of an important algorithm so far implemented in Perl .. the LCA algorithm). Also an update to autoMap.pl. Thanks to Damian Plichta for reporting the combineSamples bug.
13th Feb 2016: LotuS 1.462/sdm 1.26: Bugfix for the LCA (had a chance with 1.461 to undervalue LCA fraction parameters) and display options.
13th Feb 2016: LotuS 1.461/sdm 1.26: Tweaked the LCA used in double database mode, to assign a higher fraction of OTUs with a taxonomy.
5th Nov 2013: First alpha version of LotuS is online (linux only)!
This version is LotuS 0.77 with sdm 0.68. HiSeq, MiSeq and 454 sequences are supported.
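
Several release notes above refer to the LCA (least common ancestor) algorithm, which assigns a taxonomy based on similarity searches against reference databases. The general idea can be sketched in Python as follows. This is a simplified illustration, not the LotuS C++ implementation: the function name lca_assign, the min_fraction parameter and the example lineages are assumptions for this sketch, and it ignores the %id information that LotuS uses when hits conflict.

```python
from collections import Counter
from typing import List, Sequence


def lca_assign(lineages: Sequence[Sequence[str]], min_fraction: float = 1.0) -> List[str]:
    """Walk down the ranks and keep a taxon while at least min_fraction
    of all hits support it; stop at the first rank without enough support."""
    if not lineages:
        return []
    total = len(lineages)
    candidates = [list(lineage) for lineage in lineages]
    consensus: List[str] = []
    depth = 0
    while True:
        names = [lineage[depth] for lineage in candidates if len(lineage) > depth]
        if not names:
            break
        name, count = Counter(names).most_common(1)[0]
        if count / total < min_fraction:
            break
        consensus.append(name)
        # Only hits that agree with the consensus so far are followed further.
        candidates = [l for l in candidates if len(l) > depth and l[depth] == name]
        depth += 1
    return consensus


# Hypothetical database hits for one OTU (domain, phylum, class, order).
hits = [
    ["Bacteria", "Firmicutes", "Bacilli", "Lactobacillales"],
    ["Bacteria", "Firmicutes", "Bacilli", "Bacillales"],
    ["Bacteria", "Firmicutes", "Clostridia", "Clostridiales"],
]
print(lca_assign(hits))
print(lca_assign(hits, min_fraction=0.6))
```

With min_fraction at 1.0 every hit must agree on a rank, which gives the strict common ancestor; lowering it lets a majority of hits carry the assignment to a deeper rank.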