LOTUS USAGE EXAMPLES

February 2014

Usage Example 1: 454 sequencing

Assume that the output of two separate 454 runs is stored in
/Users/Tomas/data
and that the mapping file has been formatted to include the fasta and quality file associated with each barcode (as described in the tutorial).
LotuS can be started using the following command:
perl lotus.pl -i /Users/Tomas/data -m /Users/Tomas/data/map.txt -o /Users/Tomas/results/example
Note that in this case the default sdm quality filtering options will be used, which is not recommended. The sdm options file can be adapted to the user's specific requirements and supplied via the -s option:
perl lotus.pl -i /Users/Tomas/data -m /Users/Tomas/data/map.txt -o /Users/Tomas/results/example -s /Users/Tomas/data/example_opt.txt
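When several runs have to be started from a script, the call above can be assembled programmatically. The following is only a minimal sketch: the helper name lotus_command is hypothetical and not part of LotuS, and it uses only the -i, -m, -o and -s options shown in this example.

```python
import shlex


def lotus_command(input_path, map_file, out_dir, sdm_options=None):
    """Assemble the lotus.pl call as an argument list (hypothetical helper)."""
    cmd = ["perl", "lotus.pl", "-i", input_path, "-m", map_file, "-o", out_dir]
    if sdm_options is not None:
        # -s supplies a user-adapted sdm options file instead of the defaults
        cmd += ["-s", sdm_options]
    return cmd


cmd = lotus_command(
    "/Users/Tomas/data",
    "/Users/Tomas/data/map.txt",
    "/Users/Tomas/results/example",
    sdm_options="/Users/Tomas/data/example_opt.txt",
)
# Print the call in a form that can be pasted into a shell
print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting problems with paths that contain spaces.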

Usage Example 2: MiSeq paired-end sequencing

The -barcode option is implemented specifically for preprocessed MiSeq and HiSeq paired-end sequencer output, which consists of three files:
  • the forward and reverse paired sequences, which are passed to LotuS using the "-i fwdPair.fastq,revPair.fastq" option
  • one file with the barcodes separated from the sequences, which is passed to LotuS using the "-barcode barcodes.fastq" option
LotuS takes care of all further processing of these files, including merging of the paired reads, and can be started using the following command:
perl lotus.pl -i /Users/Tomas/data/fwdMiSeq.fastq,/Users/Tomas/data/revMiSeq.fastq -barcode /Users/Tomas/data/MIDs.fastq -m /Users/Tomas/data/map.txt -o /Users/Tomas/results/example -s /Users/Tomas/data/sdm_miSeq.txt
Please note that I recommend letting LotuS handle the read merging, as the reverse read has on average ~5 lower quality and needs to be treated with care. LotuS is adapted to this and tries to retain the maximum phylogenetic information per OTU by merging the second read into the first only after high-quality OTUs have been built from the first, higher-quality read.
You can observe this difference in quality in the LotuS log files on quality distribution, which report the first and second read separately.
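To check the quality difference between forward and reverse reads independently of the LotuS logs, the mean quality of each file can be computed directly. This is a rough sketch and not how LotuS or sdm compute their statistics. It assumes four-line fastq records (no wrapped sequences) and Phred+33 encoded qualities, and the records in the example are made up.

```python
import io


def mean_phred(fastq_handle, offset=33):
    """Return the mean Phred score over all bases of a four-line-per-record fastq stream."""
    total = 0
    n_bases = 0
    for line_no, line in enumerate(fastq_handle):
        if line_no % 4 == 3:  # every fourth line holds the quality string
            qual = line.rstrip("\r\n")
            total += sum(ord(ch) - offset for ch in qual)
            n_bases += len(qual)
    return total / n_bases if n_bases else 0.0


# Made-up single-record inputs standing in for the forward and reverse files
fwd = io.StringIO("@r1\nACGT\n+\nIIII\n")
rev = io.StringIO("@r1\nACGT\n+\nDDDD\n")
print(mean_phred(fwd) - mean_phred(rev))  # 5.0
```

For real data, replace the io.StringIO objects with open("fwdMiSeq.fastq") and open("revMiSeq.fastq").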

Usage Example 3: raw sequence files demultiplexed into single files

If the fasta / fastq files from the sequencing facility are already demultiplexed into single files, LotuS can still process them if the mapping file is set up accordingly. The samples should keep their specific names, but the barcodes have already been removed and each sample is represented by one file. The mapping file should look like this:
#SampleID	BarcodeSequence	LinkerPrimerSequence	fastqFile	Description
bl9			file_with_bl9.fastq	FVB
bl10			file_with_bl10.fastq	FVB
...
bl36			file_with_bl36.fastq	FVB

Note that "BarcodeSequence" and "LinkerPrimerSequence" are empty fields, delimited by tab characters. It is important that they are truly empty, so that LotuS does not start looking for e.g. a space character (" ") as primer or barcode. Also check that in sdm_XX.txt (the sdm options file) the following key is set to F (False): RejectSeqWithoutFwdPrim F.
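Stray spaces in otherwise empty fields are hard to spot by eye, so a small check before starting LotuS can help. The function below is a sketch and not part of LotuS. The name whitespace_only_fields is hypothetical, and it assumes a tab-separated mapping file whose first line is the header.

```python
import csv
import io


def whitespace_only_fields(map_text, columns=("BarcodeSequence", "LinkerPrimerSequence")):
    """Return (SampleID, column) pairs whose field holds only whitespace instead of being empty."""
    reader = csv.reader(io.StringIO(map_text), delimiter="\t")
    header = next(reader)
    header[0] = header[0].lstrip("#")  # "#SampleID" -> "SampleID"
    problems = []
    for row in reader:
        record = dict(zip(header, row))
        for col in columns:
            value = record.get(col, "")
            if value != "" and value.strip() == "":
                problems.append((record["SampleID"], col))
    return problems


good = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tfastqFile\tDescription\n"
    "bl9\t\t\tfile_with_bl9.fastq\tFVB\n"
)
# Same row, but with a single space typed into the barcode field
bad = good.replace("bl9\t\t", "bl9\t \t")
print(whitespace_only_fields(good))  # []
print(whitespace_only_fields(bad))   # [('bl9', 'BarcodeSequence')]
```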
In the lotus.pl call, the absolute or relative path to the folder containing all the file_with_bl9.fastq, ... files is given as the -i argument.
If paired-end files are used, the fastqFile column has to contain the two fastq files, separated by a comma. I.e. instead of "file_with_bl10.fastq" this would be the two paired-end files: "file_with_bl10.1.fastq,file_with_bl10.2.fastq". The same applies to separate fasta and quality files and their respective columns (separate pair1,pair2 by a comma).
perl lotus.pl -i /Users/Tomas/data/dir_with_single_files/ -m /Users/Tomas/data/map.txt -o /Users/Tomas/results/example -s /Users/Tomas/data/sdm_XX.txt
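With many samples, the mapping file for this example is easier to generate than to type. The sketch below writes the column layout shown above, with the barcode and primer fields left truly empty. The function name demultiplexed_map is hypothetical, and the file names are the placeholders used in this example.

```python
HEADER = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence", "fastqFile", "Description"]


def demultiplexed_map(samples, description=""):
    """Build mapping-file text for samples that are already demultiplexed.

    samples maps a sample name to one fastq file name,
    or to a (forward, reverse) pair for paired-end data.
    """
    lines = ["\t".join(HEADER)]
    for name, files in samples.items():
        if not isinstance(files, str):
            files = ",".join(files)  # paired-end files are comma-separated
        # Barcode and primer fields stay empty: two consecutive tabs, no spaces
        lines.append("\t".join([name, "", "", files, description]))
    return "\n".join(lines) + "\n"


text = demultiplexed_map(
    {"bl9": "file_with_bl9.fastq", "bl10": "file_with_bl10.fastq"},
    description="FVB",
)
print(text)
```

For paired-end data, pass a pair per sample, e.g. {"bl10": ("file_with_bl10.1.fastq", "file_with_bl10.2.fastq")}. The result can be written to map.txt with open("map.txt", "w").write(text).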

Usage Example 4: demultiplex with fasta/fastq header

In some cases (especially in short read archives), the sample identity is stored as a short string in the fasta or fastq header. If this string is known and uniquely identifies the sample, the mapping file can be configured to split the sequences by an ID within their header:
#SampleID	BarcodeSequence	LinkerPrimerSequence	ReversePrimer	SampleIDinHead	Description
bl9		GTGCCAGCAGCCGCGGTAA		bl9ID882	FVB
bl10		GTGCCAGCAGCCGCGGTAA		bl10IDXXY	FVB
...
bl36		GTGCCAGCAGCCGCGGTAA		some_other_string	FVB

Then execute LotuS:
perl lotus.pl -i /Users/Tomas/data/data/Reads.fastq -m /Users/Tomas/data/map.txt -o /Users/Tomas/results/example -s /Users/Tomas/data/sdm_XX.txt
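The requirement that the ID string is uniquely identifiable can be illustrated with a small sketch of header-based sample assignment. This is not the LotuS implementation. It assumes plain substring matching, the function name sample_for_header is hypothetical, and the headers are made up.

```python
def sample_for_header(header, id_to_sample):
    """Return the sample whose ID string occurs in the header, or None if none or several match."""
    hits = [sample for tag, sample in id_to_sample.items() if tag in header]
    return hits[0] if len(hits) == 1 else None


# SampleIDinHead -> SampleID, as in the mapping file above
ids = {"bl9ID882": "bl9", "bl10IDXXY": "bl10"}
print(sample_for_header("@read1 bl9ID882 length=250", ids))  # bl9
print(sample_for_header("@read2 unknown", ids))              # None

# An ID that is a prefix of another ID is ambiguous under substring matching
ambiguous = {"bl1": "s1", "bl10": "s10"}
print(sample_for_header("@read3 bl10", ambiguous))           # None
```

The last call shows why short IDs such as "bl1" are risky: they also occur inside headers that belong to "bl10".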

Usage Example 5: Only sequences without quality information available

I do not recommend this.

The richness and diversity estimates will be unreliable. Composition itself should be fine, but I never investigated this in detail. Since some databases (e.g. MG-RAST) do not supply quality files in some cases, this option is available in LotuS, but you proceed at your own risk.
The setup is very similar to the examples above, but the quality file location is left empty. E.g. use lotus.pl -i XX.fna, or give the location of each single fna file in the mapping file (column header "fnaFile") and leave the "qualFile" column empty.

Usage Example 6: Eukaryotic LSU amplicons

Adapt the demultiplexing as described above and then execute LotuS with the following additional options:
perl lotus.pl -i .. -m .. -o .. -s .. -amplicon_type LSU -tax_group fungi
Note that this will automatically try to switch to Silva as the reference DB if -simBasedTaxo 1 or 2 was added.
Note that with -simBasedTaxo 1/2 you can classify much more than just fungi (although it says -tax_group fungi). The restriction comes from RDP, which only includes fungal LSU training sets, but it is overridden by the Silva annotations in the described case.

Usage Example 7: Fungal ITS amplicons

As in example 6, but with:
perl lotus.pl -i .. -m .. -o .. -s .. -amplicon_type ITS

See more advanced options for LotuS usage in the commandline documentation.
Learn more about setting up the quality filter for your reads in the sdm configuration.