Caravan is a fork of SmileTrain that I made in order to
- rework the command-line interface
- change the intermediate file types (to YAML)
- change the submission style (to explicitly use ssub)
- move simple functions to quick Perl tools
Caveat: This readme has fallen a little behind, especially when it comes to the initial data format and the possibilities for merging.
- caravan is developed against Python 3.4 and Perl 5.10.1.
- Biopython
First, you'll need to have your data in a standard input format:
- All fastq files must be in Illumina 1.8+ format (also known as Sanger, Phred+33, or ASCII offset 33). You can convert fastq files in older Illumina format (i.e., Phred+64) using `van.py convert`.
- Intersecting multiple files requires that they have IDs like `readXXX`, where `XXX` are monotonically increasing integers. You can get your fastq files into this format using `van.py convert --rename`.
- When dereplicating, if you want provenance data (i.e., to make an OTU table), then the reads must have IDs with a `sample=XXX` field. A field is separated from other parts of the ID by semicolons. The preferred read ID format is like `read1234;sample=sample1`.
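The two conventions above (the quality offset and the semicolon-delimited ID fields) can be illustrated with a short Python sketch. This is not caravan's own code: the function names `phred64_to_phred33` and `parse_read_id` are hypothetical, and `van.py convert` remains the supported way to convert real files.

```python
def phred64_to_phred33(quality):
    """Re-encode a Phred+64 quality string as Phred+33 (hypothetical helper)."""
    converted = []
    for char in quality:
        score = ord(char) - 64  # Phred score under the older offset
        if score < 0:
            raise ValueError("character %r is not valid Phred+64" % char)
        converted.append(chr(score + 33))  # same score, new offset
    return "".join(converted)


def parse_read_id(read_id):
    """Split an ID like 'read1234;sample=sample1' into its name and key=value fields."""
    name, *fields = read_id.split(";")
    parsed = {}
    for field in fields:
        if "=" in field:
            key, value = field.split("=", 1)
            parsed[key] = value
    return name, parsed
```

For example, `parse_read_id("read1234;sample=sample1")` gives the name `read1234` and a dictionary mapping `sample` to `sample1`.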
Newer versions of caravan use the new three-file Illumina format: forward reads in one fastq, reverse reads in a second, and the index reads (aka "barcode reads") in a third.
- Trim primers from the forward and reverse fastq files (using `van.py primer`).
- Demultiplex the index reads (using `van.py demultiplex`).
- Intersect the forward, reverse, and mapping information (using `van.py intersect`).
- Merge the forward and reverse reads (using `van.py merge`).
- Quality filter the merged reads (using `van.py filter`).
- Dereplicate and record provenance (using `van.py derep`).
- Make OTUs.
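To make the dereplication step more concrete, here is a minimal Python sketch of what "dereplicate and record provenance" means in principle: identical sequences are collapsed, and a per-sample count is kept for each one. It assumes read IDs carry a `sample=XXX` field as described above. The names `sample_of` and `dereplicate` are hypothetical, and this is not how `van.py derep` is necessarily implemented.

```python
from collections import Counter, defaultdict


def sample_of(read_id):
    """Return the value of the sample= field in a read ID, or None if absent."""
    for field in read_id.split(";")[1:]:
        if field.startswith("sample="):
            return field[len("sample="):]
    return None


def dereplicate(records):
    """Collapse identical sequences and count each one per sample.

    records is an iterable of (read_id, sequence) pairs.
    Returns a mapping {sequence: Counter({sample: count})}.
    """
    provenance = defaultdict(Counter)
    for read_id, sequence in records:
        sample = sample_of(read_id)
        if sample is None:
            raise ValueError("read ID %r has no sample= field" % read_id)
        provenance[sequence][sample] += 1
    return provenance
```

Each key of the returned mapping is one unique sequence, and its counter holds the per-sample abundances, which is the raw material for an OTU table.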
If you do not have paired-end reads, you'll probably want to use `van.py truncate` to trim sequences by their quality or length.
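As an illustration of what trimming by quality or length can look like, the Python sketch below cuts a read at the first base whose Phred+33 score falls below a threshold and optionally caps the read length. This is one common approach and an assumption on my part, not a description of what `van.py truncate` does; the function name `truncate_read` and its parameters are hypothetical, so check `van.py truncate --help` for the real options.

```python
def truncate_read(sequence, quality, min_score=None, max_length=None):
    """Trim a read at the first low-quality base and/or to a maximum length.

    quality is a Phred+33 encoded string of the same length as sequence.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    end = len(sequence)
    if min_score is not None:
        for position, char in enumerate(quality):
            if ord(char) - 33 < min_score:  # decode Phred+33
                end = position  # drop this base and everything after it
                break
    if max_length is not None:
        end = min(end, max_length)
    return sequence[:end], quality[:end]
```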
Note that, if using `van.py rdp` on a reverse read, you will want to use the `--antisense` option. Caravan throws out RDP classifications made using reverse complements because, if we don't expect a reverse complement, it's probably a bad classification.
There is sparse documentation. Sorry. But the `--help` option is usually pretty explanatory.
Unit tests are in the `test` folder. You can run the tests from the top directory by running `py.test` or `make`.
Unit tests are mostly incomplete. Sorry. I'm a terrible programmer.