caravan

Caravan is a fork of SmileTrain that I made in order to

Caveat: This README has fallen a little behind, especially regarding the initial data format and the options for merging.

caravan requirements

First, you'll need to have your data in a standard input format:

All fastq files must be in Illumina 1.8+ format (also known as Sanger, Phred+33, or ASCII offset 33). You can convert fastq files in older Illumina format (i.e., Phred+64) using van.py convert.
Intersecting multiple files requires that their reads have IDs like readXXX, where XXX is a monotonically increasing integer. You can get your fastq files into this format using van.py convert --rename.
When dereplicating, if you want provenance data (i.e., to make an OTU table), then the reads must have IDs with a sample=XXX field. Fields are separated from other parts of the ID by semicolons. The preferred read ID format is like read1234;sample=sample1.
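The three format requirements above can be illustrated with a small sketch. This is not how van.py implements them; the function names (phred64_to_phred33, rename_records, parse_id_fields) are hypothetical. It assumes true Phred+64 input (Illumina 1.3 to 1.7), where converting to Phred+33 means subtracting 31 from each quality character's ASCII code. Older Solexa-scaled scores use a different scale and are rejected here rather than silently mis-converted.

```python
def phred64_to_phred33(qual: str) -> str:
    """Convert a Phred+64 quality string to Phred+33 (Illumina 1.8+/Sanger).

    Assumes genuine Phred+64 scores: '@' (ASCII 64) is Q0, so every
    character is shifted down by 64 - 33 = 31.
    """
    out = []
    for c in qual:
        if ord(c) < 64:
            # Below '@' means negative scores, e.g. Solexa-scaled data,
            # which this simple offset shift cannot handle correctly.
            raise ValueError(f"not a Phred+64 quality character: {c!r}")
        out.append(chr(ord(c) - 31))
    return "".join(out)


def rename_records(records, sample):
    """Yield (new_id, seq, qual) with IDs like read1;sample=sample1.

    `records` is an iterable of (old_id, seq, qual) tuples; the old ID is
    discarded and replaced with a monotonically increasing read number.
    """
    for i, (_old_id, seq, qual) in enumerate(records, start=1):
        yield f"read{i};sample={sample}", seq, qual


def parse_id_fields(read_id: str) -> dict:
    """Split 'read1234;sample=sample1' into {'sample': 'sample1'}.

    The first semicolon-separated part is the read name; each later
    part of the form key=value becomes a dictionary entry.
    """
    fields = {}
    for part in read_id.split(";")[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key] = value
    return fields
```

For example, the Phred+64 string "h@" (Q40, Q0) becomes "I!" in Phred+33, and parse_id_fields("read1234;sample=sample1")["sample"] gives "sample1", which is the sample label a dereplication step would need to tally reads per sample for an OTU table.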

Documentation is sparse. Sorry. But the --help option is usually fairly self-explanatory.

Unit tests are in the test folder. You can run the tests from the top-level directory by running py.test or make.

Unit tests are mostly incomplete. Sorry. I'm a terrible programmer.