qiime2/docs

Add MultiplexedBarcodeInSequence to import tutorial

Closed this issue · 1 comment

Addition Description
Add MultiplexedSingleEndBarcodeInSequence and MultiplexedPairedEndBarcodeInSequence to import tutorial.

Current Behavior
MultiplexedBarcodeInSequence formats are not represented in the importing tutorial at this time.

Proposed Behavior
Add both formats to the importing tutorial.

Questions

  1. Title: "Multiplexed FASTQ with Barcode in Sequence"?
  2. Sample Data: Am I OK borrowing the (tiny) sample data from q2-types/q2_types/multiplexed_sequences/tests/data? It only has three records, but forward and reverse both exist. Alternatively, I could steal forward-only data from @thermokarst's tutorial. Can we get away with sample data for single-end only, and sample commands for both?
  3. Ordering is an expectation in EMP FASTQ files, so that barcodes can be matched to reads. That seems not to be the case here, or at least not to be important. The barcodes.tsv doesn't include a barcode for every record, right? Just a single listing of each barcode?
  4. Is the naming convention forward.fastq.gz required for these imports? Only for paired-end, or for both?

References

  1. Forum X-refs (1, 2):

Replies to questions:

  1. LGTM
  2. Feel free to use the q2-types test data.
  3. Ordering is critical for paired-end reads - the order of the forward file must match the order of the reverse file, otherwise there is no way to register the read-pair as related. Regarding the sample metadata file, correct, this is just a column with a barcode (or two columns of barcodes if dual-index reads). The same is true in the EMP case.
  4. In QIIME 2, any single-file directory format can be created by importing an arbitrarily named file. Filenames become critical when importing a multi-file directory format - the names of the files must be precisely what the format defines. In this case, the files for the paired-end multiplexed format must be named forward.fastq.gz and reverse.fastq.gz.
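
To make points 3 and 4 concrete, here is a minimal sketch of how the paired-end input directory could be prepared before importing. It is not part of QIIME 2: `read_ids`, `check_pair_order`, and `stage_paired_end` are hypothetical helper names, and the sketch assumes well-formed four-line FASTQ records whose mates share a read ID apart from an optional trailing `/1` or `/2`.

```python
import gzip
import shutil
from itertools import zip_longest
from pathlib import Path


def read_ids(fastq_gz):
    """Yield the read ID of each record in a gzipped FASTQ file.

    Assumption: every record is exactly four lines, and the ID is the
    header text up to the first whitespace, minus a trailing /1 or /2.
    """
    with gzip.open(fastq_gz, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0:  # header line of each four-line record
                read_id = line[1:].split()[0]
                if read_id.endswith(("/1", "/2")):
                    read_id = read_id[:-2]
                yield read_id


def check_pair_order(forward, reverse):
    """Raise ValueError if the files differ in record count or order."""
    pairs = zip_longest(read_ids(forward), read_ids(reverse))
    for n, (f_id, r_id) in enumerate(pairs, start=1):
        if f_id != r_id:
            raise ValueError(
                f"record {n}: forward={f_id!r} reverse={r_id!r}"
            )


def stage_paired_end(forward, reverse, out_dir):
    """Copy the reads into out_dir under the filenames the format defines."""
    check_pair_order(forward, reverse)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # The paired-end directory format expects exactly these two names.
    shutil.copyfile(forward, out / "forward.fastq.gz")
    shutil.copyfile(reverse, out / "reverse.fastq.gz")
    return out
```

The staged directory could then be passed to `qiime tools import` via `--input-path` with `--type MultiplexedPairedEndBarcodeInSequence`. For single-end data no staging should be needed, since an arbitrarily named file can be imported directly.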

Let me know if you have any questions - thanks for looking into this!