Add MultiplexedBarcodeInSequence to import tutorial
Closed this issue · 1 comments
ChrisKeefe commented
Addition Description
Add MultiplexedSingleEndBarcodeInSequence and MultiplexedPairedEndBarcodeInSequence to import tutorial.
Current Behavior
MultiplexedBarcodeInSequence formats are not represented in the importing tutorial at this time.
Proposed Behavior
Add both formats to the importing tutorial.
Questions
- Title: "Multiplexed FASTQ with Barcode in Sequence"?
- Sample Data: Am I OK borrowing the (tiny) sample data from q2-types/q2_types/multiplexed_sequences/tests/data? It only has three records, but forward and reverse both exist. Alternatively, I could steal forward-only data from @thermokarst's tutorial. Can we get away with sample data for single-end only, and sample commands for both?
- Ordering is an expectation in EMP fastq files, so that barcodes can be matched to reads. That seems not to be the case here, or at least not to be important. The barcodes.tsv doesn't include a barcode for every record, right? Just a single listing of each barcode?
- Is the naming convention forward.fastq.gz required for these imports? Only for paired-end, or for both?
References
thermokarst commented
Replies to questions:
- LGTM
- Feel free to use the q2-types test data.
- Ordering is critical for paired-end reads - the order of the forward file must match the order of the reverse file, otherwise there is no way to register the read-pair as related. Regarding the sample metadata file, correct, this is just a column with a barcode (or two columns of barcodes if dual-index reads). This is the same as the EMP case, as well.
- In QIIME 2, any single-file directory format can be created by importing any arbitrarily named file. Filenames become critical when importing a multi-file directory format - the names of the files must be precisely what the format defines. In this case, the files for the paired-end mux fmt should be named forward.fastq.gz and reverse.fastq.gz.
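The two paired-end requirements above (exact filenames, and forward/reverse records in the same order) can be sanity-checked before importing. What follows is a minimal sketch, not part of QIIME 2: `check_paired_dir`, `read_ids`, and `write_fastq` are hypothetical helpers, and it assumes four-line FASTQ records whose headers share a read ID between mates, optionally suffixed with /1 or /2.

```python
import gzip
import tempfile
from pathlib import Path

# Filenames the paired-end format expects, per the reply above.
REQUIRED_NAMES = ("forward.fastq.gz", "reverse.fastq.gz")


def read_ids(fastq_gz):
    """Yield the read ID of each record, assuming 4 lines per record."""
    with gzip.open(fastq_gz, "rt") as handle:
        for line_number, line in enumerate(handle):
            # Only header lines (every 4th line) carry the read ID.
            if line_number % 4 != 0 or not line.strip():
                continue
            read_id = line[1:].split()[0]
            # Assumption: mates may be tagged with a trailing /1 or /2.
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id


def check_paired_dir(directory):
    """Return a list of problems found; an empty list means none."""
    directory = Path(directory)
    missing = [n for n in REQUIRED_NAMES if not (directory / n).is_file()]
    if missing:
        return ["missing file: " + name for name in missing]

    forward = list(read_ids(directory / "forward.fastq.gz"))
    reverse = list(read_ids(directory / "reverse.fastq.gz"))
    problems = []
    if len(forward) != len(reverse):
        problems.append(
            f"record counts differ: {len(forward)} forward, {len(reverse)} reverse"
        )
    # Report only the first position where the two files disagree.
    for index, (fwd_id, rev_id) in enumerate(zip(forward, reverse)):
        if fwd_id != rev_id:
            problems.append(f"record {index}: {fwd_id} != {rev_id}")
            break
    return problems


def write_fastq(path, ids):
    """Write a toy gzipped FASTQ file (demo data only)."""
    with gzip.open(path, "wt") as handle:
        for read_id in ids:
            handle.write(f"@{read_id}\nACGT\n+\nIIII\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_fastq(Path(tmp) / "forward.fastq.gz", ["r1/1", "r2/1"])
        write_fastq(Path(tmp) / "reverse.fastq.gz", ["r1/2", "r2/2"])
        print(check_paired_dir(tmp))
```

The check only compares read IDs by position, so it flags a mismatch in order or count but says nothing about barcodes or sequence content.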
Let me know if you have any questions - thanks for looking into this!