This workflow generates multiple synthetic datasets using aMGSIM. It requires the results produced by processing a BAM file with filterBAM and metaDMG. An example of the config file is here.
The sample TSV file must contain the following fields:
- label: The name of the sample
- short_label: A short name for the sample (if not provided, one is autogenerated from the MD5 of the label)
- libprep: The type of library preparation to use (single-stranded or double-stranded)
- read_length_freqs_file: A JSON file with the empirical read length frequencies for the sample estimated by filterBAM
- mapping_stats_filtered_file: A TSV file with the filtered mapping statistics generated by filterBAM
- metadmg_results: The TSV generated by metaDMG when analyzing the filtered BAM file using metaDMG's local mode
- metadmg_misincorporations: The misincorporations generated by metaDMG when analyzing the filtered BAM file using metaDMG's local mode
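
The field list above can be checked before launching the workflow. The following is a minimal sketch, not part of the workflow itself: the function names `load_samples` and `short_label_from` are hypothetical, and the 8-character truncation of the MD5 digest is an assumption, since the exact format of the autogenerated short name is not specified here.

```python
import csv
import hashlib
import io

# Columns the sample TSV is expected to provide (short_label is optional).
REQUIRED_FIELDS = [
    "label",
    "libprep",
    "read_length_freqs_file",
    "mapping_stats_filtered_file",
    "metadmg_results",
    "metadmg_misincorporations",
]


def short_label_from(label, length=8):
    """Hypothetical fallback: first `length` hex characters of the label's MD5.

    The truncation length is an assumption made for this sketch.
    """
    return hashlib.md5(label.encode("utf-8")).hexdigest()[:length]


def load_samples(handle):
    """Read a tab-separated sample table and fill in missing short labels."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [f for f in REQUIRED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"sample table is missing columns: {', '.join(missing)}")
    samples = []
    for row in reader:
        # An absent or empty short_label is replaced by the MD5-based fallback.
        if not row.get("short_label"):
            row["short_label"] = short_label_from(row["label"])
        samples.append(row)
    return samples
```

Calling `load_samples(open("samples.tsv"))` (the file name is illustrative) returns one dictionary per sample, or raises a `ValueError` naming the missing columns.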
Example data can be found here. Once the workflow is done, it produces the following folders. In this example, we show the output for the sample Cave-22, where the workflow tried to produce a maximum of 1000000 single-end reads:
Cave-22/single/1000000
├── Cave-22.communities.json
├── Cave-22.communities.tsv
├── Cave-22.communities_read-abundances.tsv
├── Cave-22.communities_read-files.json
├── Cave-22.filepaths.tsv
├── Cave-22.genome-compositions.tsv
├── ag-config.yaml
├── ar-config.yaml
├── fb-config.yaml
└── reads
    ├── Cave-22_art.fq.gz
    ├── Cave-22_deamSim.fa.gz
    └── Cave-22_fragSim.fa.gz
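
To confirm that a run finished, the tree above can be turned into a list of expected paths. This is a sketch under the assumption that the layout is always `<sample>/<read type>/<maximum reads>`, as inferred from the single Cave-22 example. The helper names `expected_outputs` and `missing_outputs` are hypothetical.

```python
from pathlib import Path


def expected_outputs(sample, read_type, max_reads, root="."):
    """List the files shown in the example tree for one sample.

    The <sample>/<read_type>/<max_reads> layout is inferred from the
    Cave-22 example and is an assumption of this sketch.
    """
    base = Path(root) / sample / read_type / str(max_reads)
    names = [
        f"{sample}.communities.json",
        f"{sample}.communities.tsv",
        f"{sample}.communities_read-abundances.tsv",
        f"{sample}.communities_read-files.json",
        f"{sample}.filepaths.tsv",
        f"{sample}.genome-compositions.tsv",
        "ag-config.yaml",
        "ar-config.yaml",
        "fb-config.yaml",
    ]
    files = [base / name for name in names]
    # Read files live in the nested "reads" folder.
    for suffix in ("art.fq.gz", "deamSim.fa.gz", "fragSim.fa.gz"):
        files.append(base / "reads" / f"{sample}_{suffix}")
    return files


def missing_outputs(sample, read_type, max_reads, root="."):
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_outputs(sample, read_type, max_reads, root) if not p.exists()]
```

For the example above, `missing_outputs("Cave-22", "single", 1000000)` returns an empty list when every file in the tree is present.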
The folder contains the generated config files for the aMGSIM estimate, ancient-genomes and ancient-reads programs:
- fb-config.yaml: The config file for the aMGSIM estimate program
- ag-config.yaml: The config file for the aMGSIM ancient-genomes program
- ar-config.yaml: The config file for the aMGSIM ancient-reads program
These are the output files after running aMGSIM estimate:
- Cave-22.communities.tsv: This file contains the estimated abundance table for the synthetic community.
- Cave-22.filepaths.tsv: This file contains the file paths for the reference genomes used in the filterBAM analysis.
- Cave-22.genome-compositions.tsv: This file controls how the genomes are processed and contains the empirical values.
And the files produced by aMGSIM ancient-genomes:
- Cave-22.communities.json: A JSON file with the details to generate the synthetic data of each taxon in each sample.
- Cave-22.communities.tsv: A TSV file with the details of the synthetic data of each taxon in each sample.
- Cave-22.communities_read-abundances.tsv: A TSV file with the number of reads that will be generated for each taxon in each sample
And finally, the files produced by aMGSIM ancient-reads:
- Cave-22.communities_read-files.json: The JSON file with the read files generated by aMGSIM
- reads: The folder containing the read files generated by aMGSIM
  - Cave-22_fragSim.fa.gz: The read files generated by the fragSim program
  - Cave-22_deamSim.fa.gz: The read files generated by the deamSim program
  - Cave-22_art.fq.gz: The final fastq files generated by ART
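
As a quick sanity check on the final reads, the number of records in the gzipped FASTQ file can be counted and compared with the requested maximum. This is a minimal sketch that assumes the standard four-line FASTQ record layout with no wrapped sequence lines. `count_fastq_reads` is a hypothetical helper, not part of aMGSIM.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4
```

For the example above, `count_fastq_reads("Cave-22/single/1000000/reads/Cave-22_art.fq.gz")` would be expected to return at most 1000000, since that value is the maximum the workflow tried to produce.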