str-ngs-simulator is a tool to simulate Next Generation Sequencing datasets that include stutter errors characteristic of STRs.
With conda
Steps to run tool
1. Download external packages "pyfaidx" and "art"
2. Run __main__.py file with parameters "--coords", "--ref", "--art"
3. Confirm that output "test_dir" was created with output files
ART Download: https://www.niehs.nih.gov/research/resources/software/biostatistics/art/index.cfm
Example run
General format:
python __main__.py --coords <path to coordinate file> --ref <path to reference genome> --art <path to art package>
With local parameters:
python __main__.py --coords ref_HTT.bed --ref /storage/resources/dbase/human/hg19/hg19.fa --art /storage/ashen/NGS_simulator/art_illumina --output_dir test_set_out
Output format
- The output directory contains a few different files and folders:
- "combined1.fq": combined final simulated reads in forward orientation
- "combined2.fq": combined final simulated reads in reverse orientation
- "fasta": directory containing the reference sequence files used as templates for simulated data
- "fastq": directory containing the individual simulated reads for each group of stutter errors
Please reference directory "example_files" for example input and output files
- ref_HTT.bed: example coordinates file
- test_set_out: example output directory
- Use hg19.fa reference genome
Currently str-ngs-simulator is... BUILD TEST #1 (11/21)
If you have any questions about usage or issues with the tool, please reach create a Github issue or ask a qustion in the Discussion section.
(insert contributing)