eukhashing project!
This directory contains the Snakemake workflow for comparing sourmash similarities between Thaps experimental sequences and assembled genomes.
Branch 1: Raw sequences (experimental data downloaded from online sources)
- Interleave paired-end reads or concatenate read files
- Perform error trimming (coverage 5-10%); AfterQC is a suitable tool available on Conda: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1469-3 (paper), https://anaconda.org/bioconda/afterqc (Bioconda package)
- Compute sourmash signatures with k = 21, 31, and 51, tracking abundance
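The interleaving and sketching steps of Branch 1 can be sketched in Python as follows. This is a minimal illustration, not the workflow's actual code. It assumes paired-end FASTQ files with strict 4-line records and sourmash 4.x, whose `sourmash sketch dna -p` parameter string accepts several `k=` values plus `abund` for abundance tracking. The function names `fastq_records`, `interleave`, and `sourmash_sketch_cmd` are hypothetical. Concatenation (a plain `cat` of read files) and the AfterQC trimming step are omitted, because their exact invocation is not specified here.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, List, Sequence


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group FASTQ lines into 4-line records (header, sequence, '+', quality).

    Hypothetical helper; assumes no multi-line sequences.
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(r1_lines: Iterable[str], r2_lines: Iterable[str]) -> Iterator[str]:
    """Yield R1 and R2 records alternately: R1[0], R2[0], R1[1], R2[1], ..."""
    for rec1, rec2 in zip_longest(fastq_records(r1_lines), fastq_records(r2_lines)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of records")
        yield from rec1
        yield from rec2


def sourmash_sketch_cmd(reads: str, output: str,
                        ksizes: Sequence[int] = (21, 31, 51)) -> List[str]:
    """Build a sourmash 4.x command computing one signature per k, with abundances."""
    params = ",".join(f"k={k}" for k in ksizes) + ",abund"
    return ["sourmash", "sketch", "dna", "-p", params, reads, "-o", output]
```

In a Snakemake rule, the joined command string could go into a `shell:` directive. Streaming the files line by line keeps memory use flat even for large read sets.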
Branch 2: Pre-assembled genomes
- Compute sourmash signatures
- Run on the whole genome and, separately, on the coding regions only
- Use only cleaned assemblies, not combined assemblies
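Running on the coding regions alone requires pulling CDS sequences out of each assembly first. The sketch below is one possible way to do that, not the workflow's confirmed approach. It assumes a GFF3 annotation is available for each cleaned assembly and that the genome has already been loaded into a dict of sequence ID to sequence. The names `parse_cds_features` and `extract_cds` are hypothetical. Each CDS row is extracted separately, so multi-exon genes are not stitched together.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Translation table for reverse-complementing minus-strand features.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_cds_features(gff_lines: Iterable[str]) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (seqid, start, end, strand) for every CDS row of a GFF3 file."""
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9 and cols[2] == "CDS":
            yield cols[0], int(cols[3]), int(cols[4]), cols[6]


def extract_cds(genome: Dict[str, str], gff_lines: Iterable[str]) -> Iterator[str]:
    """Yield the nucleotide sequence of each CDS feature, oriented 5' to 3'."""
    for seqid, start, end, strand in parse_cds_features(gff_lines):
        seq = genome[seqid][start - 1:end]  # GFF coordinates are 1-based, inclusive
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]
        yield seq
```

The extracted sequences can be written to a CDS FASTA file, and both that file and the whole-genome FASTA can then be passed to `sourmash sketch dna`. For DNA sketches, sourmash hashes canonical k-mers, so strand orientation does not change the signature. Reverse-complementing is kept only so that the CDS FASTA is reusable for other purposes.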