Given a set of genome FASTA files, generate primer candidates (for qPCR) that are specific to each genome.
- Focus on coding sequences - first annotate each genome to extract protein coding sequences.
- For every genome G, concatenate the FASTA files of all the other genomes to make an “excluded” genome G’. Primers specific to G should not bind to G’.
- Index each genome and its cognate excluded genome using bowtie2.
- Propose primer candidates with parameters suitable for qPCR using primer3. Format the forward and reverse primers as “reads”.
- Align these primer “read” files to both the genome and the excluded genome.
- Filter out primer pairs with non-specific binding. I am using bowtie2 with a default alignment range of [0,5000]. This is meant to filter out short concordant alignments.
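
The exclusion and filtering steps above can be sketched in Python as follows. This is a minimal illustration and not the code used in this pipeline: the function names (`build_excluded_genome`, `concordant_pair_ids`, `specific_pairs`) are hypothetical, and it assumes the forward and reverse primers were aligned as paired reads with SAM output, so that a concordant alignment is marked by the "properly paired" SAM flag bit (0x2).

```python
from pathlib import Path

PROPER_PAIR = 0x2  # SAM flag bit: the pair aligned concordantly
UNMAPPED = 0x4     # SAM flag bit: this read did not align


def build_excluded_genome(target, genomes, out_path):
    """Concatenate every genome except `target` into one FASTA file."""
    target = Path(target).resolve()
    with open(out_path, "w") as out:
        for genome in genomes:
            if Path(genome).resolve() == target:
                continue  # the target genome is never part of its own excluded genome
            text = Path(genome).read_text()
            out.write(text)
            # Make sure the next file's header starts on a new line.
            if text and not text.endswith("\n"):
                out.write("\n")


def concordant_pair_ids(sam_lines):
    """Return the names of primer pairs that have a concordant alignment."""
    ids = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & PROPER_PAIR and not flag & UNMAPPED:
            ids.add(name)
    return ids


def specific_pairs(target_sam, excluded_sam):
    """Keep pairs that bind the target genome but not the excluded genome."""
    return concordant_pair_ids(target_sam) - concordant_pair_ids(excluded_sam)
```

Here `target_sam` and `excluded_sam` are any iterables of SAM lines, for example open file handles on the two alignment outputs. A stricter filter could also reject pairs where only one primer aligns to the excluded genome; this sketch only looks at concordant pairs.
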
- Set up conda with python 3.11 and snakemake 8.11.3. My initial choice of python 3.11 is affecting a lot of downstream decisions, because most conda packages still require python<3.11.
- Installed the snakemake executor plugin for slurm so that job submission through slurm works.
- I am using two custom conda environments.
  - `bakta` has weird dependency issues with python 3.11. `bakta` also has weird issues currently with its reliance on amrfinderplus. I have had to comment out the parts of the bakta code which handle this.
  - `bowtie2` was easier to install in a separate environment for some reason.
- I am using `bakta` because I have not been able to install `prokka` with my current setup.
- Finally, this pipeline is set up to use `slurm` for parallelizing tasks.
- Applied this patch to get a local env working: https://github.com/snakemake/snakemake/compare/main…ShogoAkiyama:snakemake:conda-url-env-bug
- `bakta` performance is affected by a lot of network IO (oschwengers/bakta#282). Typical runs I’ve observed are ~30 minutes.
- Create a directory, say RUNNAME.
- Modify `config.yaml` to point `run_path` to RUNNAME.
- In RUNNAME/genome/ copy the individual genome FASTA files that are the targets for primer generation.
- If using `slurm`, modify the supplied `sbatch` file to reflect the partition parameters, and run using `sbatch run_primergen.sbatch`.
- If running from the command line, use something like

      snakemake --snakefile primergen.snakefile -p \
        --rerun-incomplete \
        --cores 4 \
        --use-conda \
        --configfile config.yaml \
        -j 100 \
        --keep-incomplete --latency-wait 60
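
The directory preparation described above can also be scripted. The following is only a sketch under assumptions: `stage_genomes` is a hypothetical helper that is not part of this repository, and it only creates RUNNAME/genome/ and copies the FASTA files there. Pointing `run_path` in `config.yaml` at the run directory still has to be done separately.

```python
import shutil
from pathlib import Path


def stage_genomes(run_dir, fasta_files):
    """Copy genome FASTA files into <run_dir>/genome/ and return the new paths."""
    genome_dir = Path(run_dir) / "genome"
    genome_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in map(Path, fasta_files):
        dest = genome_dir / src.name
        # Refuse to overwrite: two inputs with the same file name would
        # otherwise silently collapse into a single genome.
        if dest.exists():
            raise FileExistsError(f"{dest} already exists")
        shutil.copy2(src, dest)
        staged.append(dest)
    return staged
```

For example, `stage_genomes("RUNNAME", ["a.fasta", "b.fasta"])` would leave `RUNNAME/genome/a.fasta` and `RUNNAME/genome/b.fasta` in place, ready for the pipeline to pick up.
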