# atlas-var

A document-based Variant database. It is primarily used as a toolbox for building sequence probes for [Mykrobe predictor](https://github.com/phelimb/Mykrobe-predictor), [atlas](https://github.com/phelimb/atlas) genotype and [atlas-seq](https://github.com/phelimb/atlas-seq). However, it can also be used to build a sparse variant database with [mongodb](https://www.mongodb.com), a document-based NoSQL database.

https://github.com/ga4gh/schemas

# Usage

## Building a probe set for typing with `atlas-seq` or `atlas genotype`

You can create a variant probe set in several ways.

## 1. Simple case - building a probe without using backgrounds

    atlas-var make-probes -v A1234T example-data/NC_000962.3.fasta

## 2. 'Dumping' the Variant database

`atlas dump-probes`

    usage: atlas dump-probes [-h] [--db_name db_name] [-q] [--kmer kmer] [--force]
                             [-v]
                             reference_filepath

    positional arguments:
      reference_filepath  reference_filepath

    optional arguments:
      -h, --help          show this help message and exit
      --db_name db_name   db_name
      -q, --quiet         do not output warnings to stderr
      --kmer kmer         kmer length
      --force
      -v, --verbose

For example:

    atlas dump-probes reference_set.fasta > variant_probe_set.fasta

This will generate a probe set for each variant in the database. The resulting fasta file will look like the following:

    >ref-37d2eea6a23d526cbee4e00b901dc97885a88e7aa8721432b080dcc342b459ce?num_alts=10&ref=56cf2e4ca9fefcd2b15de4d6
    TCGCCGCAGCGGTTGGCAACGATGTGGTGCGATCGCTAAAGATCACCGGGCCGGCGGCACCAT
    ...
    TCGCCGCAGCGGTTGGCAACGATGTGGTGCAATCGCTAAAGATCACCGGGCCGGCGGCATCAT
    >alt-37d2eea6a23d526cbee4e00b901dc97885a88e7aa8721432b080dcc342b459ce
    TCGCCGCAGCGGTTGGCAACGATGTGGTGCAATCGCTAAAGATCACCGGGCCGGCGGCACGAT
    >ref-2dab6387a677ac17f6bc181f47235a4196885723b34ceff3a05ffcbfd6834347?num_alts=10&ref=56cf2e4ca9fefcd2b15de4d6
    CTGTCGCTGGGAAGAGCGAATACGTCTGGACCAGGACGGGCTACCCGAACACGATATCTTTCG
    >alt-2dab6387a677ac17f6bc181f47235a4196885723b34ceff3a05ffcbfd6834347
    ...

Here you have a series of variants, each represented as a set of alleles.
The reference allele comes first, followed by one or more alternate alleles. You will end up with multiple alternate alleles if there are other variants that fall within k of the target variant.

Each variant is referenced by a `var_hash`, which is the hash of ":ref:pos:alt". It is indexed in the database and can be used to query for the Variant object.

See `atlas genotype` to use these probes to genotype a new sample.

## 3. Building a custom probe set

`atlas make-probes` allows you to build a probe set using Variants that are not already in the database.

    usage: atlas make-probes [-h] [--db_name db_name] [-q] [-v VARIANT] [-f FILE]
                             [-g GENBANK] [-k KMER] [--no-backgrounds]
                             reference_filepath

    positional arguments:
      reference_filepath    reference_filepath

    optional arguments:
      -h, --help            show this help message and exit
      --db_name db_name     db_name
      -q, --quiet           do not output warnings to stderr
      -v VARIANT, --variant VARIANT
                            Variant in DNA positions e.g. A1234T
      -f FILE, --file FILE  File containing variants as rows A1234T
      -g GENBANK, --genbank GENBANK
                            Genbank file containing genes as features
      -k KMER, --kmer KMER  kmer length
      --no-backgrounds      Build probe set against reference only ignoring nearby
                            variants

Example usages:

## Build a variant probe set defined based on reference co-ordinates (1-based)

First, define the variants for which you want to build probes. The columns are:

    ref/gene pos ref alt alphabet

For example:

    ref 2522798 G T DNA
    ref 3785555 A G DNA
    ref 839793 C A DNA
    ref 2734398 C G DNA
    ref 3230861 T A DNA
    ref 1018694 A T DNA

Then build the probe set:

    atlas make-probes --db_name :db_name -f variants.txt ref.fa > variant_probe_set.fa

## Build a variant probe set defined based on gene co-ordinates (1-based)

You can also define your variants in terms of gene coordinates in amino acid or DNA space.

    rpoB S431X PROT
    rpoB F425X PROT
    embB M306X PROT
    rrs C513X DNA
    gyrA D94X PROT
    gid P75L PROT
    gid V88A PROT
    katG S315X PROT

To do this, you must provide a genbank file that defines the positions of the genes in the reference (`-g GENBANK`):

    atlas make-probes --db_name :db_name -f aa_variants.txt -g ref.gb ref.fa > gene_variant_probe_set.fa
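
The two variant file layouts described above (reference coordinates with five columns, gene coordinates with three) can be checked before they are passed to `atlas make-probes`. The following is a minimal sketch in Python, not the parser that `atlas` itself uses. The names `VariantSpec`, `parse_variant_row` and `MUTATION_RE` are hypothetical and introduced only for illustration, and the mutation syntax (reference letters, a 1-based position, alternate letters) is an assumption inferred from examples such as `A1234T` and `S431X`.

```python
import re
from typing import NamedTuple, Optional

# Assumed mutation syntax, inferred from the README examples (A1234T, S431X):
# reference letter(s), a 1-based position, then alternate letter(s).
MUTATION_RE = re.compile(r"^([A-Za-z]+)(\d+)([A-Za-z]+)$")


class VariantSpec(NamedTuple):
    """Hypothetical container for one row of a variants file."""
    target: str    # "ref" or a gene name such as "rpoB"
    ref: str       # reference allele
    pos: int       # 1-based position
    alt: str       # alternate allele
    alphabet: str  # "DNA" or "PROT"


def parse_variant_row(line: str) -> Optional[VariantSpec]:
    """Parse one whitespace-separated row; return None for blank lines."""
    fields = line.split()
    if not fields:
        return None
    if len(fields) == 5:
        # Reference-coordinate layout: ref/gene pos ref alt alphabet
        target, pos, ref, alt, alphabet = fields
        return VariantSpec(target, ref, int(pos), alt, alphabet)
    if len(fields) == 3:
        # Gene-coordinate layout: gene mutation alphabet
        target, mutation, alphabet = fields
        match = MUTATION_RE.match(mutation)
        if match is None:
            raise ValueError(f"unrecognised mutation: {mutation!r}")
        ref, pos, alt = match.groups()
        return VariantSpec(target, ref, int(pos), alt, alphabet)
    raise ValueError(f"expected 3 or 5 columns, got {len(fields)}: {line!r}")


if __name__ == "__main__":
    for row in ["ref 2522798 G T DNA", "rpoB S431X PROT"]:
        print(parse_variant_row(row))
```

Both layouts are normalised to the same tuple, so a file can be validated row by row and any malformed line is reported with a `ValueError` before a probe build is started.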