Chipmatch
Chipmatch is a tool that identifies which chip type has been used for a given PLINK dataset. The analysis is based on Will Rayner’s Genotyping chips strand and build files.
As of 2018-11-06, his website contains 411 usable annotation files. Analyzing 650k variants against all 411 chip/build combinations usually takes less than 10 minutes.
Compiling
For compilation, you will need a working cargo/Rust installation. If you install rust through the Rustup Installer, you will already have a sufficiently recent Rust version. After that `cargo build –release` will get you a binary.
Chances are that Jan <j.kaessens@ikmb.uni-kiel.de> has a pre-compiled binary on the rzcluster.
Preparation
Download all the ZIP files from Will Rayner’s Website, or at least those you’re interested in, and dump them to a folder. Do not extract them.
Running
$ chipmatch --help chipmatch 0.4.1 Jan Christian Kaessens <j.kaessens@ikmb.uni-kiel.de> Tries to identify the chip type from a given BIM file and checks for orientation USAGE: chipmatch [FLAGS] [OPTIONS] <FILE> <DIR> FLAGS: -h, --help Prints help information -V, --version Prints version information -v, --verbose Be verbose and print progress OPTIONS: -e, --extract <N> Extract the N most promising strand files to the local working directory -o, --output <FILE> Write result table to FILE instead of stdout -t, --threads <N> Use N threads for computation ARGS: <FILE> PLINK .bim file to guess the chip type for <DIR> Directory containing Will Rayner's strand archives
The program’s output consists of three columns:
- The annotation file name (“strand file”) as it appears in the ZIP archive
- Name match, the ratio of SNPs where a corresponding SNP ID could be found
- Name/Pos match, the ratio of SNPs with name vs. SNPs with matching name and position
- Original strand rate, the ratio of SNPs with name and pos, that also have the same alleles (non-AT/CG)
- Plus strand rate, the ratio of SNPs with name and pos, that also have the same alleles, flipping the reference to “+” (non-AT/CG)
- AT/CG rate, the ratio of AT/CG SNPs in all SNPs with matching name and position
The table is sorted by match rate and in descending order.
Without the `–verbose` switch, only the tab-separated table is printed, with header line, making it easily parseable for script and pipeline embedding.
Example output:
$ chipmatch ../GSA-data/DKTNF.bim ../GSA GSA-24v1-0_A6-b37.strand 0.9057662 0.9865884 GSA-24v1-0_A2-b37.strand 0.9057662 0.9865884 GSA-24v1-0_A1-b37.strand 0.9057662 0.9865884 GSA-24v2-0_A1-b37.strand 0.876778 0.92227954 GSA-24v1-0_A2-b38.strand 0.008157662 0.008886434 GSA-24v1-0_A1-b38.strand 0.008157662 0.008886434 GSA-24v1-0_A6-b38.strand 0.008157662 0.008886434 GSA-24v2-0_A1-b38.strand 0.0079119755 0.008328985 GSA-24v1-0_A2-b36.strand 0.00016569582 0.00018057342 GSA-24v1-0_A1-b36.strand 0.00016569582 0.00018057342 GSA-24v1-0_A6-b36.strand 0.00016569582 0.00018057342 GSA-24v2-0_A1-b36.strand 0.00015426852 0.00016235182
Verbose Example output
$ chipmatch -v ../GSA-data/DKTNF.bim ../GSA Reading BIM file... 700078 variants loaded. Scanning ../GSA/GSA-24v1-0_A2-b36-strand.zip Scanning ../GSA/GSA-24v2-0_A1-b37-strand.zip Scanning ../GSA/GSA-24v1-0_A6-b37-strand.zip Scanning ../GSA/GSA-24v1-0_A1-b37-strand.zip Scanning ../GSA/GSA-24v1-0_A2-b38-strand.zip Scanning ../GSA/GSA-24v2-0_A1-b38-strand.zip Scanning ../GSA/GSA-24v1-0_A6-b36-strand.zip Scanning ../GSA/GSA-24v1-0_A1-b36-strand.zip Scanning ../GSA/GSA-24v1-0_A6-b38-strand.zip Scanning ../GSA/GSA-24v1-0_A1-b38-strand.zip Scanning ../GSA/GSA-24v2-0_A1-b36-strand.zip Scanning ../GSA/GSA-24v1-0_A2-b37-strand.zip GSA-24v1-0_A6-b37.strand 0.9057662 0.9865884 GSA-24v1-0_A2-b37.strand 0.9057662 0.9865884 GSA-24v1-0_A1-b37.strand 0.9057662 0.9865884 GSA-24v2-0_A1-b37.strand 0.876778 0.92227954 GSA-24v1-0_A2-b38.strand 0.008157662 0.008886434 GSA-24v1-0_A1-b38.strand 0.008157662 0.008886434 GSA-24v1-0_A6-b38.strand 0.008157662 0.008886434 GSA-24v2-0_A1-b38.strand 0.0079119755 0.008328985 GSA-24v1-0_A2-b36.strand 0.00016569582 0.00018057342 GSA-24v1-0_A1-b36.strand 0.00016569582 0.00018057342 GSA-24v1-0_A6-b36.strand 0.00016569582 0.00018057342 GSA-24v2-0_A1-b36.strand 0.00015426852 0.00016235182