Chipmatch


Chipmatch is a tool that identifies which chip type has been used for a given PLINK dataset. The analysis is based on Will Rayner’s Genotyping chips strand and build files.

As of 2018-11-06, his website contains 411 usable annotation files. Analyzing 650k variants against all 411 chip/build combinations usually takes less than 10 minutes.

Compiling

To compile, you need a working cargo/Rust installation. If you install Rust through the Rustup installer, you will already have a sufficiently recent Rust version. After that, `cargo build --release` builds an optimized binary under `target/release/`.

Chances are that Jan <j.kaessens@ikmb.uni-kiel.de> has a pre-compiled binary on the rzcluster.

Preparation

Download all the ZIP files from Will Rayner’s Website, or at least those you’re interested in, and dump them to a folder. Do not extract them.
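
Before a long run, it can help to confirm that the folder really contains unextracted archives. The following is a minimal sketch in Python, not part of chipmatch itself. It assumes the archives end in `.zip` and contain members ending in `.strand`, as the file names in the example output suggest; the folder name `strands` is a placeholder.

```python
import zipfile
from pathlib import Path


def list_strand_files(directory):
    """Map each ZIP archive in `directory` to the .strand members it contains."""
    found = {}
    for archive in sorted(Path(directory).glob("*.zip")):
        # Only the archive's table of contents is read; nothing is extracted.
        with zipfile.ZipFile(archive) as zf:
            found[archive.name] = [n for n in zf.namelist() if n.endswith(".strand")]
    return found


if __name__ == "__main__":
    # "strands" is a hypothetical folder name; use your own download folder.
    for archive, members in list_strand_files("strands").items():
        print(archive, members or "no .strand file found")
```

An archive that reports no `.strand` member is probably not one of the strand archives and can be removed from the folder.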

Running

$ chipmatch --help
chipmatch 0.4.1
Jan Christian Kaessens <j.kaessens@ikmb.uni-kiel.de>
Tries to identify the chip type from a given BIM file and checks for orientation

USAGE:
    chipmatch [FLAGS] [OPTIONS] <FILE> <DIR>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information
    -v, --verbose    Be verbose and print progress

OPTIONS:
    -e, --extract <N>      Extract the N most promising strand files to the local working directory
    -o, --output <FILE>    Write result table to FILE instead of stdout
    -t, --threads <N>      Use N threads for computation

ARGS:
    <FILE>    PLINK .bim file to guess the chip type for
    <DIR>     Directory containing Will Rayner's strand archives

The program’s output is a table with one row per annotation file and the following columns:

  • The annotation file name (“strand file”) as it appears in the ZIP archive
  • Name match, the ratio of SNPs where a corresponding SNP ID could be found
  • Name/Pos match, the ratio of SNPs with a matching name and position to SNPs with a matching name
  • Original strand rate, the ratio of SNPs with matching name and position that also have the same alleles (AT/CG SNPs excluded)
  • Plus strand rate, the ratio of SNPs with matching name and position that also have the same alleles after flipping the reference to the “+” strand (AT/CG SNPs excluded)
  • AT/CG rate, the ratio of AT/CG SNPs in all SNPs with matching name and position
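
To make the rate definitions concrete, here is a small Python sketch that computes them on toy data. It is an illustration of the definitions above, not chipmatch's actual implementation. The data layout, the function names, the choice of denominators for the two strand rates, and the reading of "flipping to +" as complementing the alleles of "-" entries are all assumptions.

```python
# Toy data; the field layout is an assumption made for illustration only.
# BIM side: SNP ID -> (chromosome, position, allele set)
bim = {
    "rs1": ("1", 1000, frozenset("AG")),
    "rs2": ("1", 2000, frozenset("CT")),
    "rs3": ("2", 3000, frozenset("AT")),  # strand-ambiguous A/T SNP
    "rs4": ("2", 4000, frozenset("AC")),
    "rs5": ("3", 5000, frozenset("GT")),  # not present in the strand file
}
# Strand-file side: SNP ID -> (chromosome, position, strand, allele set)
strand = {
    "rs1": ("1", 1000, "+", frozenset("AG")),
    "rs2": ("1", 2000, "-", frozenset("CT")),
    "rs3": ("2", 3000, "+", frozenset("AT")),
    "rs4": ("2", 4999, "+", frozenset("AC")),  # position differs from the BIM
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")
AMBIGUOUS = {frozenset("AT"), frozenset("CG")}


def flip(alleles):
    """Complement every allele, e.g. {C, T} becomes {G, A}."""
    return frozenset("".join(alleles).translate(COMPLEMENT))


def ratio(part, whole):
    return len(part) / len(whole) if whole else 0.0


def rates(bim, strand):
    named = [s for s in bim if s in strand]                         # name match
    placed = [s for s in named if bim[s][:2] == strand[s][:2]]      # name + position
    usable = [s for s in placed if bim[s][2] not in AMBIGUOUS]      # non-AT/CG only
    original = [s for s in usable if bim[s][2] == strand[s][3]]
    plus = [
        s for s in usable
        if bim[s][2] == (flip(strand[s][3]) if strand[s][2] == "-" else strand[s][3])
    ]
    return {
        "name_match": ratio(named, bim),
        "name_pos_match": ratio(placed, named),
        "original_strand": ratio(original, usable),
        "plus_strand": ratio(plus, usable),
        "at_cg": ratio([s for s in placed if bim[s][2] in AMBIGUOUS], placed),
    }


result = rates(bim, strand)
for key, value in result.items():
    print(key, round(value, 3))
```

On this toy input, four of five SNPs match by name (0.8) and three of those four also match by position (0.75). The original strand rate is 1.0 and the plus strand rate is 0.5, which under these assumptions would suggest the dataset is still in the chip's original orientation.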

The table is sorted by match rate in descending order.
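
Because the table is sorted, the best candidate is the first row, but several strand files can share the top rates, as the first three rows of the example output below do. A minimal Python sketch for collecting such ties follows. The helper `top_candidates` and its `tolerance` parameter are hypothetical, and the rows are copied from the example output.

```python
# Rows copied from the example output: (strand file, name match, name/pos match).
rows = [
    ("GSA-24v1-0_A6-b37.strand", 0.9057662, 0.9865884),
    ("GSA-24v1-0_A2-b37.strand", 0.9057662, 0.9865884),
    ("GSA-24v1-0_A1-b37.strand", 0.9057662, 0.9865884),
    ("GSA-24v2-0_A1-b37.strand", 0.876778, 0.92227954),
    ("GSA-24v1-0_A2-b38.strand", 0.008157662, 0.008886434),
]


def top_candidates(rows, tolerance=0.0):
    """Return every row whose rates are within `tolerance` of the first (best) row.

    Assumes `rows` is already sorted best-first, as chipmatch prints it.
    """
    if not rows:
        return []
    best = rows[0][1:]
    return [
        row for row in rows
        if all(abs(a - b) <= tolerance for a, b in zip(row[1:], best))
    ]


for name, *_ in top_candidates(rows):
    print(name)
```

For these rows the sketch prints the three `GSA-24v1-0` b37 files, which are indistinguishable by the two rates shown.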

Without the `--verbose` switch, only the tab-separated table is printed, with a header line, which makes it easy to parse in scripts and pipelines.
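
As an illustration of pipeline embedding, the following Python sketch builds a chipmatch command line from the options listed in the help text and parses the tab-separated result. The helper names (`build_command`, `parse_table`, `run_chipmatch`) are hypothetical. The parser assumes that the first column is the strand file name and that all remaining columns are numeric, and it skips any row that does not fit, such as a header line.

```python
import csv
import io
import subprocess


def build_command(bim, strand_dir, threads=None, extract=None):
    """Assemble a chipmatch argument list from the options shown in --help."""
    cmd = ["chipmatch"]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if extract is not None:
        cmd += ["--extract", str(extract)]
    return cmd + [bim, strand_dir]


def parse_table(text):
    """Parse tab-separated output into (strand_file, [rates]) rows."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(fields) < 2:
            continue  # blank or malformed line
        try:
            rates = [float(field) for field in fields[1:]]
        except ValueError:
            continue  # non-numeric row, e.g. a header line
        rows.append((fields[0], rates))
    return rows


def run_chipmatch(bim, strand_dir, **options):
    """Run chipmatch (must be on PATH) and return the parsed table."""
    cmd = build_command(bim, strand_dir, **options)
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_table(done.stdout)


# Two rows copied from the example output in this README.
sample = (
    "GSA-24v1-0_A6-b37.strand\t0.9057662\t0.9865884\n"
    "GSA-24v2-0_A1-b37.strand\t0.876778\t0.92227954\n"
)
rows = parse_table(sample)
print(rows[0])
```

With chipmatch installed, a call such as `run_chipmatch("data.bim", "strands", threads=4)` would return the same kind of list for real data; both paths are placeholders.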

Example output:

$ chipmatch ../GSA-data/DKTNF.bim ../GSA
GSA-24v1-0_A6-b37.strand	0.9057662	0.9865884
GSA-24v1-0_A2-b37.strand	0.9057662	0.9865884
GSA-24v1-0_A1-b37.strand	0.9057662	0.9865884
GSA-24v2-0_A1-b37.strand	0.876778	0.92227954
GSA-24v1-0_A2-b38.strand	0.008157662	0.008886434
GSA-24v1-0_A1-b38.strand	0.008157662	0.008886434
GSA-24v1-0_A6-b38.strand	0.008157662	0.008886434
GSA-24v2-0_A1-b38.strand	0.0079119755	0.008328985
GSA-24v1-0_A2-b36.strand	0.00016569582	0.00018057342
GSA-24v1-0_A1-b36.strand	0.00016569582	0.00018057342
GSA-24v1-0_A6-b36.strand	0.00016569582	0.00018057342
GSA-24v2-0_A1-b36.strand	0.00015426852	0.00016235182

Verbose example output:

$ chipmatch -v ../GSA-data/DKTNF.bim ../GSA
Reading BIM file...
700078 variants loaded.
Scanning ../GSA/GSA-24v1-0_A2-b36-strand.zip
Scanning ../GSA/GSA-24v2-0_A1-b37-strand.zip
Scanning ../GSA/GSA-24v1-0_A6-b37-strand.zip
Scanning ../GSA/GSA-24v1-0_A1-b37-strand.zip
Scanning ../GSA/GSA-24v1-0_A2-b38-strand.zip
Scanning ../GSA/GSA-24v2-0_A1-b38-strand.zip
Scanning ../GSA/GSA-24v1-0_A6-b36-strand.zip
Scanning ../GSA/GSA-24v1-0_A1-b36-strand.zip
Scanning ../GSA/GSA-24v1-0_A6-b38-strand.zip
Scanning ../GSA/GSA-24v1-0_A1-b38-strand.zip
Scanning ../GSA/GSA-24v2-0_A1-b36-strand.zip
Scanning ../GSA/GSA-24v1-0_A2-b37-strand.zip
GSA-24v1-0_A6-b37.strand	0.9057662	0.9865884
GSA-24v1-0_A2-b37.strand	0.9057662	0.9865884
GSA-24v1-0_A1-b37.strand	0.9057662	0.9865884
GSA-24v2-0_A1-b37.strand	0.876778	0.92227954
GSA-24v1-0_A2-b38.strand	0.008157662	0.008886434
GSA-24v1-0_A1-b38.strand	0.008157662	0.008886434
GSA-24v1-0_A6-b38.strand	0.008157662	0.008886434
GSA-24v2-0_A1-b38.strand	0.0079119755	0.008328985
GSA-24v1-0_A2-b36.strand	0.00016569582	0.00018057342
GSA-24v1-0_A1-b36.strand	0.00016569582	0.00018057342
GSA-24v1-0_A6-b36.strand	0.00016569582	0.00018057342
GSA-24v2-0_A1-b36.strand	0.00015426852	0.00016235182