v1.0
Last update: 2020-10-27
ORIGAMI is a computational framework for simulating offspring genotypes from parental genetic data. It uses state-of-the-art, sex-specific genetic maps to simulate recombination events in phased parental genomes. ORIGAMI takes phased parental VCF (or BCF) files as input and simulates offspring genotypes for the whole genome or for a pre-specified list of SNPs. The output is a set of PLINK files containing the simulated genotypes, which can be analyzed directly in various downstream applications.
This software was developed on Linux with R. The statistical computing software R (>=3.5.1) and the following R packages are required:
- data.table (>=1.11.8)
- dplyr (>=0.8.3)
- tidyverse (>=1.2.1)
- simcross
Some external tools are also needed: bcftools and plink (their paths are passed to ORIGAMI with -b and -k).
We extract parent genotypes from phased vcf/bcf files. Please partition your vcf/bcf file by chromosome.
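One way to do the partitioning is with bcftools itself. The sketch below is only an illustration: the helper name split_by_chr and the file names are hypothetical, chromosomes are assumed to be named 1 to 22 (not chr1 to chr22), and `bcftools view -r` needs an indexed input (see `bcftools index`). By default the helper only prints the commands so they can be inspected before anything is run.

```shell
# Hypothetical helper (not part of ORIGAMI): one bcftools command per autosome.
# Usage: split_by_chr <input.bcf> <output prefix> [runner]
# The runner defaults to "echo" (dry run); pass "command" to execute for real.
split_by_chr() {
  for chr in $(seq 1 22); do
    # -r selects the region (here a whole chromosome), -O b writes compressed BCF
    "${3:-echo}" bcftools view -r "$chr" -O b -o "${2}${chr}.bcf" "$1"
  done
}

# Dry run: prints 22 commands, one per chromosome
split_by_chr cohort.phased.bcf cohort.phased.chr
```

Files named this way would match a -f pattern of cohort.phased.chr#.bcf.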
ORIGAMI uses the SNP list file you provide to extract SNPs from the vcf/bcf files. The file has two tab-separated columns, CHR and BP. Please make sure the rsid and BP values are on the same genome build as your vcf/bcf files.
Example:
1 766007
1 777232
1 901559
1 914852
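If the SNPs of interest already sit in a table with a SNP, CHR, BP header, the two-column list can be derived from it. This is a minimal sketch under that assumption; the file names and the toy SNP names are made up for illustration.

```shell
# Toy input with a SNP/CHR/BP header (SNP names here are placeholders)
printf 'SNP\tCHR\tBP\nrsA\t1\t766007\nrsB\t1\t777232\n' > toy_reference.txt

# Look up the CHR and BP columns by header name, then print them tab-separated
awk 'NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
     { print $col["CHR"] "\t" $col["BP"] }' toy_reference.txt > snp_list.txt

cat snp_list.txt
```

Looking the columns up by name rather than by position keeps the command working when the input has extra columns or a different column order.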
The genetic map used in our paper (coordinates are in human genome build 37) is available at https://github.com/cbherer/Bherer_etal_SexualDimorphismRecombination
This file must have a header and contain SNP, CHR, and BP information; it is used to map BP to rsid. The column names must be "SNP", "CHR", and "BP". A GWAS summary statistics file may be a good option.
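A quick header check can catch a misnamed column before a long run. The helper below is hypothetical (it is not part of ORIGAMI) and assumes that extra columns, as found in a GWAS summary statistics file, are tolerated.

```shell
# Hypothetical check: succeed only if the first line contains SNP, CHR, and BP
has_required_header() {
  head -n 1 "$1" | awk '{ for (i = 1; i <= NF; i++) seen[$i] = 1 }
    END { exit !(("SNP" in seen) && ("CHR" in seen) && ("BP" in seen)) }'
}

# Toy summary-statistics file with one extra column (values are made up)
printf 'SNP\tCHR\tBP\tP\nrsA\t1\t766007\t0.5\n' > toy_sumstats.txt

if has_required_header toy_sumstats.txt; then
  echo "header OK"
else
  echo "header is missing SNP, CHR, or BP"
fi
```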
Please install all the software listed in the dependency section. In addition, download ORIGAMI.sh and install the ORIGAMI R package.
library(devtools)
install_github("qlu-lab/ORIGAMI/ORIGAMI")
Also make sure you make the ORIGAMI.sh executable.
chmod u+x ORIGAMI.sh
After preparing the files and software, you can start simulating pseudo siblings. The workflow has three steps. First, extract the SNPs you need from the vcf/bcf files. Second, extract the parental genotypes and simulate pseudo siblings. Finally, combine the pseudo sibling genotypes and convert them into ped files and bfiles.
- -s: step. Must be included. There are five options you may specify: "gamete", "extract", "combine", "help", "transmission".
- -l: The file name of your SNP list file.
- -p: father ID.
- -m: mother ID.
- -f: The file name of bcf/vcf files. CHR specific. Use # to replace the number of CHR in the file name.
- -n: number of pseudo siblings to simulate.
- -c: cohort name. ORIGAMI uses cohort name to create new directory.
- -b: The path of bcftools. This should be bcftools_path/bcftools. It should not include "/" at the end.
- -e: The file path of genetic map files of male. CHR specific. Use # to replace the number of CHR in the file name.
- -t: The file path of genetic map files of female. CHR specific. Use # to replace the number of CHR in the file name.
- -h: R library path. Optional.
- -i: In step "gamete", i is family ID for one family. In step "combine", i is family ID list.
- -r: File name of reference map file.
- -k: plink software path.
- -o: output file name. The name you want for the final bfiles. Must be included.
- -d: Directory in which ORIGAMI creates the new directory containing all the files it generates.
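The -f, -e, and -t options each take a single name in which # stands for the chromosome number. The sketch below lists the per-chromosome names such a pattern implies, assuming chromosomes 1 to 22 and a plain textual substitution. The helper name and file names are hypothetical, and this is not necessarily how ORIGAMI performs the substitution internally.

```shell
# Hypothetical helper: print the file name implied by a '#' pattern for each autosome
expand_chr_pattern() {
  for chr in $(seq 1 22); do
    printf '%s\n' "$1" | sed "s/#/$chr/g"
  done
}

# List the expected files and flag any that are missing before starting a run
expand_chr_pattern 'cohort.phased.chr#.bcf' | while read -r f; do
  [ -e "$f" ] || echo "missing: $f"
done
```

Quoting the pattern keeps the shell from treating # as the start of a comment.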
./ORIGAMI.sh -c <Your cohort name> -s extract \
-f <Your bcf or vcf file name> \
-b <Your bcf tool path> \
-l <SNP list file name> \
-h <R library (optional)> \
-e <male genetic map file> \
-t <female genetic map file> \
-r <Your reference file>
./ORIGAMI.sh -c ASD -s gamete \
-d <Directory contains your 'cohort' folder (optional)> \
-b <Your bcf tool path> \
-l <SNP list file name> \
-p <father ID> -m <mother ID> \
-h <R library (optional)> \
-e <male genetic map file> \
-t <female genetic map file> \
-i <family ID> -n <number of pseudo siblings>
If you want to simulate pseudo siblings for many couples, please run this step in parallel to save time.
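One possible way to parallelize is to generate one command line per family and hand them to xargs. Everything below is a sketch: the three-column trio table (family ID, father ID, mother ID), the helper name, and the file names are assumptions, and the shared options passed as the second argument must be completed with the remaining flags from the command above. The helper only prints the commands so they can be checked first.

```shell
# Hypothetical trio table: family ID, father ID, mother ID (whitespace-separated)
printf 'FAM1 F1 M1\nFAM2 F2 M2\n' > toy_families.txt

# Build one "gamete" command line per family.
# $1: trio table, $2: options shared by all families (incomplete in this example)
make_gamete_cmds() {
  while read -r fam father mother; do
    echo "./ORIGAMI.sh -c ASD -s gamete $2 -i $fam -p $father -m $mother -n 2"
  done < "$1"
}

# Dry run: inspect the commands first
make_gamete_cmds toy_families.txt "-l snp_list.txt"

# To execute four families at a time, once the shared options are complete:
# make_gamete_cmds toy_families.txt "<shared options>" | xargs -P 4 -I {} sh -c '{}'
```

The -P option of xargs is available in the GNU and BSD versions; a job scheduler array would serve the same purpose on a cluster.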
./ORIGAMI.sh -c ASD -s combine \
-d <Directory contains your 'cohort' folder (optional)> \
-b <Your bcf tool path> \
-h <R library (optional)> \
-i <family list file> \
-r <Your reference file> \
-k <your plink path> \
-o <your bfile name>
A brief example is in example/test.sh. Please download the whole example folder and fill in the paths to bcftools, R, and plink.
Chen J., You J., Zhao Z., Ni Z., Huang K., Wu Y., Fletcher J., Lu Q. (2020). Gamete simulation improves polygenic transmission disequilibrium analysis. bioRxiv doi:10.1101/2020.10.26.355602