Supplementary data and information for the RAREsim R package and the RAREsim Python package.
Start-to-finish example simulation, with example bash code for implementation.
RAREsim requires HAPGEN2, the RAREsim R package, and the RAREsim Python package.
Chromosome 19 coding-region reference data that can be used with RAREsim.
*_input_sim_data.tar.gz: the ancestry-specific (AFR, EAS, NFE, SAS) input simulation haplotypes used in RAREsim. Each tarball contains haplotype and legend files for the cM blocks on chromosome 19, as described in the RAREsim manuscript. The blocks have all sequence bases added within the coding region, to allow HAPGEN2 to simulate an abundance of variants.
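To see which haplotype and legend files each tarball provides before unpacking anything, the archives can be listed with the Python standard library. This is a minimal sketch, not part of RAREsim: the function name `list_input_tarballs` is hypothetical, and it assumes the part of each filename before `_input_sim_data.tar.gz` is the ancestry code (AFR, EAS, NFE, or SAS).

```python
import tarfile
from pathlib import Path

# Filename suffix taken from the description above.
SUFFIX = "_input_sim_data.tar.gz"


def list_input_tarballs(directory):
    """Map each tarball's filename prefix to the sorted file names it contains.

    Assumption: the prefix before SUFFIX is the ancestry code (e.g. "AFR").
    Nothing is extracted; the archives are only read.
    """
    contents = {}
    for path in sorted(Path(directory).glob("*" + SUFFIX)):
        prefix = path.name[: -len(SUFFIX)]
        with tarfile.open(str(path), "r:gz") as tar:
            contents[prefix] = sorted(
                member.name for member in tar.getmembers() if member.isfile()
            )
    return contents
```

Calling `list_input_tarballs(".")` in the directory holding the downloaded tarballs returns one entry per archive, which makes it easy to confirm that all four ancestries are present.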
Code necessary to recreate the majority of the analyses in the RAREsim manuscript.
Allele frequency spectrum (AFS) and Nvariant target data, stratified by functional and synonymous status. Target data are available for each cM block on chromosome 19, for each of the four ancestries.
Previously simulated rare variant data are available at this link. For each of the four ancestral populations, 1,000 replicates of the block with the median number of bp (19,029 bp) were simulated at twice the sample size observed in gnomAD: African: N=16,256; East Asian: N=18,394; Non-Finnish European: N=113,770; South Asian: N=30,616.
The folder contains the code used to create the data.
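Because each simulated sample size is stated to be twice the size observed in gnomAD, the implied observed sizes follow by halving. The short sketch below does that arithmetic; the dictionary keys reuse the ancestry codes from the input data (AFR, EAS, NFE, SAS), and the function name `implied_observed_n` is hypothetical.

```python
# Simulated sample sizes as listed above, keyed by ancestry code.
SIMULATED_N = {"AFR": 16256, "EAS": 18394, "NFE": 113770, "SAS": 30616}


def implied_observed_n(simulated_n):
    """Halve each simulated size to get the implied observed gnomAD size."""
    return {pop: n // 2 for pop, n in simulated_n.items()}


for pop, n in implied_observed_n(SIMULATED_N).items():
    print(f"{pop}: simulated N={SIMULATED_N[pop]:,}, implied gnomAD N={n:,}")
```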