measureAggregateRsquared measures the aggregate r-squared of imputed .gen files against validation (truth) .gen files.
measureAggregateRsquared --validation truth.gen.gz --imputed imputed.gen.gz \
--sample truth_and_imputed.samples --freq allele_frequencies_of_imputed_sites.freq \
--bin allele_frequency_bins.txt --output output_base
Make sure the truth and imputed gen files contain the same samples in the same order, which is defined in the .samples file.
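A quick sanity check before running the tool is to confirm that every .gen line has the width implied by the sample count. The Python sketch below assumes the standard Impute2 layout (five leading columns, then three genotype probabilities per sample) and a .samples file with two header lines. The function names are hypothetical and are not part of measureAggregateRsquared. Note that this only catches a mismatch in sample count; it cannot detect samples listed in a different order.

```python
import gzip


def count_samples(samples_text):
    """Count individuals in .samples content (the two header lines are skipped)."""
    lines = [ln for ln in samples_text.splitlines() if ln.strip()]
    return len(lines) - 2


def gen_line_matches(gen_line, n_samples):
    """True if a .gen line has 5 leading columns plus 3 probabilities per sample."""
    return len(gen_line.split()) == 5 + 3 * n_samples


def check_gen_file(gen_path, samples_path):
    """Return the 1-based numbers of .gen lines whose width is unexpected."""
    with open(samples_path) as fh:
        n_samples = count_samples(fh.read())
    bad = []
    with gzip.open(gen_path, "rt") as fh:
        for lineno, line in enumerate(fh, start=1):
            if line.strip() and not gen_line_matches(line, n_samples):
                bad.append(lineno)
    return bad
```

Running `check_gen_file` once on the truth file and once on the imputed file, with the same .samples file, should return an empty list for both.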
This code was written by Olivier Delaneau and Warren Kretzschmar. The maintainer of this code is Warren Kretzschmar.
To report a problem, please raise an issue on the GitHub page.
The --validation and --imputed input files are Impute2 .gen files.
Below is a set of examples for the other input files.
The --sample file assigns each sample to a population. The comparison is done per population, and multiple populations are allowed.
ID_1 ID_2 missing pop
0 0 0 D
NA07346 NA07346 0 EUR
NA11832 NA11832 0 EUR
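To see how the sample file groups individuals, the following Python sketch maps each population label to the positions of its samples. It assumes whitespace-separated columns, a header containing a `pop` column, and a second line holding column types, as in the example above. `samples_by_population` is a hypothetical helper, not something the tool provides.

```python
from collections import defaultdict


def samples_by_population(samples_text):
    """Map each population label to the 0-based indices of its samples."""
    lines = [ln.split() for ln in samples_text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[2:]  # lines[1] holds the column types
    pop_col = header.index("pop")
    groups = defaultdict(list)
    for index, fields in enumerate(rows):
        groups[fields[pop_col]].append(index)
    return dict(groups)


example = """ID_1 ID_2 missing pop
0 0 0 D
NA07346 NA07346 0 EUR
NA11832 NA11832 0 EUR
"""
print(samples_by_population(example))  # {'EUR': [0, 1]}
```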
The --freq file names the population on its first line. Each following line gives the allele frequency in that population for one site in the truth.gen file. Multiple columns are allowed, one per population.
EUR
0.2214
0.02241
0.3206
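A frequency file in this layout can be loaded with a few lines of Python. This is a minimal sketch that assumes columns are separated by whitespace (the separator is not stated above); `read_frequencies` is a hypothetical name.

```python
def read_frequencies(freq_text):
    """Return {population: [frequency per site]} from .freq file content."""
    lines = [ln.split() for ln in freq_text.splitlines() if ln.strip()]
    populations, rows = lines[0], lines[1:]
    freqs = {pop: [] for pop in populations}
    for row_number, fields in enumerate(rows, start=1):
        # Every data row needs exactly one value per population column.
        if len(fields) != len(populations):
            raise ValueError(
                f"data row {row_number}: expected {len(populations)} columns"
            )
        for pop, value in zip(populations, fields):
            freqs[pop].append(float(value))
    return freqs


print(read_frequencies("EUR\n0.2214\n0.02241\n0.3206\n"))
```

The number of frequencies per population should equal the number of sites in the truth.gen file, which is worth checking before a run.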
In the --bin file, each line defines one bin boundary.
Whether the first boundary is included can probably be changed with the "--discard-monomorphic" flag (this has not been verified).
0.000
0.005
0.010
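The sketch below illustrates how a list of boundaries partitions allele frequencies into bins. It assumes half-open bins that include the lower boundary and exclude the upper one; the convention the tool actually uses is not documented here and may differ. `assign_bin` is a hypothetical helper.

```python
from bisect import bisect_right


def assign_bin(freq, boundaries):
    """Return the index i of the bin [boundaries[i], boundaries[i + 1]) that
    holds freq, or None if freq lies outside the boundaries."""
    i = bisect_right(boundaries, freq) - 1
    if i < 0 or i >= len(boundaries) - 1:
        return None
    return i


boundaries = [0.000, 0.005, 0.010]
for f in (0.0, 0.004, 0.005, 0.02):
    print(f, assign_bin(f, boundaries))
```

With these three boundaries there are two bins, so 0.004 lands in bin 0, 0.005 in bin 1, and 0.02 falls outside both.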
Each output file will be written to:
<outfix>.<POP>.snps
<outfix>.<POP>.complexs
<outfix>.<POP>.all
The suffix includes the population label of the individuals as well as the type of variant being assessed.
Each output file is tab-separated and has the header:
Bin_frequency r_square num_genotypes freq_validation freq_imputation
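For downstream analysis, an output file can be read with Python's standard `csv` module. This is a minimal sketch that relies only on the header and tab separation described above. Values are kept as strings because the exact number formatting is not documented here, and `read_results` is a hypothetical name.

```python
import csv

EXPECTED_HEADER = ["Bin_frequency", "r_square", "num_genotypes",
                   "freq_validation", "freq_imputation"]


def read_results(handle):
    """Parse one tab-separated output file into a list of {column: string} dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Fail early if the file does not carry the documented header.
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)
```

Open one of the `<outfix>.<POP>.snps` files with `open(path, newline="")`, pass the handle to `read_results`, and convert the columns you need with `float`.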
The test suite requires an installed version of CPAN. To install the Perl dependencies for the test runners:
make test-setup
To run the tests:
make test