The code for CONFIT has been split into 4 parts:
- Simulate or read in GWAS summary statistics, and compute the CONFIT test statistic for each SNP.
- Get the null distribution of the CONFIT test statistic. This should be run in parallel, since many null test statistics are needed to get the distribution.
- Get the top null CONFIT test statistics, which are used to help compute p-values.
- Compute CONFIT p-values for each SNP.
We describe basic use of CONFIT and show sample commands that can be run on the provided example data sets.
- requires Python 2.7
- requires numpy and scipy
In this step, we read in or simulate GWAS summary statistics, and compute the CONFIT test statistic for each SNP from the GWAS data.
General options
exptName
- name for output files (without extension)
outdir
- output directory
Required options for GWAS data from file
nSnp
- number of SNPs in analysis
gwasFormat
- specify format of GWAS summary statistics
traits and traitPath
- names of each summary statistic file and their directory
Example for pylmm-formatted data
python src/confit1_data.py --exptName NFBCexample --outdir exampleOutput --nSnp 100 --gwasFormat pylmm --traits NFBC_hdlres.pylmm.out NFBC_ldlres.pylmm.out NFBC_tgres.pylmm.out --traitPath your/path/to/data/CONFIT_examples/gwasExampleData
Example for UKBB data in Neale group format. (Note that this sample data has already been sorted so that SNPs are in the same order for each trait.)
python src/confit1_data.py --exptName UKBBexample --outdir exampleOutput --nSnp 100 --gwasFormat UKBB_sorted --traits UKBB_6177_1.assoc.tsv.sorted UKBB_6177_2.assoc.tsv.sorted UKBB_6177_3.assoc.tsv.sorted --traitPath your/path/to/data/CONFIT_examples/gwasExampleData
Required options for simulated data
useSimulatedData
- set to 1 to use simulated data
nSnp
- how many SNPs to simulate
sigmasq_mu_sim
- variance of simulated z-scores
Example for simulated data (100 SNPs, 3 traits)
python src/confit1_data.py --exptName simulatedExample --outdir exampleOutput --useSimulatedData 1 --nSnp 100 --nTraits 3 --sigmasq_mu_sim 25
Example for simulated data (1000 SNPs, 3 traits, with some additional simulation options shown)
python src/confit1_data.py --exptName simulatedExampleWithOptions --outdir exampleOutput --useSimulatedData 1 --nSnp 1000 --nTraits 3 --sigmasq_mu_sim 25 --t1trueSigmasq 25 --Sigma_e_file_sim your/path/to/data/CONFIT_examples/simulationExampleData/threeTraitExample_Sigma_z.txt --truePriorFile your/path/to/data/CONFIT_examples/simulationExampleData/threeTraitExample_truePriorFile.txt
This step gets draws from the null distribution of the CONFIT test statistic. The null test statistics are then used to obtain p-values in Steps 3 and 4.
Each time confit2_nullsim4pval.py is run, it writes its null test statistics to a separate file, so this step is easily parallelized in order to generate many null test statistics.
Required options
exptName
- name for output files, as in Step 1
outdir
- as in Step 1
nulldir
- where to output the null test statistics (which are just intermediate output)
nTraits OR traits
- either specify how many traits or list them out as in Step 1
taskID
- used to give each null test statistic file a unique name. Specify a different number for each run of confit2_nullsim4pval.py (e.g. --taskID 2 for the second run).
Other options
- nNullPerRound (default $2*10^6$)
- nNullRoundsPerJob (default $25$) - Each time the command is run, it will generate nNullPerRound*nNullRoundsPerJob null test statistics. If running on a system with limited memory, you can decrease nNullPerRound so fewer test statistics are generated at a time.
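As a quick check of the arithmetic above, the sketch below computes how many null test statistics one run produces and how many runs would be needed to reach a target total. The helper names `nulls_per_job` and `jobs_needed` are hypothetical and are not part of CONFIT; the numbers plugged in are the defaults listed in this README.

```python
def nulls_per_job(n_null_per_round, n_null_rounds_per_job):
    """Null test statistics produced by a single run of Step 2."""
    return n_null_per_round * n_null_rounds_per_job


def jobs_needed(n_null_total, n_null_per_round, n_null_rounds_per_job):
    """Smallest number of Step 2 runs giving at least n_null_total statistics."""
    per_job = nulls_per_job(n_null_per_round, n_null_rounds_per_job)
    return -(-n_null_total // per_job)  # ceiling division on integers


# README defaults: nNullPerRound = 2*10^6, nNullRoundsPerJob = 25.
per_job_default = nulls_per_job(2 * 10**6, 25)

# 5*10^9 is the default nNullTotal listed for Step 3.
n_jobs_default = jobs_needed(5 * 10**9, 2 * 10**6, 25)

print("per job: %d, jobs needed: %d" % (per_job_default, n_jobs_default))
```

With the defaults, each run yields 5*10^7 null test statistics, so 100 runs reach 5*10^9.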
Example
- Note: we'll use the UKBB example data for the rest of the steps
- The commands below run the null simulation step twice, sequentially. This will create two files, each with 4000*25 null test statistics.
- In practice, you probably want to run many of these in parallel, in order to obtain p-values with more precision.
python src/confit2_nullsim4pval.py --exptName UKBBexample --outdir exampleOutput --nulldir exampleNullsim --nTraits 3 --nNullPerRound 4000 --nNullRoundsPerJob 25 --taskID 1
python src/confit2_nullsim4pval.py --exptName UKBBexample --outdir exampleOutput --nulldir exampleNullsim --nTraits 3 --nNullPerRound 4000 --nNullRoundsPerJob 25 --taskID 2
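To run many null simulations, each run needs its own --taskID. A minimal sketch of generating one command per task is shown below. `build_nullsim_command` is a hypothetical helper, not part of CONFIT, and it only uses the flags that appear in the example commands above. How the commands are dispatched (a cluster scheduler, a job array, and so on) depends on your system and is left out.

```python
def build_nullsim_command(task_id, expt_name, outdir, nulldir, n_traits,
                          n_null_per_round, n_null_rounds_per_job):
    """Return the Step 2 command for one task as an argument list."""
    return [
        "python", "src/confit2_nullsim4pval.py",
        "--exptName", expt_name,
        "--outdir", outdir,
        "--nulldir", nulldir,
        "--nTraits", str(n_traits),
        "--nNullPerRound", str(n_null_per_round),
        "--nNullRoundsPerJob", str(n_null_rounds_per_job),
        "--taskID", str(task_id),
    ]


# One command per task ID; IDs 1..4 are chosen only for illustration.
commands = [
    build_nullsim_command(task_id, "UKBBexample", "exampleOutput",
                          "exampleNullsim", 3, 4000, 25)
    for task_id in range(1, 5)
]

for cmd in commands:
    print(" ".join(cmd))
```

Each argument list can be handed to a job scheduler or passed to `subprocess.call` to run it locally.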
This is an intermediate step before the p-values are computed in Step 4. It helps compute high-resolution p-values for the most significant test statistics. The default settings assume
Options
- exptName, outdir, nulldir - as in Step 1 and 2
- nNullFiles - how many files were generated in Step 2
- nNullTotal (default $5 * 10^9$) - how many null test statistics were generated in Step 2 across all files
- nTopNull (default $1000$) - how many of top null test statistics to get
Example
- with parameters set to match what we did in Step 2.
python src/confit3_getTopStatistics.py --exptName UKBBexample --nulldir exampleNullsim --outdir exampleOutput --nNullFiles 2 --nNullTotal 20000
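The idea of keeping only the most extreme null statistics across many files can be sketched as a bounded merge. This illustrates the general technique and is not CONFIT's implementation. It assumes, hypothetically, that each file's statistics are available as a sequence of floats and that larger values are more extreme; `top_null_statistics` is a made-up name.

```python
import heapq


def top_null_statistics(chunks, n_top):
    """Return the n_top largest values seen across all chunks, largest first.

    Only n_top values are carried over between chunks, so memory use is
    bounded by n_top plus the size of one chunk.
    """
    top = []
    for chunk in chunks:
        top = heapq.nlargest(n_top, list(top) + list(chunk))
    return top


# Toy stand-ins for the contents of two null statistic files.
file_a = [0.3, 5.1, 2.2, 9.7]
file_b = [4.4, 0.1, 8.0, 1.5]

top3 = top_null_statistics([file_a, file_b], 3)
print(top3)
```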
Compute CONFIT p-values. The default settings assume
Required options
exptName, outdir, nulldir
- as in Step 1 and 2
nTraits OR traits
- as in Step 1 and 2
useSimulatedData
- 1 if data was simulated by CONFIT, 0 if read from file (used because the output format will have some additional columns if the data was simulated)
By default, CONFIT computes less-significant p-values with lower precision. These additional options may be used to adjust the p-value resolution.
- nNullFullRes (default $5 * 10^9$) - how many null test statistics to use for highest resolution. You can set it to nNullTotal as in Step 3.
- nNullLowRes (default $10^8$) - upper bound on how many null test statistics to use for lower resolution p-values. (For the toy example below, we just set it to a small value.)
Example
python src/confit4_consolidatePvals.py --exptName UKBBexample --outdir exampleOutput --nulldir exampleNullsim --useSimulatedData 0 --nTraits 3 --nNullLowRes 5000 --nNullFullRes 20000
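In general, a p-value obtained from simulated null statistics is the fraction of null draws at least as extreme as the observed statistic. The sketch below shows that standard Monte Carlo estimate and why the number of null draws limits p-value resolution. It is not CONFIT's code: the function name, the +1 correction, and the assumption that larger statistics are more extreme are all choices made for illustration.

```python
import numpy as np


def empirical_pvalues(observed, null_stats):
    """Monte Carlo p-values: (1 + #null >= observed) / (1 + #null)."""
    null_sorted = np.sort(np.asarray(null_stats, dtype=float))
    observed = np.asarray(observed, dtype=float)
    n_null = null_sorted.size
    # searchsorted(..., side="left") counts null values strictly below each
    # observed value, so the remainder is the count of null values >= it.
    n_ge = n_null - np.searchsorted(null_sorted, observed, side="left")
    return (n_ge + 1.0) / (n_null + 1.0)


# Toy null distribution with 99 values (1, 2, ..., 99).
null_stats = np.arange(1, 100)

# Three observed statistics: below, inside, and above the null range.
pvals = empirical_pvalues([0.0, 50.0, 1000.0], null_stats)
print(pvals)
```

With N null draws, the smallest value this estimate can return is 1/(N+1), so the most significant SNPs need the most null test statistics to be resolved.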