hanchenphd/GMMAT

Error in glmm.score

Opened this issue · 13 comments

hkj7 commented

Hello Dr Chen,

I am running the glmm.score command below. The command takes a gzipped bgen file and my linear mixed model (BSmodel). Both the bgen file and the linear mixed model contain IIDs, but the IIDs aren't necessarily in the same order in the genetic file and the model. Because my bgen file is so big (8 GB gzipped), I have gzipped it and want to test whether the first 100 rows are read in:

> glmm.score(BSmodel, infile = "~/Desktop/PROs_GWAS.bgen.gz", BGEN.samplefile = "~/Desktop/PROs_GWAS.sample", infile.nrow = 100, outfile = "glmm.score.bgen.testoutfile.txt")

The error message:

Error: cannot open gzipped file ~/Desktop/PROs_GWAS.bgen.gz
Warning message:
In glmm.score(BSmodel, infile = "~/Desktop/PROs_GWAS.bgen.gz", BGEN.samplefile = "~/Desktop/PROs_GWAS.sample",  :
  Argument select is unspecified... Assuming the order of individuals in infile matches unique id_include in obj...

I'm not sure what this error means. Does this mean I have to supply the select matrix? I assumed the IIDs would be automatically matched between the genetic file and regression model.

Please let me know if you would require any more information/corresponding data or commands. Thanks very much!

Thank you for your interest in GMMAT! Currently, the function does not take gzipped bgen files as the input, and you would need to gunzip it to a .bgen file.

Best,
Han
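
For reference, the decompression step can be sketched in the shell as below. This is only an illustration: `decompress_bgen` is a hypothetical helper name and the demo file is a throwaway placeholder, not a real BGEN file. `gzip -dc` writes the decompressed stream to a new file and leaves the `.gz` archive in place.

```shell
# Hypothetical helper: decompress FILE.bgen.gz to FILE.bgen, keeping the archive.
decompress_bgen() {
    gz_path="$1"
    out_path="${gz_path%.gz}"          # strip the trailing .gz
    gzip -dc "$gz_path" > "$out_path"  # -d decompress, -c write to stdout
    printf '%s\n' "$out_path"          # report the path of the decompressed file
}

# Demo on a throwaway file (placeholder contents, not real genotype data).
demo_dir="$(mktemp -d)"
printf 'placeholder bytes\n' > "$demo_dir/demo.bgen"
gzip "$demo_dir/demo.bgen"             # leaves only demo.bgen.gz
bgen_path="$(decompress_bgen "$demo_dir/demo.bgen.gz")"
```

The resulting `.bgen` path is what would then be passed as `infile`.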

hkj7 commented

Hi Han,

Thank you for your response. I have unzipped the file but I still get an error reading it in. I'm not sure whether it's the geno.file and samplefile commands that are causing it. I have put the names of the files in the commands below:

> geno.file <- system.file("extdata", "PROs_GWAS.bgen", package = "GMMAT")
> samplefile <- system.file("extdata", "PROs_GWAS.sample", package = "GMMAT")

> glmm.score(BSmodel, infile = "~/Desktop/PROs_GWAS.bgen", BGEN.samplefile = "~/Desktop/PROs_GWAS.sample", infile.nrow = 100, outfile = "glmm.score.bgen.testoutfile.txt")

Error reading BGEN file: ~/Desktop/PROs_GWAS.bgen

Were you able to run any analyses using this BGEN file with a different software program (e.g. PLINK2)?

hkj7 commented

Hi Chen,

Thanks for your response. I converted the pgen file to a bgen file using the command below, which also filters the SNPs by MAF and imputation score.

#!/bin/bash
#PBS -N Imputation 
#PBS -l walltime=06:00:00
#PBS -l nodes=1:ppn=8
#PBS -l vmem=16gb
#PBS -m bea
#PBS -M email


i=$PBS_ARRAYID
cd /data/genome/PROs_GWAS

./plink2 --threads 8 --pfile output_init_PROs --extract extract0.3.txt --maf 0.05 --export bgen-1.2 --out PROs_GWAS

I was able to use the pgen file to run other analyses but have not tried with bgen.

Please export to bgen-1.3 and let me know if it works or not.

Thanks,
Han

hkj7 commented

Hi Han,

Thank you for your response. I have exported to bgen 1.3 and still get the same error:

> geno.file <- system.file("extdata", "PROs_GWAS_1.3.bgen", package = "GMMAT")
> samplefile <- system.file("extdata", "PROs_GWAS_1.3.sample", package = "GMMAT")
> glmm.score(BSmodel, infile = "~/Desktop/PROs_GWAS_1.3.bgen", BGEN.samplefile = "~/Desktop/PROs_GWAS_1.3.sample", infile.nrow = 100, outfile = "glmm.score.bgen.testoutfile.txt")
Error reading BGEN file: ~/Desktop/PROs_GWAS_1.3.bgen

Can you send me a simulated reproducible example? I will take a look.

hkj7 commented

Dear Dr Chen,

Thank you for your response. I have figured out the problem: I made a stupid error and my file was not stored in the right directory. However, I am now getting a new error:

Warning in glmm.score(BSmodel, infile = "PROs_GWAS_1.3.bgen", BGEN.samplefile = "~/Desktop/PROs_GWAS_1.3.sample",  :
  Check your data... Some id_include in obj are missing in BGEN.samplefile!
Error in glmm.score(BSmodel, infile = "PROs_GWAS_1.3.bgen", BGEN.samplefile = "~/Desktop/PROs_GWAS_1.3.sample",  : 
  Error: id_include in obj does not match sample.id in BGEN.samplefile!

My linear mixed model includes all patients with genotyped data, so its patient IDs exactly match those in the bgen/sample file, although in a different order. However, my bgen file and its sample file contain two IDs: FID and IID.

My bgen sample file looks like this, with 1931 patient IDs. The first ID is the FID and the second is the IID. E.g. for one individual the FID is 1032 and the IID is 468768.

ID_1 ID_2 missing sex
0 0 0 D
1032 468768 0 2
1405 468769 0 2
1564 468770 0 2
1610 468771 0 2
998 468774 0 2
975 468775 0 2
1066 468776 0 2
1038 468778 0 2

The data frame for my linear mixed model is in long format and includes the patient IID and the list of covariates:

   IID age   bmi smoking chemo bed_breast_late    etc...
470502  62 29.00       1     0           75.69
470502  62 29.00       1     0           75.69
470502  62 29.00       1     0           75.69
470502  62 29.00       1     0           75.69
470514  47 21.72       1     0           75.69
470514  47 21.72       1     0           75.69
All the IIDs in the data frame above match IIDs in the bgen sample file. I thought the software would ignore the FIDs in my genetic file?

Any help you could provide would be much appreciated.
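
One way to see which IDs trigger this kind of warning is to compare the two ID lists directly. The sketch below assumes the model data frame has been exported to a whitespace-delimited text file with one header line and the IID in the first column, and that the sample file has two header lines with the IID in column 2. The export step, the helper name `list_missing_ids`, and the demo files (including the ID 999999) are hypothetical.

```shell
# Hypothetical helper: print each ID in PHENO (column 1) that is absent
# from column 2 of SAMPLE. Repeated IDs (long format) are reported once.
list_missing_ids() {
    sample_path="$1"
    pheno_path="$2"
    awk 'NR == FNR { if (FNR > 2) seen[$2] = 1; next }   # sample: skip 2 header lines
         FNR > 1 && !($1 in seen) && !reported[$1]++ { print $1 }   # pheno: skip header
        ' "$sample_path" "$pheno_path"
}

# Demo with made-up files in a temporary directory.
chk_dir="$(mktemp -d)"
printf '%s\n' 'ID_1 ID_2 missing sex' '0 0 0 D' \
    '1032 468768 0 2' '1405 468769 0 2' > "$chk_dir/demo.sample"
printf '%s\n' 'IID age' '468768 62' '468768 62' \
    '999999 47' '999999 47' > "$chk_dir/demo_pheno.txt"
missing_ids="$(list_missing_ids "$chk_dir/demo.sample" "$chk_dir/demo_pheno.txt")"
printf 'IDs in the model data but not in the sample file: %s\n' "$missing_ids"
```

An empty result would suggest the ID sets agree and the mismatch lies elsewhere, for example in which column is being read as the sample ID.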

I guess your BGEN 1.3 file should already include a single identifier (see the BGEN format), so you probably don't need BGEN.samplefile?

Best,
Han

hkj7 commented

Hi Han,

I've tried running without the sample file:

geno.file <- system.file("extdata", "PROs_GWAS_1.3.bgen", package = "GMMAT")
glmm.score(BSmodel, infile = "PROs_GWAS_1.3.bgen", outfile = "glmm.score.bgen.testoutfile.txt")

I still get the same error:

Warning in glmm.score(BSmodel, infile = "PROs_GWAS_1.3.bgen", outfile = "glmm.score.bgen.testoutfile.txt") :
  Check your data... Some id_include in obj are missing in sample.id of infile!
Error in glmm.score(BSmodel, infile = "PROs_GWAS_1.3.bgen", outfile = "glmm.score.bgen.testoutfile.txt") : 
  Error: id_include in obj does not match sample.id in infile!

Is this because my genetic file contains two IDs, but my regression model only contains one?

Hello,

If you are not using the family ID in this analysis, could you please create a fake sample file that shows both ID_1 and ID_2 as individual ID? If your null model included the individual ID, then they should be automatically matched to the genotype file. Let me know if it fixes the problem or not.

Thanks,
Han

hkj7 commented

Hi Han,

Thanks for your response. Just to confirm: should I make the changes in the genetic file (take out the FID and repeat the IID twice), then recreate the bgen file and its sample file, and re-run?

Thanks

I don't think you need to create the bgen file again. You can use BGEN.samplefile to overwrite the FID with IID (if that is what you used in your null model).
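
A minimal sketch of that sample-file rewrite, assuming the layout shown earlier in the thread (two header lines, then FID in column 1 and IID in column 2). The helper name `make_iid_sample` and the demo file names are hypothetical; the real input would be the existing `.sample` file, and the new file would be the one passed as `BGEN.samplefile`.

```shell
# Hypothetical helper: copy a sample file, overwriting column 1 (FID)
# with column 2 (IID) on every data row.
make_iid_sample() {
    awk 'NR <= 2 { print; next }   # keep both header lines as they are
         { $1 = $2; print }        # data rows: replace FID with IID
        ' "$1" > "$2"
}

# Demo on the first rows quoted in this thread, written to a temp directory.
fix_dir="$(mktemp -d)"
printf '%s\n' 'ID_1 ID_2 missing sex' '0 0 0 D' \
    '1032 468768 0 2' '1405 468769 0 2' > "$fix_dir/in.sample"
make_iid_sample "$fix_dir/in.sample" "$fix_dir/out.sample"
cat "$fix_dir/out.sample"
```

Row order is left untouched, since sample files list individuals in the same order as the genotype data.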