Long Format for SNP_ID

Question

Long Format for SNP_ID

jabbarlinks opened this issue 4 years ago · 12 comments

Respected Dr. Frei,

My bim file contains long format for SNP ID like 22:16541711_T_G , similarly in my GWAS file. But when I run fit1 then it gives segmentation error message. I check everything so many times but not successful.

However, I repeated all these code successfully for univariate & bivariate analysis & visualization etc by taking some GWAS from internet. The only difference I found is these files have short format of SNP ID is like rs1755411.

So, my question, does long variant name works in MiXer or not? Or I have to change my bim fam bed files in short variant name like rs15425421

Kind Regards

Answer 1 · 2021-06-29T06:22:43.000Z

Hi One option is as you said replace chr:bp_a1_a2 with rs#. Another solution is to split into 4 columns named CHR, BP, A1,A2 - mixer will also recognize this if I t can't match based on SNP column. If you use this make sure that BP are hg19. Kind regards Alex

…

On Tue, Jun 29, 2021, 04:23 jabbarlinks ***@***.***> wrote: Respected Dr. Frei, My bim file contains long format for SNP ID like *22:16541711_T_G* , similarly in my GWAS file. But when I run *fit1* then it gives *segmentation error* message. I check everything so many times but not successful. However, I repeated all these code successfully for univariate & bivariate analysis & visualization etc by taking some GWAS from internet. The only difference I found is these files have short format of SNP ID is like *rs1755411*. So, my question, does long variant name works in MiXer or not? Or I have to change my bim fam bed files in short variant name like *rs15425421* Kind Regards — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#48>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADEPV6KNE4LERWLBACS3A4LTVEVCVANCNFSM47PEIWVA> .

Answer 2 · 2021-06-29T11:57:34.000Z

Respected Dr. Frei,

Thanks for you quick reply and sorry for repeating it.

So, it means MiXeR does not recognized SNP column if it is CHR:bp_a1_a2 format ?

If its true then my focused is to convert bfiles (bim, fam, bed) in rs# format by separating all columns (I did not know this, I have to learn it).

However, converting only sample data file in rs# does not work as bfiles have long variant CHR:bp_a1_a2 format which obviously not the same as in sample data.

Need you kind advise in this regard.

Regards

Answer 3 · 2021-06-29T12:09:38.000Z

MiXeR does not recognized SNP column if it is CHR:bp_a1_a2 format ?

This is correct. MiXeR doesn't recognize this format for SNP column.

convert bfiles (bim, fam, bed) in rs#

No, you don't need to do this. My suggestion is to convert the summary statistics (i.e. the files you feed into --trait1 and --trait2).

Answer 4 · 2021-06-29T12:55:54.000Z

Respected Dr. Frei,

Thanks alot, but converting only sample GWAS does not work and same segmentation error which means the files I am looking for are not available (SNP does not match). However, there is not problem to access the directories.

Many thanks for your kind cooperation, I will further re-look in details and try after converting bfiles and update you if it works.

Regards

Answer 5 · 2021-07-04T18:48:23.000Z

Dear Dr. Frei,
Might be I am wrong, so want to update you and need your views as still problem exists.

Now my sample GWAS file (which is taken from .bim by simulation methods, so every chr, bp, a1, a2 in sample GWAS must exists in .bim file too). Then, added corresponding rs# to my sample file taken from **Kaviar - known variants in the human genome, release 160204 (systemsbiology.net) ** as hg19 GRCh37.

The sample GWAS file looks like
SNP CHR bp PVAL A1 A2 N BETA SE
rs139711966 22 16904583 0.580567 T C 2000 -0.03325 0.060159
rs62228723 22 17065548 0.19045 A A 2000 0.086706 0.066203
rs5748002 22 17075476 0.571352 A A 2000 0.030454 0.053791
rs5993571 22 17089569 0.185429 G A 2000 0.04896 0.03696

.bim files looks like
22 22:16904056_A_G 0 16904056 A G
22 22:16904583_C_T 0 16904583 C T
22 22:16905630_G_A 0 16905630 G A
22 22:16906282_G_A 0 16906282 G A
22 22:16906517_C_A 0 16906517 C A

But still the error was same; segmentation error. Now only 1 solution in my mind is revise / regenerate .bim .bed .fam file by adding rs#.

Kind Regards

Answer 6 · 2021-07-06T06:55:27.000Z

You GWAS file look (almost) compatible with MiXeR. I think it lacks "z-score", but apart from that it should work.

But still the error was same; segmentation error.

Please post here the specific error you get.

Answer 7 · 2021-07-07T04:30:01.000Z

Dear Dr. Frei,

Many thanks, I am pasting here error and log file also. I also have z-score, same error when i use GWAS with z-score. Hhowever, I will post error in next seprate post to avoid too long.

Error message:

Call:
./mixer.py fit1
--out /home/jabbar/Plink/retest/train_fit.rep.22
--lib /home/jabbar/mixer/src/build/lib/libbgmg.so
--bim-file /home/jabbar/Plink/retest/EURchr22bedformat.bim
--ld-file /home/jabbar/Plink/retest/EURchr22bedformat.run4.ld
--trait1-file /home/jabbar/Plink/retest/train11_qc_noMHC.csv.gz
--extract /home/jabbar/Plink/retest/EURchr22bedformat.QC.snps
)
INFO:root:init(lib_name=/home/jabbar/mixer/src/build/lib/libbgmg.so, context_id=0)
Segmentation fault (core dumped)

log file:

20210706 23:54:47.431707 20210706 23:54:47.431827 20210706 23:54:47.431947 20210706 23:54:47.432066 20210706 23:54:47.432186 20210706 23:54:47.432306 20210706 23:54:47.432430 20210706 23:54:47.432550 20210706 23:54:47.432670 20210706 23:54:47.432790 20210706 23:54:47.432943 20210706 23:54:47.434795 20210706 23:54:47.434994 20210706 23:54:47.456043 20210706 23:54:47.498489 20210706 23:54:47.499449 20210706 23:54:47.604790 20210706 23:54:47.605204 20210706 23:54:47.605502 20210706 23:54:47.605796 20210706 23:54:47.606061 20210706 23:54:47.606325 20210706 23:54:47.606758 20210706 23:54:47.643059 20210706 23:54:47.643949 20210706 23:54:47.644291 20210706 23:54:47.644543 20210706 23:54:47.644786 20210706 23:54:47.645091 20210706 23:54:47.645443 20210706 23:54:47.645613 20210706 23:54:47.645891 20210706 23:54:47.646164 20210706 23:54:47.647013 20210706 23:54:47.647580 20210706 23:54:47.647928 20210706 23:54:47.648116 20210706 23:54:47.648510 20210706 23:54:47.648857 20210706 23:54:47.649205 20210706 23:54:47.649388 20210706 23:54:47.687400 20210706 23:54:47.687798 20210706 23:54:47.694007 20210706 23:54:47.694913 =* GNU General Public License v3
=***********************************************************************
=Call:
=./mixer.py fit1
= --out /home/jabbar/Plink/retest/train_fit.rep.22
= --lib /home/jabbar/mixer/src/build/lib/libbgmg.so
= --bim-file /home/jabbar/Plink/retest/EURchr22bedformat.bim
= --ld-file /home/jabbar/Plink/retest/EURchr22bedformat.run4.ld
= --trait1-file /home/jabbar/Plink/retest/train11_qc_noMHC.csv.gz
= --extract /home/jabbar/Plink/retest/EURchr22bedformat.QC.snps
Dispose context (id=0)
Create new context (id=0)
>init(bim_file=/home/jabbar/Plink/retest/EURchr22bedformat.bim, frq_file=, chr_labels=1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22, trait1_file=/home/jabbar/Plink/retest/train11_qc_noMHC.csv.gz, trait2_file=, exclude=, extract=/home/jabbar/Plink/retest/EURchr22bedformat.QC.snps);
Construct reference from 1 files...
Found 12316 variants in /home/jabbar/Plink/retest/EURchr22bedformat.bim
Found 12316 variants in total.
Found 409 variants with well-defined Z and N in /home/jabbar/Plink/retest/train11_qc_noMHC.csv.gz. Other statistics:
608 lines found (including header)
466 lines matched via CHR:BP:A1:A2 code (not SNP rs#)
141 lines were ignored as RS# (or chr:bp:a1:a2 code) did not match reference file.
57 variants were ignored as they are strand-ambiguous.
409 variants had flipped A1/A2 alleles; sign of z-score was flipped.
constrain analysis to 409 tag variants (due to trait1_file='/home/jabbar/Plink/retest/train11_qc_noMHC.csv.gz')
constrain analysis to 348 tag variants (due to extract='/home/jabbar/Plink/retest/EURchr22bedformat.QC.snps')
set_tag_indices(num_snp=12316, num_tag=348);
>set_chrnumvec(12316);
<set_chrnumvec(12316);
>LdMatrixCsr::init_chunks();
highest chr label: 22
<LdMatrixCsr::init_chunks();
set_zvec(trait=1); num_undef=0
set_nvec(trait=1); num_undef=0
<init(bim_file=/home/jabbar/Plink/retest/EURchr22bedformat.bim, frq_file=, chr_labels=1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22, trait1_file=/home/jabbar/Plink/retest/train11_qc_noMHC.csv.gz, trait2_file=, exclude=, extract=/home/jabbar/Plink/retest/EURchr22bedformat.QC.snps); elapsed time 211ms
set_option(r2min=0);
set_option(kmax=20000);
clear_state
set_option(seed=123);
set_option(cubature_rel_error=1e-05);
set_option(cubature_max_evals=1000);
initialize mafvec
>load_ld_matrix(filename=/home/jabbar/Plink/retest/EURchr22bedformat.run4.ld)
<load_ld_matrix(filename=/home/jabbar/Plink/retest/EURchr22bedformat.run4.ld), format version 1
set_ld_r2_coo(filename=/home/jabbar/Plink/retest/EURchr22bedformat.run4.ld, numel=143202)...
>set_ld_r2_coo(chr_label=1, length=143202);
LdMatrixCsr::init_diagonal(chr_label=1) added 0 tag (out of 0 snps) elements with r2=1.0 to the diagonal of LD r2 matrix

Answer 8 · 2021-07-07T07:17:29.000Z

Dear Dr. Frei,

Just for reference, same error like above for GWAS with Z score.

Sample Gwas is as

CHR SNP POS ID A1 A2 N Z P BETA SE
22 rs139711966 16904583 22:16904583_C_T C T 3000 -1.64042 0.101024 -0.083106 0.0506615
22 rs2890567 17065545 22:17065545_A_G A A 3000 -0.818327 0.413235 -0.0273337 0.033402
22 rs62228723 17065548 22:17065548_A_T A A 3000 1.16552 0.243903 0.0674969 0.0579116
22 rs5993571 17089569 22:17089569_A_G A G 3000 -1.27401 0.202757 -0.0400473 0.0314339
22 rs5993992 17225773 22:17225773_T_C T C 3000 1.77873 0.0753845 0.074904 0.0421108
22 rs165652 17315458 22:17315458_T_G T T 3000 -0.433815 0.664454 -0.0132143 0.0304607
22 rs62235933 17363636 22:17363636_G_T G T 3000 1.11042 0.266907 0.0475201 0.0427947

Answer 9 · 2021-08-02T07:14:47.000Z

@jabbarlinks hi, and sorry for delay - I've been away
Please try adding "--chr2use 22" flag to your mixer.py command. By default --chr2use is 1-22, which doesn't work in your example as the data is limited to chr22.

From this message:

20210706 23:54:47.694007 >set_ld_r2_coo(chr_label=1, length=143202);
20210706 23:54:47.694913 LdMatrixCsr::init_diagonal(chr_label=1) added 0 tag (out of 0 snps) elements with r2=1.0 to the diagonal of LD r2 matrix

it seem that MiXeR doesn't correctly handle "empty" chromosomes. I agree it shouldn't crash, but give a meaningful error - but for now please try to constrain the analysis to chr22.

I have an example which works on chr21 here: https://github.com/comorment/containers/blob/main/usecases/mixer_simu.md

Answer 10 · 2021-08-02T19:25:57.000Z

Respected Dr. Frei,

Many thanks for your valuable feedback, Yes, it work 100% now.

Many many thanks

Answer 11 · 2021-08-02T19:28:07.000Z

@jabbarlinks Excellent, thank you for letting me know! 👍

Answer 12 · 2023-03-28T14:25:04.000Z

Dear Dr Frei @ofrei,

I was having a similar issue to this - I was getting a segmentation fault when I tried to submit mixer.py ld to generate .ld and .snps files. I was doing this on just chr 22 to make sure it works before I submit jobs for all of the chromoromes. I assumed that perhaps the error was similar, so I added --chr2use 22 into the script like this:
python ./mixer/precimed/mixer.py ld --lib ./mixer/src/build/lib/libbgmg.so --bfile 1000G_EUR_Phase3_plink/1000G.EUR.QC.22 --out 1000G_EUR_phase3_plink/1000G.EUR.QC.22.run4.ld --chr2use 22 --r2min 0.05 --ldscore-r2min 0.05 --ld-window-kb 30000
But I got the error:
usage: mixer.py [-h] {fit1,test1,fit2,test2,ld,perf,snps} ... mixer.py: error: unrecognized arguments: --chr2use 22
Have I implemented this fix incorrectly?

The readme mentions that this step is not necessary for EUR data, which is what I'm using, however I couldn't work out how to do the univariate step without the .ld and .snps files that it needs.

Any help you could offer with this would be very much appreciated!

Many thanks,
Anja