statgen/Minimac4

How to obtain the RS ID from Minimac4 results

Hi,

How can I obtain RS IDs from Minimac4 results? Thanks a lot!

Best,

Bo

You need to add --rsid when running minimac4. The reference panel also needs to have RS IDs in the ID column.

Hi Jonathon,
I added --rsid ON when running minimac4 and used the reference panel from your download website, but I can't find any RS IDs in the M3VCF file. Could you give me some advice? Thanks a lot!

I'm assuming you are referring to the 1000 genomes panel. This panel does not have RS IDs. You should be able to get them from the 1000 genomes VCFs on the 1000 genomes FTP site.

I used Minimac3 to convert the 1000 Genomes panel VCF to M3VCF, but I can't find the RS IDs; the SNP IDs only appear as Chr:pos.

If the RS IDs exist in the VCF but not the M3VCF, then I would suggest using https://github.com/Santy-8128/m3vcftools to compress to M3VCF. This tool will copy over the ID column. I don't know whether the VCFs on our site include RS IDs, but the VCFs on the 1000 genomes site do.
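Before re-compressing, it may help to confirm what the ID column of the input VCF actually contains. The following is only a minimal sketch, not part of Minimac: the function names `open_text` and `count_id_styles` are made up for illustration, and it assumes a standard tab-separated VCF (plain, gzip, or bgzip) with the ID in the third column.

```python
import gzip
from collections import Counter


def open_text(path):
    """Open a plain or gzip/bgzip-compressed file as text."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def count_id_styles(path, limit=None):
    """Tally how the ID column (third field) of a VCF is populated.

    Returns a Counter with the keys 'rsid', 'missing' and 'other'.
    If `limit` is given, stop after that many records.
    """
    counts = Counter()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t", 3)
            if len(fields) < 3:
                continue  # not a well-formed record
            variant_id = fields[2]
            if variant_id == ".":
                counts["missing"] += 1
            elif variant_id.startswith("rs"):
                counts["rsid"] += 1
            else:
                counts["other"] += 1  # e.g. CHR:POS style or custom IDs
            if limit is not None and sum(counts.values()) >= limit:
                break
    return counts
```

Calling `count_id_styles("input.vcf.gz", limit=10000)` on the reference VCF (file name hypothetical) gives a quick picture: if `rsid` is zero, the IDs are missing from the source file itself and no conversion tool can carry them over.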

Hi,
I used --referenceEstimates OFF, but it still behaves as it does with ON.

Hi all!
First of all, thanks for publishing your code on github!

We have a similar problem with missing IDs in the imputation output. We use minimac3 v.2.0.1 with the --rsid option to convert our panel with custom IDs from vcf to m3vcf. The resulting file still contains the IDs. We then convert the m3vcf file to msav format with minimac4 --update-m3vcf and run the imputation, but the output is missing the IDs.

We checked the msav file with the sav export command from savvy and there were no IDs in it. Any idea why the IDs get lost when converting from m3vcf to msav? We tried passing the --rsid option to minimac4, but it had no effect (and it is marked as deprecated). If I understood the previous discussion correctly, the IDs should be passed on.

@steffenom, thanks for reporting this. The earlier conversation was regarding v4.0.x. You are using v4.1.x, and this feature was missing from the new version. I just pushed a fix to the master branch. Please try the latest from the master branch to generate a new msav file.

Hi @jonathonl,
I tried the newest version on the master branch and it worked! The IDs showed up as expected. Thanks for the quick fix!

Minor drawback is that now all variants have an ID. The ones that don't have an ID in the reference panel now have an ID given by CHR:POS, but that is not a problem for us. Might be unexpected for other users.

For the IDs with CHR:POS, are these variants that exist only in the target file (not in reference)? If using --all-typed-sites, IDs for such variants are carried over from the target VCF instead of the reference panel. If the variant exists in the reference panel and has a missing ID in the reference panel, then the ID for that variant should also be missing in the imputed results.

No, for me all variants without an ID in the initial reference panel have the CHR:POS ID in the final output (without using --all-typed-sites).
I think these IDs are already created when the m3vcf file is generated from the reference panel with minimac3, and minimac4 --update-m3vcf then just carries them over.
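For anyone who would rather have missing IDs in the imputed VCF than the CHR:POS placeholders, a small post-processing step is one option. This is only a sketch under the assumption that the placeholder is exactly `CHROM:POS` of the same record, as described above; `blank_placeholder_id` is a hypothetical helper, not something Minimac provides.

```python
def blank_placeholder_id(vcf_line):
    """Return the VCF line with ID set to '.' if it is just CHROM:POS.

    Header lines and records with any other ID are returned unchanged.
    """
    if vcf_line.startswith("#"):
        return vcf_line
    newline = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")
    # Only treat the ID as a placeholder when it matches this record's
    # own chromosome and position (assumption based on this thread).
    if len(fields) >= 3 and fields[2] == f"{fields[0]}:{fields[1]}":
        fields[2] = "."
    return "\t".join(fields) + newline
```

If the placeholder in your files has a different shape (for example with alleles appended), the comparison would need to be adjusted.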

I see. FYI, you can generate an msav directly from a VCF, BCF, or SAV file with minimac4 --compress-reference input.vcf.gz -o compressed_output.msav. This still needs to be documented in the --help and README.
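When one msav per chromosome is needed, the command above can be scripted. The sketch below only wraps the invocation quoted in this thread; the helper names and the file naming pattern are assumptions for illustration, and `minimac4` is assumed to be on the PATH.

```python
import subprocess


def compress_reference_cmd(input_vcf, output_msav, binary="minimac4"):
    """Build the argument list for `minimac4 --compress-reference`."""
    return [binary, "--compress-reference", str(input_vcf), "-o", str(output_msav)]


def compress_reference(input_vcf, output_msav, binary="minimac4"):
    """Run the compression and raise CalledProcessError if minimac4 fails."""
    subprocess.run(compress_reference_cmd(input_vcf, output_msav, binary), check=True)


if __name__ == "__main__":
    # Hypothetical per-chromosome file names; adjust to your own layout.
    for chrom in range(1, 23):
        compress_reference(f"chr{chrom}.vcf.gz", f"chr{chrom}.msav")
```

Using `check=True` makes a failing chromosome stop the loop immediately instead of silently leaving an incomplete output file behind.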

Thanks for the hint! I tried minimac4 --compress-reference and now the IDs are as expected.

Should the results of the imputation with a reference panel created with minimac4 --compress-reference be similar to results for the same panel created with minimac3 --processReference + minimac4 --update-m3vcf? Or is one preferred over the other in certain situations?

There may be a small difference with smaller reference panels (tens of thousands of samples). By default, minimac3 --processReference does parameter estimation and saves those parameters in the m3vcf. This parameter estimation will be less useful for larger panels.

minimac4 --compress-reference input.vcf.gz -o compressed_output.msav failed for me with:

minimac v4.1.0

Error: Cannot write empty block
Error: serializing final block failed

input.vcf.gz has 1052764 chromosome 20 variants (rows).
The file has 915 columns; converting it to m3vcf with minimac3 works.

The line of code that produces the error is:

return std::cerr << "Error: Cannot write empty block\n", false;

It also failed for chromosomes 4, 14, and 15; all other chromosomes work.

Error: Cannot write empty block
Error: serializing final block failed

@buegelbeatz , this should be fixed with 6f9f140

Just tested it - works now! - Thanx for the quick fix.