rwdavies/QUILT

Error in generating my own HLA reference panel:subscript out of bounds

beifanglingyang opened this issue · 2 comments

Hi Robbie,
I followed the instructions(https://github.com/rwdavies/QUILT/blob/master/example/QUILT_hla_reference_panel_construction.Md) and produced one successful reference package.However, it gave a subscript out of bounds error when I replaced the reference haplotype data.
Attached is the error message :
`[2022-05-27 10:04:03] Running QUILT_HLA_prepare_reference(outputdir = quilt_hla_reference_panel_10KG_files, nGen = 100, hla_types_panel = 20181129_HLA_types_full_1000_Genomes_Project_panel.txt, ipd_igmt_alignments_zip_file = Alignments_Rel_3480.zip, ref_fasta = newGRch38/GRCh38_full_analysis_set_plus_decoy_hla.fa, refseq_table_file = hla_ancillary_files/refseq.hg38.chr6.26000000.34000000.txt.gz, full_regionStart = 25587319, full_regionEnd = 33629686, buffer = 500000, region_exclude_file = hla_ancillary_files/hlagenes.txt, genetic_map_file = CHS_chr6.txt.gz, reference_haplotype_file = /BIGDATA/USER/wei/referencepanel/quiltrefren/new10KG/10KG.hap.gz, reference_legend_file = /BIGDATA/USER/wei/referencepanel/quiltrefren/new10KG/10KG.legend.gz, reference_sample_file = /BIGDATA/USER/wei/referencepanel/quiltrefren/new10KG/10KG.samples, reference_exclude_samplelist_file = exclude_ref_samples_for_testing.txt, reference_exclude_samples_for_initial_phasing = FALSE, all_hla_regions = c("A", "B", "C", "DMA", "DMB", "DOA", "DOB", "DPA1", "DPA2", "DPB1", "DPB2", "DQA1", "DQA2", "DQB1", "DRA", "DRB1", "DRB3", "DRB4", "DRB5", "E", "F", "G", "HFE", "H", "J", "K", "L", "MICA", "MICB", "N", "P", "S", "TAP1", "TAP2", "T", "U", "V", "W", "Y"), hla_regions_to_prepare = c("A", "B", "C", "DQB1", "DRB1"), chr = chr6, minRate = 0.1, nCores = 6)
sending incremental file list
Alignments_Rel_3480.zip

sent 8868604 bytes received 31 bytes 5912423.33 bytes/sec
total size is 8867405 speedup is 1.00
Archive: Alignments_Rel_3480.zip
creating: alignments/
inflating: __MACOSX/._alignments
inflating: alignments/K_nuc.txt
inflating: __MACOSX/alignments/._K_nuc.txt
inflating: alignments/DPA1_gen.txt
inflating: __MACOSX/alignments/._DPA1_gen.txt
inflating: alignments/DQB1_prot.txt
inflating: __MACOSX/alignments/._DQB1_prot.txt
inflating: alignments/A_prot.txt
inflating: __MACOSX/alignments/._A_prot.txt
inflating: alignments/DRB4_nuc.txt
inflating: __MACOSX/alignments/._DRB4_nuc.txt
inflating: alignments/DOA_nuc.txt
inflating: __MACOSX/alignments/._DOA_nuc.txt
inflating: alignments/W_nuc.txt
inflating: __MACOSX/alignments/._W_nuc.txt
inflating: alignments/B_nuc.txt
inflating: __MACOSX/alignments/._B_nuc.txt
inflating: alignments/S_gen.txt
inflating: __MACOSX/alignments/._S_gen.txt
inflating: alignments/DQA2_gen.txt
inflating: __MACOSX/alignments/._DQA2_gen.txt
inflating: alignments/F_gen.txt
inflating: __MACOSX/alignments/._F_gen.txt
inflating: alignments/V_nuc.txt
inflating: __MACOSX/alignments/._V_nuc.txt
inflating: alignments/C_nuc.txt
inflating: __MACOSX/alignments/._C_nuc.txt
inflating: alignments/DPB2_nuc.txt
inflating: __MACOSX/alignments/._DPB2_nuc.txt
inflating: alignments/HFE_nuc.txt
inflating: __MACOSX/alignments/._HFE_nuc.txt
inflating: alignments/G_gen.txt
inflating: __MACOSX/alignments/._G_gen.txt
inflating: alignments/DRB5_prot.txt
inflating: __MACOSX/alignments/._DRB5_prot.txt
inflating: alignments/DRB4_prot.txt
inflating: __MACOSX/alignments/._DRB4_prot.txt
inflating: alignments/J_nuc.txt
inflating: __MACOSX/alignments/._J_nuc.txt
inflating: alignments/DRB1_gen.txt
inflating: __MACOSX/alignments/._DRB1_gen.txt
inflating: alignments/DMB_gen.txt
inflating: __MACOSX/alignments/._DMB_gen.txt
inflating: alignments/N_gen.txt
inflating: __MACOSX/alignments/._N_gen.txt
inflating: alignments/TAP2_nuc.txt
inflating: __MACOSX/alignments/._TAP2_nuc.txt
inflating: alignments/DQB1_nuc.txt
inflating: __MACOSX/alignments/._DQB1_nuc.txt
inflating: alignments/DRB5_nuc.txt
inflating: __MACOSX/alignments/._DRB5_nuc.txt
inflating: alignments/DRB_prot.txt
inflating: __MACOSX/alignments/._DRB_prot.txt
inflating: alignments/MICB_gen.txt
inflating: __MACOSX/alignments/._MICB_gen.txt
inflating: alignments/A_nuc.txt
inflating: __MACOSX/alignments/._A_nuc.txt
inflating: alignments/DOB_nuc.txt
inflating: __MACOSX/alignments/._DOB_nuc.txt
inflating: alignments/T_nuc.txt
inflating: __MACOSX/alignments/._T_nuc.txt
inflating: alignments/DQA1_gen.txt
inflating: __MACOSX/alignments/._DQA1_gen.txt
inflating: alignments/E_gen.txt
inflating: __MACOSX/alignments/._E_gen.txt
inflating: alignments/DMB_prot.txt
inflating: __MACOSX/alignments/._DMB_prot.txt
inflating: alignments/P_gen.txt
inflating: __MACOSX/alignments/._P_gen.txt
inflating: alignments/DRB3_gen.txt
inflating: __MACOSX/alignments/._DRB3_gen.txt
inflating: alignments/DRB3_prot.txt
inflating: __MACOSX/alignments/._DRB3_prot.txt
inflating: alignments/H_nuc.txt
inflating: __MACOSX/alignments/._H_nuc.txt
inflating: alignments/MICA_prot.txt
inflating: __MACOSX/alignments/._MICA_prot.txt
inflating: alignments/Y_gen.txt
inflating: __MACOSX/alignments/._Y_gen.txt
inflating: alignments/DRA_gen.txt
inflating: __MACOSX/alignments/._DRA_gen.txt
inflating: alignments/DPA2_gen.txt
inflating: __MACOSX/alignments/._DPA2_gen.txt
inflating: alignments/L_gen.txt
inflating: __MACOSX/alignments/._L_gen.txt
inflating: alignments/DMA_gen.txt
inflating: __MACOSX/alignments/._DMA_gen.txt
inflating: alignments/DPA1_prot.txt
inflating: __MACOSX/alignments/._DPA1_prot.txt
inflating: alignments/MICA_gen.txt
inflating: __MACOSX/alignments/._MICA_gen.txt
inflating: alignments/G_prot.txt
inflating: __MACOSX/alignments/._G_prot.txt
inflating: alignments/F_prot.txt
inflating: __MACOSX/alignments/._F_prot.txt
inflating: alignments/TAP1_nuc.txt
inflating: __MACOSX/alignments/._TAP1_nuc.txt
inflating: alignments/DOA_prot.txt
inflating: __MACOSX/alignments/._DOA_prot.txt
inflating: alignments/DQA2_prot.txt
inflating: __MACOSX/alignments/._DQA2_prot.txt
inflating: alignments/U_nuc.txt
inflating: __MACOSX/alignments/._U_nuc.txt
inflating: alignments/TAP2_prot.txt
inflating: __MACOSX/alignments/._TAP2_prot.txt
inflating: alignments/DPB1_nuc.txt
inflating: __MACOSX/alignments/._DPB1_nuc.txt
inflating: alignments/A_gen.txt
inflating: __MACOSX/alignments/._A_gen.txt
inflating: alignments/DOB_gen.txt
inflating: __MACOSX/alignments/._DOB_gen.txt
inflating: alignments/T_gen.txt
inflating: __MACOSX/alignments/._T_gen.txt
inflating: alignments/DQA1_nuc.txt
inflating: __MACOSX/alignments/._DQA1_nuc.txt
inflating: alignments/E_nuc.txt
inflating: __MACOSX/alignments/._E_nuc.txt
inflating: alignments/DRA_prot.txt
inflating: __MACOSX/alignments/._DRA_prot.txt
inflating: alignments/P_nuc.txt
inflating: __MACOSX/alignments/._P_nuc.txt
inflating: alignments/DRB3_nuc.txt
inflating: __MACOSX/alignments/._DRB3_nuc.txt
inflating: alignments/CLASSI_nuc.txt
inflating: __MACOSX/alignments/._CLASSI_nuc.txt
inflating: alignments/H_gen.txt
inflating: __MACOSX/alignments/._H_gen.txt
inflating: alignments/DPA2_nuc.txt
inflating: __MACOSX/alignments/._DPA2_nuc.txt
inflating: alignments/CLASSI_prot.txt
inflating: __MACOSX/alignments/._CLASSI_prot.txt
inflating: alignments/Y_nuc.txt
inflating: __MACOSX/alignments/._Y_nuc.txt
inflating: alignments/DRA_nuc.txt
inflating: __MACOSX/alignments/._DRA_nuc.txt
inflating: alignments/L_nuc.txt
inflating: __MACOSX/alignments/._L_nuc.txt
inflating: alignments/HFE_prot.txt
inflating: __MACOSX/alignments/._HFE_prot.txt
inflating: alignments/DMA_nuc.txt
inflating: __MACOSX/alignments/._DMA_nuc.txt
inflating: alignments/MICA_nuc.txt
inflating: __MACOSX/alignments/._MICA_nuc.txt
inflating: alignments/DPB1_prot.txt
inflating: __MACOSX/alignments/._DPB1_prot.txt
inflating: alignments/TAP1_gen.txt
inflating: __MACOSX/alignments/._TAP1_gen.txt
inflating: alignments/U_gen.txt
inflating: __MACOSX/alignments/._U_gen.txt
inflating: alignments/DPB1_gen.txt
inflating: __MACOSX/alignments/._DPB1_gen.txt
inflating: alignments/C_prot.txt
inflating: __MACOSX/alignments/._C_prot.txt
inflating: alignments/B_prot.txt
inflating: __MACOSX/alignments/._B_prot.txt
inflating: alignments/DOB_prot.txt
inflating: __MACOSX/alignments/._DOB_prot.txt
inflating: alignments/K_gen.txt
inflating: __MACOSX/alignments/._K_gen.txt
inflating: alignments/DQA1_prot.txt
inflating: __MACOSX/alignments/._DQA1_prot.txt
inflating: alignments/TAP1_prot.txt
inflating: __MACOSX/alignments/._TAP1_prot.txt
inflating: alignments/DRB4_gen.txt
inflating: __MACOSX/alignments/._DRB4_gen.txt
inflating: alignments/DRB_nuc.txt
inflating: __MACOSX/alignments/._DRB_nuc.txt
inflating: alignments/DPA1_nuc.txt
inflating: __MACOSX/alignments/._DPA1_nuc.txt
inflating: alignments/DOA_gen.txt
inflating: __MACOSX/alignments/._DOA_gen.txt
inflating: alignments/W_gen.txt
inflating: __MACOSX/alignments/._W_gen.txt
inflating: alignments/B_gen.txt
inflating: __MACOSX/alignments/._B_gen.txt
inflating: alignments/S_nuc.txt
inflating: __MACOSX/alignments/._S_nuc.txt
inflating: alignments/DQA2_nuc.txt
inflating: __MACOSX/alignments/._DQA2_nuc.txt
inflating: alignments/F_nuc.txt
inflating: __MACOSX/alignments/._F_nuc.txt
inflating: alignments/E_prot.txt
inflating: __MACOSX/alignments/._E_prot.txt
inflating: alignments/DRB1_prot.txt
inflating: __MACOSX/alignments/._DRB1_prot.txt
inflating: alignments/V_gen.txt
inflating: __MACOSX/alignments/._V_gen.txt
inflating: alignments/MICB_prot.txt
inflating: __MACOSX/alignments/._MICB_prot.txt
inflating: alignments/C_gen.txt
inflating: __MACOSX/alignments/._C_gen.txt
inflating: alignments/HFE_gen.txt
inflating: __MACOSX/alignments/._HFE_gen.txt
inflating: alignments/DPB2_gen.txt
inflating: __MACOSX/alignments/._DPB2_gen.txt
inflating: alignments/G_nuc.txt
inflating: __MACOSX/alignments/._G_nuc.txt
inflating: alignments/J_gen.txt
inflating: __MACOSX/alignments/._J_gen.txt
inflating: alignments/DMB_nuc.txt
inflating: __MACOSX/alignments/._DMB_nuc.txt
inflating: alignments/DRB1_nuc.txt
inflating: __MACOSX/alignments/._DRB1_nuc.txt
inflating: alignments/DQB1_gen.txt
inflating: __MACOSX/alignments/._DQB1_gen.txt
inflating: alignments/TAP2_gen.txt
inflating: __MACOSX/alignments/._TAP2_gen.txt
inflating: alignments/DMA_prot.txt
inflating: __MACOSX/alignments/._DMA_prot.txt
inflating: alignments/N_nuc.txt
inflating: __MACOSX/alignments/._N_nuc.txt
inflating: alignments/DRB5_gen.txt
inflating: __MACOSX/alignments/._DRB5_gen.txt
inflating: alignments/MICB_nuc.txt
inflating: __MACOSX/alignments/._MICB_nuc.txt
[2022-05-27 10:04:08] Begin making HLA all alleles kmers file
[2022-05-27 10:04:08] Processing file 1 out of 39
[2022-05-27 10:05:11] Processing file 4 out of 39
[2022-05-27 10:05:15] Processing file 8 out of 39
[2022-05-27 10:06:11] Processing file 12 out of 39
[2022-05-27 10:06:41] Processing file 16 out of 39
[2022-05-27 10:10:43] Processing file 20 out of 39
[2022-05-27 10:10:49] Processing file 24 out of 39
[2022-05-27 10:10:53] Processing file 28 out of 39
[2022-05-27 10:13:15] Processing file 32 out of 39
[2022-05-27 10:13:23] Processing file 36 out of 39
[2022-05-27 10:13:47] Done making HLA all alleles kmers file
[2022-05-27 10:13:47] Begin making HLA snp format alleles file
[2022-05-27 10:13:47] Working on region:A
[2022-05-27 10:13:47] Working on region:B
[2022-05-27 10:13:47] Working on region:C
[2022-05-27 10:13:47] Working on region:DQB1
[2022-05-27 10:13:47] Working on region:DRB1
[2022-05-27 10:13:52] Determined automatically that the reference genome contains allele:DQB106:02:01:01
[2022-05-27 10:13:52] Make SNP format alleles file for:DQB1
[2022-05-27 10:13:54] Determined automatically that the reference genome contains allele:DRB1
15:01:01:01
[2022-05-27 10:13:54] Make SNP format alleles file for:DRB1
[2022-05-27 10:13:56] Determined automatically that the reference genome contains allele:A03:01:01:01
[2022-05-27 10:13:56] Make SNP format alleles file for:A
[2022-05-27 10:14:02] Determined automatically that the reference genome contains allele:C
07:02:01:03
[2022-05-27 10:14:02] Make SNP format alleles file for:C
[2022-05-27 10:14:02] Determined automatically that the reference genome contains allele:B*07:02:01:01
[2022-05-27 10:14:02] Make SNP format alleles file for:B
[2022-05-27 10:14:36] Make full alleles filled in file for:DQB1
[2022-05-27 10:14:41] Make final files:DQB1
[2022-05-27 10:14:52] Done working on region:DQB1
[2022-05-27 10:16:11] Make full alleles filled in file for:A
[2022-05-27 10:17:12] Make full alleles filled in file for:DRB1
[2022-05-27 10:17:21] Make final files:DRB1
[2022-05-27 10:17:53] Done working on region:DRB1
[2022-05-27 10:20:22] Make full alleles filled in file for:C
[2022-05-27 10:21:04] Make final files:C
[2022-05-27 10:21:20] Make full alleles filled in file for:B
[2022-05-27 10:21:37] Done working on region:C
[2022-05-27 10:24:09] Make final files:A
[2022-05-27 10:24:40] Done working on region:A
[2022-05-27 10:33:04] Make final files:B
[2022-05-27 10:33:34] Done working on region:B
[2022-05-27 10:33:34] Done making HLA snp format alleles file
[2022-05-27 10:33:34] Determine who to remove from full panel
[2022-05-27 10:33:34] Done determining who to remove from full panel
[2022-05-27 10:33:34] Running QUILT_prepare_reference(outputdir = quilt_hla_reference_panel_10KG_files, chr = chr6, nGen = 100, regionStart = 25587319, regionEnd = 33629686, buffer = 500000, output_file = quilt_hla_reference_panel_10KG_files/quilt.hrc.hla.all.haplotypes.RData, reference_haplotype_file = /BIGDATA/USER/wei/referencepanel/quiltrefren/new10KG/10KG.hap.gz, reference_legend_file = /BIGDATA/USER/wei/referencepanel/quiltrefren/new10KG/10KG.legend.gz, reference_sample_file = /BIGDATA/USER/wei/referencepanel/quiltrefren/new10KG/10KG.samples, reference_populations = NA, reference_phred = 30, reference_exclude_samplelist_file = quilt_hla_reference_panel_10KG_files/hlauntyped.exclude.txt, region_exclude_file = , genetic_map_file = CHS_chr6.txt.gz, nMaxDH = NA, tempdir = NA, make_fake_vcf_with_sites_list = FALSE, output_sites_filename = NA, expRate = 1, maxRate = 100, minRate = 0.1)
[2022-05-27 10:33:34] Program start
[2022-05-27 10:33:34] Begin converting reference haplotypes
[2022-05-27 10:33:35] Begin get haplotypes from reference
[2022-05-27 10:33:35] Load and validate reference legend header
[2022-05-27 10:33:35] Load and validate reference legend
[2022-05-27 10:33:36] Extract reference haplotypes
[2022-05-27 10:33:36] In the region to be imputed plus buffer there are the following number of SNPs:
[2022-05-27 10:33:36] 81159 from the posfile
[2022-05-27 10:33:36] 82139 from the reference haplotypes
[2022-05-27 10:33:36] 81159 in the intersection of the posfile and the reference haplotypes
[2022-05-27 10:34:59] Succesfully extracted 19928 haplotypes from reference data
[2022-05-27 10:34:59] End get haplotypes from reference
[2022-05-27 10:34:59] Getting and validating genetic map
[2022-05-27 10:35:21] Match genetic map to desired imputation positions
[2022-05-27 10:35:21] Done with getting genetic map
[2022-05-27 10:35:46] Using nMaxDH = 15
[2022-05-27 10:35:46] Save converted reference haplotypes
[2022-05-27 10:35:46] Done converting reference haplotypes
[2022-05-27 10:35:46] Begin assigning HLA types to haplotypes
[2022-05-27 10:35:46] Work on phasing in region:A
Error in cbind(ourtypes1, ourtypes2, dist11, dist12, dist21, dist22, phase1, :
subscript out of bounds
Calls: QUILT_HLA_prepare_reference -> phase_hla_haplotypes -> hla_perform_step_1_phasing
Execution halted
processs will sleep 30s
process end at :
Fri May 27 10:36:25 CST 2022`

The difference between my operation and the correct example was only that I use my own reference haplotype data and re-format it to --haplegendsample:
(1)The reference haplotype data I used(chr6.CGP.beagle52.filter_MAF.0.001.vcf.gz):
image

From the results, it seems that I generated the hap/legend/sample file correctly.
(2)The code I used:
module load tabix/0.2.6 module load bcftools/1.9-gcc-4.8.5 bcftools view --output-file 10KG.temp.vcf.gz --output-type z --min-alleles 2 --max-alleles 2 --types snps chr6.CGP.beagle52.filter_MAF.0.001.vcf.gz -r chr6:25000000-34000000 tabix 10KG.temp.vcf.gz bcftools convert --haplegendsample 10KG 10KG.temp.vcf.gz
(3)The result of it :
process will start at : Fri May 27 09:46:47 CST 2022 ++++++++++++++++++++++++++++++++++++++++ Hap file: 10KG.hap.gz Legend file: 10KG.legend.gz Sample file: 10KG.samples 82695 records written, 0 skipped: 0/0/0 no-ALT/non-biallelic/filtered tbx_index_build failed: oneKG.temp.vcf.gz processs will sleep 30s process end at : Fri May 27 09:53:50 CST 2022

I can't figure out where the problem is, it seems that the file(Reference haplotype data including HLA alleles database) needs to meet certain conditions?
Thanks very much!

Hi,

Thanks for the message. Sorry for the error. In general the code to build HLA reference sets isn't great as it lacks test coverage, and time commitments prevent me from comprehensively going over the code to minimize bugs, so unfortunately errors like this can happen that need to be sorted out.

Now, it looks like this might be potentially an easy bug
The line mentionned above with
cbind(ourtypes1, ourtypes2, dist11, dist12, dist21, dist22, phase1
only appears twice uncommented
with the first of these basically useless as the product (temp) is never referenced
https://github.com/rwdavies/QUILT/blob/master/QUILT/R/hla_prepare_phase_functions.R#L514
and the second also doesn't seem to be used
https://github.com/rwdavies/QUILT/blob/master/QUILT/R/hla_prepare_phase_functions.R#L629

I've gone ahead and commented those two lines out, and committed the change
0eb3c7c

So if you'd like you could try installing from github through

cd ~/proj/QUILT/ ## or similar
./scripts/build-and-install.R

and then seeing if that fixes the problem

Thanks
Robbie

Robbie,
Thank you very much for your prompt reply !
I'm happy to tell you that, your revised new version solves the problem perfectly after testing.
And I'm sorry that the server was unavailable for several days, so I didn't get back to you in time.