"there are 0 individuals in common" and "IDs in PLINK bed are not unique!"

Question

Opened this issue 5 years ago · 1 comment

Hi, I have been trying to predict HLA allele types using HIBAG on two different datasets: one with SNP data and the other with WGS data.

With the SNP dataset, I could not get the function hlaCompareAllele to work. This is how I called it:

> rv_ct0_sea730k <- hlaCompareAllele(true_b, hla_b_sea730k, call.threshold = 0)
Calling 'hlaCompareAllele': there are 0 individuals in common.

> rv_ct5_sea730k <- hlaCompareAllele(true_b, hla_b_sea730k, call.threshold = 0.5)
Calling 'hlaCompareAllele': there are 0 individuals in common.

I also tried training a model on the data:

> sea730k_model <- hlaParallelAttrBagging(10, true_b, train.geno_sea730k, nclassifier = 100)
Error in .DynamicClusterCall(cl, fun = function(job, hla, snp, mtry, prune,  : 
  One node produced an error: There is no common sample between 'hla' and 'snp'.

With the WGS dataset, I also could not get hlaBED2Geno to work.

> geno_dusun <- hlaBED2Geno("BNF_HLA.bed","BNF_HLA.bim","BNF_HLA.fam", assembly = "hg38")
Open "BNF_HLA.bed" in the SNP-major mode.
Error in hlaBED2Geno("BNF_HLA.bed", "BNF_HLA.bim", "BNF_HLA.fam", assembly = "hg38") : 
  IDs in PLINK bed are not unique!

The WGS dataset was converted from VCF to PLINK format using the PLINK tool.

For both the WGS and SNP datasets, can I resolve this by converting the data to a particular format?

The SNP dataset was obtained from https://evolbio.ut.ee/SEA/ and
the WGS dataset was obtained from https://www.simonsfoundation.org/simons-genome-diversity-project/, focusing on the two Southeast Asian Dusun individuals.

Answer 1 · 2020-02-03T06:12:25.000Z

You should check the sample IDs before you run any HIBAG function.
See train.geno_sea730k$sample.id and true_b$value$sample.id.
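
A minimal sketch of such a check in base R, for illustration only. The helper compare_ids is hypothetical (it is not part of HIBAG), and the toy ID vectors are invented stand-ins for true_b$value$sample.id and train.geno_sea730k$sample.id. The family-prefix mismatch shown here is just one assumed way that zero overlap could arise; the real cause has to be read off the actual IDs.

```r
# Hypothetical helper (not part of HIBAG): summarise how two ID vectors overlap
# and which IDs occur more than once in each of them.
compare_ids <- function(hla_ids, snp_ids) {
  hla_ids <- as.character(hla_ids)
  snp_ids <- as.character(snp_ids)
  list(
    n_common    = length(intersect(hla_ids, snp_ids)),
    only_in_hla = setdiff(hla_ids, snp_ids),
    only_in_snp = setdiff(snp_ids, hla_ids),
    dup_hla     = unique(hla_ids[duplicated(hla_ids)]),
    dup_snp     = unique(snp_ids[duplicated(snp_ids)])
  )
}

# Invented example values; replace with true_b$value$sample.id and
# train.geno_sea730k$sample.id when running against real objects.
hla_ids <- c("S001", "S002", "S003")
snp_ids <- c("FAM1-S001", "FAM1-S002", "FAM1-S002")

res <- compare_ids(hla_ids, snp_ids)
res$n_common   # 0: no ID matches exactly
res$dup_snp    # "FAM1-S002": this ID appears twice

# Assumed fix for this toy case only: drop everything up to the first "-".
stripped <- sub("^[^-]*-", "", snp_ids)
length(intersect(hla_ids, stripped))   # 2
```

If n_common is 0, printing head() of both ID vectors side by side usually makes the formatting difference obvious. Any entries in dup_hla or dup_snp should be resolved before the IDs are used for matching.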