"there are 0 individuals in common" and "IDs in PLINK bed are not unique!"
sidtjn commented
Hi, I have been trying to predict HLA allele types using HIBAG on two different datasets: one SNP dataset and one with WGS data.
With the SNP dataset, I could not get the function hlaCompareAllele to work. This is how I called it:
> rv_ct0_sea730k <- hlaCompareAllele(true_b, hla_b_sea730k, call.threshold = 0)
Calling 'hlaCompareAllele': there are 0 individuals in common.
> rv_ct5_sea730k <- hlaCompareAllele(true_b, hla_b_sea730k, call.threshold = 0.5)
Calling 'hlaCompareAllele': there are 0 individuals in common.
I also tried training a model on the data:
> sea730k_model <- hlaParallelAttrBagging(10, true_b, train.geno_sea730k, nclassifier = 100)
Error in .DynamicClusterCall(cl, fun = function(job, hla, snp, mtry, prune, :
One node produced an error: There is no common sample between 'hla' and 'snp'.
With the WGS dataset, I also could not get hlaBED2Geno to work.
> geno_dusun <- hlaBED2Geno("BNF_HLA.bed","BNF_HLA.bim","BNF_HLA.fam", assembly = "hg38")
Open "BNF_HLA.bed" in the SNP-major mode.
Error in hlaBED2Geno("BNF_HLA.bed", "BNF_HLA.bim", "BNF_HLA.fam", assembly = "hg38") :
IDs in PLINK bed are not unique!
The WGS dataset was converted from VCF to PLINK format using PLINK.
For both the WGS and SNP datasets, can I resolve this by converting the data to a particular format?
- The SNP dataset was obtained from https://evolbio.ut.ee/SEA/ and
- the WGS dataset was obtained from https://www.simonsfoundation.org/simons-genome-diversity-project/ with a focus on the two Southeast Asian Dusun individuals.
zhengxwen commented
You should check the sample IDs before you run any HIBAG function.
See train.geno_sea730k$sample.id and true_b$value$sample.id.
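A minimal sketch of what such an ID check could look like, using base R only. The ID vectors and the .fam contents below are made-up toy data standing in for true_b$value$sample.id, train.geno_sea730k$sample.id and BNF_HLA.fam. The "family-individual" prefix format is also only an assumption about how the two ID sets might differ, and the duplicate check assumes that the "not unique" error refers to the sample IDs in the .fam file.

```r
# Toy stand-ins (hypothetical values) for true_b$value$sample.id and
# train.geno_sea730k$sample.id
hla_ids <- c("S001", "S002", "S003")
snp_ids <- c("fam1-S001", "fam1-S002", "fam2-S004")

# 1. How many sample IDs do the two objects share?
common <- intersect(hla_ids, snp_ids)
cat("IDs in common before cleaning:", length(common), "\n")

# If the IDs differ only by a prefix (assumed here: "<family>-<individual>"),
# strip it so that both sides use the same naming scheme.
snp_ids_clean <- sub("^[^-]+-", "", snp_ids)
common_clean <- intersect(hla_ids, snp_ids_clean)
cat("IDs in common after cleaning:", length(common_clean), "\n")

# 2. Are the sample IDs in a PLINK .fam file unique?
# Write a toy .fam file (columns: family ID, individual ID, paternal ID,
# maternal ID, sex, phenotype) with one repeated sample.
fam_file <- tempfile(fileext = ".fam")
writeLines(c("F1 A 0 0 0 -9",
             "F1 B 0 0 0 -9",
             "F1 A 0 0 0 -9"), fam_file)

# Read every column as character so IDs are not converted to numbers
fam <- read.table(fam_file, header = FALSE, colClasses = "character",
                  col.names = c("FamilyID", "InvID", "PatID",
                                "MatID", "Sex", "Pheno"))

# Combine family and individual ID, then list the combinations that repeat
combined <- paste(fam$FamilyID, fam$InvID, sep = "-")
dup_ids <- unique(combined[duplicated(combined)])
cat("Duplicated IDs in .fam:", dup_ids, "\n")
```

With real data, the same intersect() call on the two sample.id vectors shows whether the "0 individuals in common" message comes from mismatched naming, and the duplicated() check on the .fam file shows which samples would need to be renamed before conversion.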