thibautjombart/adegenet

ERROR: number of cluster centres must lie between 1 and nrow(x)

kopelol opened this issue · 3 comments

Hello everyone

I'm trying to run a DAPC analysis on a VCF file containing genome-wide SNP data (197,696 loci) from 28 strains.

First, I converted the VCF file into a genind object.
library(vcfR)
x <- read.vcfR("file.vcf", verbose = FALSE)
y <- vcfR2genind(x)

x

***** Object of Class vcfR *****
28 samples
1 CHROMs
197,696 variants
Object size: 66.4 Mb
0 percent missing data


y

/// GENIND OBJECT /////////

// 28 individuals; 197,696 loci; 406,061 alleles; size: 165 Mb

// Basic content
@tab: 28 x 406061 matrix of allele counts
@loc.n.all: number of alleles per locus (range: 1-4)
@loc.fac: locus factor for the 406061 columns of @tab
@all.names: list of allele names for each locus
@ploidy: ploidy of each individual (range: 2-2)
@type: codom
@call: adegenet::df2genind(X = t(x), sep = sep)

// Optional content

- empty -

Then I ran
grp <- find.clusters(y, max.n.clust=40)
to identify clusters.

At the following prompt, I entered a suitable number.

Choose the number PCs to retain

I got the following error.

number of cluster centres must lie between 1 and nrow(x)

I ran this analysis successfully on the provided example data,
so I think my data type is not suitable.

Could you please give me some advice?

Regards,

It's haploid.

number of cluster centres must lie between 1 and nrow(x)

Run the function without the max.n.clust argument.

You have 28 samples in your data set, but you set the maximum number of clusters to 40. It fails because the function runs the clustering algorithm for each possible number of clusters in turn. As soon as the number of clusters equals the number of individuals, it fails.

The default maximum number of clusters is round(nInd(y)/10).
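
As a rough illustration of the point above, here is a minimal sketch in base R that does not need adegenet. It assumes the error originates in kmeans(), which find.clusters calls internally; the toy matrix x and the names requested_max, safe_max and default_max are made up for this example and are not part of any package.

```r
# Toy stand-in for the genotype data: 28 rows play the role of the 28 individuals.
# The values are random numbers, not real genotypes.
set.seed(1)
x <- matrix(rnorm(28 * 5), nrow = 28)

# Asking kmeans() for as many centres as there are rows raises an error,
# which is caught here and stored as a message string.
msg <- tryCatch(
  {
    kmeans(x, centers = nrow(x))
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical guard: never request more than nrow(x) - 1 clusters.
requested_max <- 40
safe_max <- min(requested_max, nrow(x) - 1)

# The default quoted above, evaluated for 28 rows.
default_max <- round(nrow(x) / 10)

print(c(safe_max = safe_max, default_max = default_max))
```

With adegenet loaded, the same guard could be written as find.clusters(y, max.n.clust = min(40, nInd(y) - 1)), which should keep the largest tested number of clusters below the number of individuals.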

Thank you for your reply.
It worked well, and I now understand the problem.

Thank you!!