thibautjombart/adegenet

genet.dist not working for populations with large difference in sample size

georgeomics opened this issue · 3 comments

I am attempting to calculate Fst using genet.dist for 6 populations with the corresponding sample sizes:

  1   2   3   4   5   6 
 97 133 219  16  16  53 

My code looks like the following:

df1 <- subset(data, population %in% c(1,2))
dg1 <- df2genind(df1, ploidy=2, ncode=1, pop=df1$population)
calcFst <- genet.dist(dg1, method = "WC84")

And works great as long as one of the populations is not 4 or 5. If population 4 or 5 is used, I receive the following error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 113, 57
In addition: Warning message:
In matrix(unlist(e), ncol = x@ploidy[1], byrow = TRUE) :
  data length [113] is not a sub-multiple or multiple of the number of rows [57]

However, the code still works as intended when the populations being compared are 4 AND 5 (i.e., c(4,5)). One obvious thing to me is the difference in sample sizes. What could be the source of the error?
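A quick arithmetic check, offered as an observation rather than a confirmed diagnosis: the 113 in the error message equals 97 + 16, the combined size of populations 1 and 4, and 57 is the number of rows needed to lay 113 values out in two columns. The sketch below tabulates the combined sample size of every pair, using only the sizes listed above. The pairs reported as failing are the ones with an odd total, while c(4,5) totals 32. Whether parity is the actual trigger inside the conversion is an assumption; `n` and `summary_tbl` are illustrative names.

```r
# Sample sizes per population, copied from the table in the report.
n <- c("1" = 97, "2" = 133, "3" = 219, "4" = 16, "5" = 16, "6" = 53)

# All 15 unordered pairs of populations.
pairs <- t(combn(names(n), 2))
total <- n[pairs[, 1]] + n[pairs[, 2]]

summary_tbl <- data.frame(
  pop_a = pairs[, 1],
  pop_b = pairs[, 2],
  total = unname(total),
  odd_total = unname(total %% 2 == 1)
)
print(summary_tbl)

# Rows needed to fill a 2-column matrix (ploidy = 2) from 113 values.
rows_needed <- ceiling(113 / 2)
print(rows_needed)
```

Every pair that mixes population 4 or 5 with one of the others has an odd total in this table, which matches the pattern of failures described above.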

jgx65 commented

Hi,

This looks like an issue with hierfstat::genet.dist rather than adegenet. You might consider reposting there. In any case, without an example data set, it is difficult to answer your question. I am also wondering why you are subsetting your data, since hierfstat::genet.dist will produce estimates of genetic distances for all pairs of populations.

georgeomics commented

We're randomizing population assignments between pairs of regions, hence the subsetting. I did resolve the issue, though: it turned out to be caused by the population column being left in the data, which only failed for populations with a relatively "small" number of individuals. I updated the code to remove the population column:

df1 <- subset(data, population %in% c(1,2))
dg1 <- df2genind(df1[,-1], ploidy=2, ncode=1, pop=df1$population)
calcFst <- genet.dist(dg1, method = "WC84")

Still not sure why the previous code runs fine with all other populations (and produces similar results), but not specifically for those populations with 16 individuals. But it is running as intended now across all population comparisons.
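A minimal sketch of the fix described above, using a made-up data frame in place of the real `data`. The object `toy`, the column names `locus1` and `locus2`, and the two-digit genotype coding are assumptions for illustration only. It drops the grouping column by name rather than with `[,-1]`, so the result does not depend on `population` being the first column. The `df2genind` and `genet.dist` calls run only if adegenet and hierfstat are installed.

```r
# Toy data: a population column plus two loci, one character per allele
# (ncode = 1), two alleles per genotype (ploidy = 2). All values are made up.
set.seed(1)
alleles <- c("1", "2", "3", "4")
make_locus <- function(n) {
  paste0(sample(alleles, n, replace = TRUE), sample(alleles, n, replace = TRUE))
}
toy <- data.frame(
  population = rep(c(1, 4), times = c(20, 6)),
  locus1 = make_locus(26),
  locus2 = make_locus(26),
  stringsAsFactors = FALSE
)

pair <- subset(toy, population %in% c(1, 4))

# Keep only the genotype columns; selecting by name avoids relying on position.
geno <- pair[, setdiff(names(pair), "population"), drop = FALSE]

# Every genotype cell should hold ncode * ploidy = 2 characters.
cell_widths <- unique(unlist(lapply(geno, nchar)))
print(cell_widths)

if (requireNamespace("adegenet", quietly = TRUE) &&
    requireNamespace("hierfstat", quietly = TRUE)) {
  gi <- adegenet::df2genind(geno, ploidy = 2, ncode = 1, pop = pair$population)
  print(hierfstat::genet.dist(gi, method = "WC84"))
}
```

The width check is a cheap way to spot a stray non-genotype column before conversion: a single-digit population label has one character, not two.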

jgx65 commented

I repeat.

  • This is not an adegenet issue.
  • Without an example, it is difficult to see where the problem is.

If you can live with the solution you found, please close this issue. Otherwise, close it here and continue the conversation in the hierfstat repository.