gbradburd/conStruct

extract log-likelihoods from unfinished cross-validation


Dear Gideon,

thanks a lot for this wonderful program. I have a request or question here, rather than an issue to report. I am running conStruct with >250 population samples and 10,000 marker SNPs, using 10 cross-validation replicates with K = 1:10 to identify the optimal number of spatial layers. As you can imagine, this takes some time, particularly since both the spatial (sp) and the non-spatial (nsp) models are calculated consecutively before the log-likelihood table is produced.

Thus, I wanted to ask whether it is possible either to output the sp likelihood table before the calculation of the nsp models starts, or to manually calculate the likelihoods from the model.fit objects.

Thanks for your help!

Best, Martin

Hi Martin,

Yes, that'll be slow (although, if you have access to a cluster/multi-core machine, note that you can use the parallel argument in x.validation to parallelize the cross-validation procedure across cores). Unfortunately, there's no easy way to output the standardized log-likelihood (lnL) for one particular value of K or one class of model (e.g., spatial vs. nonspatial).
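For example, a parallelized call might look something along these lines (argument names follow the `x.validation` help page; the data objects `freqs`, `geoDist`, and `coords` stand in for your own, and the node count is just an example):

```r
library(conStruct)
library(parallel)
library(doParallel)

# register a parallel backend for x.validation to use
# (match the node count to your machine/cluster)
cl <- makeCluster(10)
registerDoParallel(cl)

my.xvals <- x.validation(train.prop = 0.9,   # proportion of loci in each training partition
                         n.reps = 10,        # number of cross-validation replicates
                         K = 1:10,           # values of K to compare
                         freqs = freqs,      # your allele frequency matrix
                         geoDist = geoDist,  # pairwise geographic distances
                         coords = coords,    # sampling coordinates
                         prefix = "my_xval", # placeholder output prefix
                         n.iter = 1e3,
                         make.figs = FALSE,
                         save.files = FALSE,
                         parallel = TRUE,
                         n.nodes = 10)

stopCluster(cl)
```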

We describe the full cross-validation procedure in the appendix of the paper (pg 50), but basically, for the _i_th cross-validation replicate, you're parameterizing each model (in your case, K=1:10, for both the spatial and non-spatial models) using the _i_th training data partition, then calculating the lnL of the testing partition given that parameterized model. Then - and this is the sticky wicket - you're standardizing those lnLs by subtracting the greatest lnL of any model for that particular partition, and those standardized lnLs can then be aggregated for any particular model across replicate partitions. So, until you've run the analyses for both the spatial and non-spatial models across all specified values of K, you can't get the standardized lnL for any model for that partition. Does that make sense, and also answer your question?
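To make the standardization step concrete with some made-up numbers, it's just this arithmetic (not the package's internal code):

```r
# raw test-partition lnLs for ONE cross-validation replicate,
# one entry per fitted model (values invented for illustration)
raw.lnL <- c(sp.K1 = -5120, sp.K2 = -5040, sp.K3 = -5045,
             nsp.K1 = -5300, nsp.K2 = -5210, nsp.K3 = -5190)

# standardize against the best (largest) lnL of ANY model for this
# partition; the best model gets 0 and all others are negative
std.lnL <- raw.lnL - max(raw.lnL)
std.lnL
```

These standardized values are what get aggregated across replicates, which is why no model's score can be reported until every model has been run on that partition.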

If you're getting super impatient with the x.validation runs, you could also try the calculate.layer.contribution approach. In datasets with lots of loci, it's not uncommon to see x.validation give strong statistical support for models with large K that don't make a lot of biological sense, or in which particular layers contribute negligibly to overall covariance. In those cases, the layer contributions are often very helpful.
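A minimal sketch of that, along the lines of the model-comparison vignette (the filenames and prefix here are placeholders for whatever your own run produced):

```r
library(conStruct)

# load the saved output of a finished spatial run
load("spK3_conStruct.results.Robj")  # loads `conStruct.results`
load("spK3_data.block.Robj")         # loads `data.block`

# proportion of total covariance explained by each of the K layers;
# layers contributing ~0 suggest that value of K is larger than needed
calculate.layer.contribution(conStruct.results = conStruct.results[[1]],
                             data.block = data.block)
```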

hope that helps, and sorry that it's a bit slow!
-Gideon

Hi Martin,

Just doing some bookkeeping - should I mark this issue as resolved?

Hi Martin,

Haven't heard from you, so I'm going to mark this as resolved, but if you want to reopen an issue, please do!