Clarifying the link between zdim and n_clusters

Question

DmitryKishkinev opened this issue 7 months ago · 1 comments

Not really an issue, but a question to help my lab's understanding. I am trying to understand the link between zdim (the number of latent dimensions) and the number of motifs (n_clusters in config.yaml). Can these two numbers be chosen completely independently, or is there a rule (for example, that zdim must always be equal to, higher than, or lower than the number of clusters/motifs)?

My understanding is that zdim should be optimized so that we do not use more latent dimensions than needed (by looking at the batch-normalized error curve and stopping at the zdim where the curve stops going down), whereas the number of motifs could be anything: large if we want a very granular picture of behaviour, or small if we want a coarse structure of behaviour. So the choice of n_cluster is up to the researcher, while zdim is a computational optimization.
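To make the "stop when the error curve stops going down" idea concrete, here is a minimal sketch in plain Python. It assumes you have already trained with several candidate zdim values and noted one error value per run. The function name `pick_zdim`, the 5% threshold, and the error values are all made up for illustration and are not part of any library.

```python
def pick_zdim(errors, min_rel_gain=0.05):
    """Return the smallest zdim after which the error stops improving noticeably.

    errors: dict mapping candidate zdim -> error (hypothetical values).
    min_rel_gain: relative improvement below which the curve is treated as flat
                  (an arbitrary threshold chosen for this sketch).
    """
    candidates = sorted(errors)
    for current, larger in zip(candidates, candidates[1:]):
        # Relative error reduction gained by moving to the next larger zdim.
        gain = (errors[current] - errors[larger]) / errors[current]
        if gain < min_rel_gain:
            return current
    return candidates[-1]  # no plateau found within the tested range


# Made-up error values, for illustration only.
example_errors = {5: 0.80, 10: 0.50, 20: 0.30, 30: 0.29, 40: 0.288}
chosen = pick_zdim(example_errors)
print(chosen)
```

With these made-up numbers the curve flattens between 20 and 30, so the sketch settles on 20. In practice the threshold is a judgment call, and inspecting the plotted curve by eye is just as valid.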

Any clarifications would be appreciated here.

Dmitry

Answer 1 · 2024-02-21T07:59:08.000Z

Following up on my previous post: perhaps zdim (the number of latent dimensions) is something we need to determine, and it is then fixed for a given data set, whereas n_clusters (the number of motifs) depends on the research question. That is, the researcher can look at the data at a finer or coarser granularity depending on how much detail about the behaviours in that data set is wanted. In practice, zdim is found by looking for the bending point / plateau of the batch-normalised mean squared error, but n_clusters could be pretty much any number. Please correct me if I am missing something here. Much appreciated.
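As an illustration of why the two numbers need not be tied together, the sketch below assumes that motifs come from clustering the latent vectors with something like k-means; that is an assumption, not something confirmed in this thread. It uses a tiny pure-Python k-means (the helper `kmeans_labels` and the toy data are hypothetical) to show that 3-dimensional latent vectors can be split into fewer or more clusters than there are latent dimensions.

```python
def kmeans_labels(points, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))

    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Toy "latent vectors" with zdim = 3: four tight groups of three points each.
zdim = 3
groups = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (0, 0, 10)]
latent = [tuple(c + 0.1 * i for c in g) for g in groups for i in range(3)]

for n_clusters in (2, 4, 6):
    labels = kmeans_labels(latent, n_clusters)
    print(zdim, n_clusters, len(set(labels)))
```

The clustering step only needs distances between vectors, so it runs for any cluster count regardless of the vectors' dimensionality. Whether a given count is meaningful for the behaviour being studied is a separate question.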