flowersteam/explauto

Adding gmminf and ilo_gmm to sklearn

Opened this issue · 3 comments

Hi again,

I have been looking into sklearn recently and there is no Gaussian mixture inference implemented. In explauto you implemented one, quite nicely, by building on top of the GMM class from sklearn.
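For reference, here is a rough sketch of what such conditional inference (Gaussian mixture regression) boils down to. It is not explauto's actual code: it uses the current GaussianMixture class rather than the older GMM one discussed here, assumes full covariances, and the `condition` helper is just an illustrative name.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def condition(gmm, x_obs, in_dims, out_dims):
    """Condition a fitted full-covariance GaussianMixture on observed dimensions.

    Returns the weights, means and covariances of the conditional mixture
    p(out_dims | in_dims = x_obs).
    """
    weights, means, covs = [], [], []
    for w, mu, cov in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        mu_x, mu_y = mu[in_dims], mu[out_dims]
        cov_xx = cov[np.ix_(in_dims, in_dims)]
        cov_yy = cov[np.ix_(out_dims, out_dims)]
        cov_yx = cov[np.ix_(out_dims, in_dims)]
        gain = cov_yx @ np.linalg.inv(cov_xx)
        # Conditional Gaussian of this component given x_obs
        means.append(mu_y + gain @ (x_obs - mu_x))
        covs.append(cov_yy - gain @ cov_yx.T)
        # Responsibility of this component for the observation
        weights.append(w * multivariate_normal.pdf(x_obs, mu_x, cov_xx))
    weights = np.array(weights)
    return weights / weights.sum(), np.array(means), np.array(covs)

# Toy usage: learn a joint model over (x, y), then infer y given x.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(x) + 0.1 * rng.normal(size=x.shape)
gmm = GaussianMixture(n_components=6, covariance_type="full").fit(np.hstack([x, y]))
w, m, c = condition(gmm, np.array([1.0]), in_dims=[0], out_dims=[1])
y_mean = w @ m  # expectation of the conditional mixture
```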

I think adding this to the official sklearn repo would be of tremendous help for many other people. Do you think the code is mature enough? Are there specific basic cases that are not handled? Is it worth our time?

Along the same lines, once gmminf is available, adding ilo_gmm should be easy and could help spread this algorithm in the community.

Going further, we could also extend ILO-GMR to the Dirichlet Process GMM which, in theory, requires less tuning since the number of Gaussians is estimated on the fly. Thinking in terms of ILO-DPGMM, the algorithm would adapt the number of Gaussians for each locality depending on the requirements at that specific location.

The gain in performance might not be huge in most cases, but it is a nice way to make the algorithm even more elegant. The fact that it is local already makes it very efficient: EM is faster to run (less data, only local, at the cost of recomputing it each time we query the model :) ), it is easier to tune (a fixed number of data points for each fit, and locally we expect that only a few Gaussians are enough), and it runs in constant time (each fit is made on a fixed, predefined number of samples). Yet having a fixed number of Gaussians for all localities might not be optimal for some problems, which the Dirichlet Process could help solve.
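As a rough illustration of that local-fit-at-query-time idea (not explauto's actual implementation), here is a sketch combining a k-nearest-neighbour lookup with the `condition` helper from the previous snippet; the function name and default parameters are made up.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.mixture import GaussianMixture

def ilo_gmr_predict(X, Y, x_query, condition, k=100, n_components=3):
    """Infer E[y | x_query] from a GMM fitted only on the k nearest samples.

    `condition` is a function computing the conditional mixture from a fitted
    GaussianMixture (e.g. the helper from the previous sketch). Each query
    triggers a small EM run on a fixed-size local dataset, which is what keeps
    the per-query cost roughly constant.
    """
    # In a real implementation the neighbour index would be built once, not per query.
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    _, idx = nn.kneighbors(x_query.reshape(1, -1))
    local = np.hstack([X[idx[0]], Y[idx[0]]])
    gmm = GaussianMixture(n_components=n_components, covariance_type="full").fit(local)
    in_dims = list(range(X.shape[1]))
    out_dims = list(range(X.shape[1], local.shape[1]))
    w, means, _ = condition(gmm, x_query, in_dims, out_dims)
    return w @ means
```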

PS1: have a look at the pros and cons listed by sklearn; apparently DPGMM is not much slower.

PS2: Another alternative is VBGMM, but it includes biases/priors. That makes it harder to tune but also more robust to divergence of the covariance matrix.
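Note that in recent sklearn versions both DPGMM and VBGMM have been folded into a single BayesianGaussianMixture class, with the variant selected by weight_concentration_prior_type; roughly:

```python
from sklearn.mixture import BayesianGaussianMixture

# Dirichlet-process variant (what the DPGMM discussion above is about):
# n_components is only an upper bound; superfluous components get ~zero weight.
dpgmm = BayesianGaussianMixture(
    n_components=20,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
)

# Finite (VBGMM-like) variant with a symmetric Dirichlet prior on the weights.
vbgmm = BayesianGaussianMixture(
    n_components=20,
    weight_concentration_prior_type="dirichlet_distribution",
    covariance_type="full",
)
```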

Yes, I noticed it too: general GMM inference doesn't really exist elsewhere, whereas it clearly kicks ass (or at least mine).

It's kind of mature I think (it has been used in many contexts). But it is still slow and could probably be optimized (see e.g. #52).

Also I heard from @omangin that publishing in sklearn is a pain in the ass (second ass in one message, I need to go out); those guys are apparently quite extremist (software-engineering-wise, of course). Explauto has rather been coded by modern hippies.

Anyway it's still a pretty good idea to try it I think.

Regarding DP and VBGMM I can't really tell, I don't know much about them.

> Also I heard from @omangin that publishing in sklearn is a pain in the ass (second ass in one message, I need to go out); those guys are apparently quite extremist (software-engineering-wise, of course). Explauto has rather been coded by modern hippies.

Yes, this confirms the impression I got when looking at their issues and pull-request management. I think we should forget it for now.

I have seen a few other libraries that seem to implement inference, but I haven't tested any of them, so I put them here just to keep track of them:

> Regarding DP and VBGMM I can't really tell, I don't know much about them.

ILO-DPGMM is just something that could be done for fun. DPGMM seems to find the "optimal" number of Gaussians for the data. I don't know much about it either, and certainly not the math behind it. Yet the sklearn interface is the same as GMM's, so all your code should transfer directly. I would be curious to see in which conditions the automatic adaptation of the number of Gaussians could help.

My guess is probably when:

  • part of the space is highly redundant, so the inverse model has many local clusters;
  • part of the X-to-Y mapping is multimodal, so a given X can produce Y in different, yet precise, regions.

In both cases, a small number of Gaussians would tend to produce a poor fit.

Although the second case should not happen often in most real-world problems (if the agent has access to the relevant features), the first case might happen more often (e.g. a highly segmented robot arm).

In other words, for the forward model, a small number of Gaussians should, in general, produce a good fit. For the inverse model, I would expect that a higher number of Gaussians is generally needed. Finding the proper number of Gaussians, one that works for the entire space, might be difficult. In that sense, maybe ILO-DPGMM could produce some improvements; it might be worth checking on a simulated arm one day.
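As a possible starting point for that check (just a sketch, the arm geometry and parameters are arbitrary), data from a redundant two-joint planar arm gives exactly the kind of multimodal inverse mapping discussed above:

```python
import numpy as np

# Redundant 2-joint planar arm: many joint configurations reach the same hand
# position, so p(angles | hand) is multimodal and a fixed, small number of
# Gaussians may fit the inverse model poorly.
rng = np.random.default_rng(0)
theta = rng.uniform(-np.pi, np.pi, size=(5000, 2))
hand_x = np.cos(theta[:, 0]) + np.cos(theta[:, 0] + theta[:, 1])
hand_y = np.sin(theta[:, 0]) + np.sin(theta[:, 0] + theta[:, 1])
hand = np.column_stack([hand_x, hand_y])

# Joint dataset: forward model conditions on angles (dims 0-1) to predict the
# hand (dims 2-3); inverse model conditions on the hand to predict the angles.
# One could then compare GaussianMixture vs BayesianGaussianMixture on the
# local fits to see whether adapting the number of components actually helps.
data = np.hstack([theta, hand])
```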

Well, all this is only intuition; I have little practical experience with these things.