flowersteam/explauto

progress computation in discretized interest model

Closed this issue · 5 comments

I was wondering if the computation of progress in the discretized interest model is the intended one here.
Indeed, np.cov computes the covariance between [1, 2, 3, 4, 5] and the last competences, e.g. [0.8, 0.6, 0.7, 0.9, 0.95].
An issue with that behavior arises at the beginning of exploration, when few cells have been sampled and few points have been sampled within those cells.
np.cov in fact keeps increasing even when progress is constant:

np.cov(range(3), [0.5, 0.6, 0.7])[0,1] ≈ 0.1
np.cov(range(4), [0.5, 0.6, 0.7, 0.8])[0,1] ≈ 0.167
np.cov(range(5), [0.5, 0.6, 0.7, 0.8, 0.9])[0,1] ≈ 0.25

which leads to exploring about n points in each randomly chosen cell, with n corresponding roughly to the window size (the covariance equals the constant slope times the variance of the indices, which grows with the window length).
That can be seen, for instance, in the scatter plots of the notebook about curiosity.
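
As a quick sanity check (a minimal sketch assuming NumPy; the 0.1 slope and loop bounds are illustrative), the covariance keeps growing with the window length even though the increase per step is constant:

    import numpy as np

    for n in range(3, 8):
        comps = [0.5 + 0.1 * i for i in range(n)]  # constant progress of 0.1 per step
        print(n, np.cov(range(n), comps)[0, 1])    # grows as 0.1 * n * (n + 1) / 12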

I tried another behavior in another branch here, where the initial guessed progress of each unsampled cell is a constant (e.g. 10), and progress is computed as the mean of the last 5 competences in the cell minus the mean of the 5 before them (the 10th-to-last through the 6th-to-last), as sketched below.
In that case, there is an exploration bias that pushes the learner to explore each cell at least once before progress-based sampling takes over.
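
For reference, a minimal sketch of that alternative (the function name, window size, and initial value are illustrative, not the branch's exact code):

    import numpy as np

    def cell_progress(competences, window=5, init_progress=10.0):
        # Unsampled or barely sampled cell: an optimistic constant ensures
        # it gets explored at least once.
        if len(competences) < 2 * window:
            return init_progress
        # Mean of the last `window` competences minus the mean of the
        # `window` before them: a derivative-like measure of progress.
        recent = np.mean(competences[-window:])
        previous = np.mean(competences[-2 * window:-window])
        return recent - previous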

What do you think?
@clement-moulin-frier
@pierre-rouanet

Hi Sébastien, yes, what you propose is what we were doing in older Matlab code, when regions were pre-determined.
However, when regions are dynamically built (like in IAC/RIAC/SAGG-RIAC), the initial progress value of a new region was either

  1. a new value computed on the history of points of the mother region falling in this region, if there are enough points to compute a meaningful progress;
  2. the same progress value as the mother region if there are not enough points (and a constant value if the mother region did not already have a progress measure), as sketched below.
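
In code, that initialization rule could look like this (a sketch only; the names and the point threshold are assumptions, not the actual IAC/RIAC implementation):

    def init_region_progress(inherited_competences, mother_progress,
                             min_points=10, default_progress=10.0):
        if len(inherited_competences) >= min_points:
            # 1. enough inherited points: compute a meaningful progress
            half = len(inherited_competences) // 2
            older = sum(inherited_competences[:half]) / half
            recent = sum(inherited_competences[half:]) / (len(inherited_competences) - half)
            return recent - older
        if mother_progress is not None:
            # 2. too few points: inherit the mother region's progress
            return mother_progress
        # mother region had no progress measure yet: fall back to a constant
        return default_progress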


What you propose is fine by me (and important, since the whole exploration history is very dependent on the first events). Thanks!

OK, but did you have arguments or intuitions for this behavior that forces sampling a few points, or should I implement a behavior that really computes a derivative, since the current behavior might seem strange to people using it?
For the derivative behavior, there is still the question of the guessed initial progress.
If it is high, all regions will be sampled before a discrimination based on real progress takes place (which is, I think, a good approach if not too many cells are defined).
If it is low, interesting regions might be explored very late.
As a reminder, in both implementations there is a softmax smoothing to choose the exploring region.
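
For context, softmax region choice looks roughly like this (a sketch; the temperature parameter and the use of absolute progress as interest are assumptions, not necessarily the library's exact implementation):

    import numpy as np

    def choose_region(progresses, temperature=1.0):
        # Interest = |progress|; the softmax turns interests into probabilities,
        # so low-progress regions are still sampled occasionally.
        interests = np.abs(np.asarray(progresses, dtype=float))
        probs = np.exp(interests / temperature)
        probs /= probs.sum()
        return np.random.choice(len(progresses), p=probs)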

This issue is actually what initially led me to propose a recursive region splitting mechanism: initially you have very few cells,
and they multiply only in interesting areas. Having too many cells initially makes the idea of learning progress as bad as searching
for novelty, I think (so maybe this is not a big problem in the library; we could have a tutorial comparing many cells versus
recursive region splitting).
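
A toy illustration of that idea in one dimension (purely illustrative; real IAC/RIAC splits along the dimension that best separates progress, not blindly at the midpoint):

    class Region:
        """Toy 1-D region that splits in two once it holds enough points,
        so the partition refines only where sampling actually happens."""
        def __init__(self, low, high, max_points=50):
            self.low, self.high = low, high
            self.max_points = max_points
            self.points = []
            self.children = None

        def add(self, x):
            if self.children is not None:
                # Route the point to the matching child region.
                self.children[x > self.mid].add(x)
                return
            self.points.append(x)
            if len(self.points) > self.max_points:
                # Split at the midpoint and redistribute the points.
                self.mid = (self.low + self.high) / 2.0
                self.children = (Region(self.low, self.mid, self.max_points),
                                 Region(self.mid, self.high, self.max_points))
                for p in self.points:
                    self.children[p > self.mid].add(p)
                self.points = []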


I implemented this progress computation in DiscreteProgress in #70.