Explaining the logic behind uniform_sensor_testcases?

Question

Opened this issue 8 years ago · 3 comments

I am now looking into test case generation. I find the method for generating uniform test cases over the sensory space very appealing. The code is here: https://github.com/flowersteam/explauto/blob/master/explauto/environment/testcase.py

My understanding is that a grid of a given resolution is projected on the sensory space, and that each cell is associated with only one observation from within that cell. My questions concern the resolution parameter:

Is it the number of cuts per dimension?
What is the logic behind the automatic calculation of resolution: resolution = max(2, int((1.3*n)**(1.0/len(robot.s_feats)))) ?

I also noticed this: # TODO : change obs only if nearer from center of coo.

My understanding is that, in each cell, the retained observation is the last one encountered during the _populate process. Is the TODO meant to replace that by keeping the observation closest to the center of the cell?
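To make the question concrete, here is a minimal sketch of the grid selection as described above. It is not the explauto implementation: the function name, the bounds argument and the keep switch are all hypothetical, and the "center" mode is only one possible reading of the TODO.

```python
def grid_testcases(observations, resolution, bounds, keep="last"):
    """Pick at most one observation per grid cell (hypothetical helper).

    observations: list of points (tuples of floats) in the sensory space
    resolution:   number of cells per dimension
    bounds:       list of (low, high) pairs, one per dimension
    keep:         "last" keeps the last observation seen in a cell,
                  "center" keeps the one closest to the cell center
    """
    def cell_of(obs):
        coo = []
        for x, (lo, hi) in zip(obs, bounds):
            i = int((x - lo) / (hi - lo) * resolution)
            # Clamp so that points on or outside the bounds stay in the grid.
            coo.append(min(max(i, 0), resolution - 1))
        return tuple(coo)

    def dist2_to_center(obs, coo):
        total = 0.0
        for x, i, (lo, hi) in zip(obs, coo, bounds):
            center = lo + (i + 0.5) * (hi - lo) / resolution
            total += (x - center) ** 2
        return total

    cells = {}
    for obs in observations:
        coo = cell_of(obs)
        if coo not in cells or keep == "last":
            cells[coo] = obs
        elif dist2_to_center(obs, coo) < dist2_to_center(cells[coo], coo):
            cells[coo] = obs
    return list(cells.values())


# Three points fall in the lower-left cell, one in the upper-right cell.
obs = [(0.26, 0.24), (0.1, 0.1), (0.4, 0.4), (0.9, 0.9)]
unit_square = [(0.0, 1.0), (0.0, 1.0)]
last_kept = grid_testcases(obs, 2, unit_square, keep="last")
center_kept = grid_testcases(obs, 2, unit_square, keep="center")
```

With keep="last" the lower-left cell ends up with (0.4, 0.4); with keep="center" it ends up with (0.26, 0.24), which is nearest to the cell center (0.25, 0.25). Either way, the number of test cases is bounded by the number of populated cells.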

Answer 1 · 2016-08-09T15:24:17.000Z

One effect of the grid system is that you do not always have an observation in every cell, so when you ask for 100 test_cases, you often end up with fewer.

Another method could use k-means, with k = number of test_cases, to find cell centers, and then select the observation closest to each cluster center. This ensures you get 100 test_cases if you ask for 100.
However, the result is not truly uniform, although it is a good approximation for k << n_samples. The code already generates 100 times more samples than test cases: observations = uniform_motor_testcases(robot, 100*n).
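The k-means idea can be sketched roughly as follows. This is a plain-Python version of Lloyd's algorithm written only for illustration; the function names, the seed and the iteration cap are assumptions, and a library implementation of k-means would do the same job. One detail is added on top of the description above: each observation can be selected only once, so that two centers sharing the same nearest observation still yield k distinct test cases.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_testcases(observations, k, n_iter=50, seed=0):
    """Return (centers, chosen): k cluster centers and, for each center,
    the nearest observation not already chosen (hypothetical helper).
    Requires k <= len(observations)."""
    rng = random.Random(seed)
    centers = rng.sample(observations, k)
    for _ in range(n_iter):
        # Assignment step: attach every observation to its nearest center.
        clusters = [[] for _ in range(k)]
        for obs in observations:
            j = min(range(k), key=lambda c: dist2(obs, centers[c]))
            clusters[j].append(obs)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for c, members in enumerate(clusters):
            if members:
                dims = range(len(members[0]))
                mean = tuple(sum(m[d] for m in members) / len(members) for d in dims)
                new_centers.append(mean)
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    # Selection step: nearest unused observation for each center.
    used = set()
    chosen = []
    for center in centers:
        candidates = (i for i in range(len(observations)) if i not in used)
        best = min(candidates, key=lambda i: dist2(observations[i], center))
        used.add(best)
        chosen.append(observations[best])
    return centers, chosen


# Usage: 500 random 2D observations, 20 test cases requested.
rng = random.Random(1)
data = [(rng.random(), rng.random()) for _ in range(500)]
centers, testcases = kmeans_testcases(data, 20, n_iter=30)
```

Because the selection works on observation indices, the function always returns exactly k observations, which is the property the grid approach lacks.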

Below is a small example: data in blue (1000 points), k-means centers in red (20 points), selected observations in green (20 points).

Answer 2 · 2016-08-09T15:47:48.000Z

Here is a comparison between the two methods:

Dataset 1000 points.

Grid: asked for 20 points, got 18.
Selected observations in magenta (18 points)

K-means: asked for 20 points, got 20.
K-means centers in red (20 points), selected observations in green (20 points)

There is a cluster of points in the bottom-left corner from failed experiments, so it is normal that a sample is selected there.

The resolution was computed automatically with the formula from the first post, which gave 5 here, so I guess a 5x5 grid: 25 cells, of which only 18 were populated.
K-means does look less uniform.
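The resolution quoted above can be checked by hand. The snippet below evaluates the expression from the first post, with len(robot.s_feats) replaced by a plain n_dims argument (an assumption made so the example runs without a robot object) and a 2D sensory space assumed, as the 5x5 guess implies.

```python
def auto_resolution(n, n_dims):
    # Same expression as in the first post; n_dims stands in for len(robot.s_feats).
    return max(2, int((1.3 * n) ** (1.0 / n_dims)))


resolution = auto_resolution(20, 2)  # int(26 ** 0.5) = int(5.099...) = 5
n_cells = resolution ** 2            # 5 x 5 = 25 cells for 20 requested points
```

One plausible reading, not confirmed anywhere in this thread, is that the factor 1.3 asks for roughly 30% more cells than test cases to compensate for cells that stay empty. In this run that margin was not enough, since only 18 of the 25 cells were populated.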

Answer 3 · 2016-08-16T10:45:28.000Z

I think I will stick with k-means because it guarantees n points, but it is not optimal.

What we really want here is a kind of SOM (self-organizing map) with the constraint that the vectors should be of similar length (after scaling the data between 0 and 1 in each dimension beforehand).
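The scaling step mentioned in the parenthesis could look like the sketch below. It is a plain min-max rescaling; the function name is hypothetical, and mapping a constant dimension to 0.0 is an arbitrary choice made here to avoid a division by zero.

```python
def minmax_scale(observations):
    """Rescale each dimension of the observations to the range [0, 1]."""
    dims = range(len(observations[0]))
    lows = [min(o[d] for o in observations) for d in dims]
    highs = [max(o[d] for o in observations) for d in dims]
    scaled = []
    for o in observations:
        scaled.append(tuple(
            (o[d] - lows[d]) / (highs[d] - lows[d]) if highs[d] > lows[d] else 0.0
            for d in dims
        ))
    return scaled


scaled = minmax_scale([(0.0, 10.0), (5.0, 20.0), (10.0, 30.0)])
```

Rescaling matters for any distance-based selection (k-means or a SOM), because otherwise the dimension with the largest numeric range dominates the distances.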