numenta/htmresearch

TP performance on very short sequences

breznak opened this issue · 8 comments

I'm trying to learn many very short sequences (length 3, predicting from the 2nd element), but the performance is low.

The code is here:
https://github.com/numenta/nupic.research/compare/numenta:master...breznak:algebraic?expand=1
Could you check if the experiment is set up OK?

...the idea was to learn algebraic expressions, e.g. 1+1=2.

Results look like this:

[2568]   (4, 3, 7) ==> acc=0.26  best=5 WRONG   confidences:  0: 0.0526,  1: 0.1088,  4: 0.0400,  5: 0.4039,  7: 0.1823,  8: 0.0394,  3: 0.0724,  6: 0.0701, 
[2589]   (3, 5, 8) ==> acc=0.25  best=3 WRONG   confidences:  0: 0.0952,  1: 0.1050,  5: 0.1413,  7: 0.1205,  8: 0.0874,  9: 0.0721,  10: 0.0552,  3: 0.2035, 
[2596]   (2, 3, 5) ==> acc=0.25  best=4 WRONG   confidences:  0: 0.1143,  1: 0.0734,  2: 0.0974,  4: 0.2259,  5: 0.0975,  7: 0.0671,  8: 0.0962,  3: 0.1587, 
[2608]   (2, 2, 4) ==> acc=0.25  best=4 OK      confidences:  0: 0.0776,  1: 0.0418,  2: 0.1348,  4: 0.2189,  5: 0.1735,  7: 0.0769,  6: 0.0577,  3: 0.2022, 
[2664]   (3, 5, 8) ==> acc=0.26  best=5 WRONG   confidences:  0: 0.1439,  1: 0.1093,  2: 0.0841,  5: 0.2009,  8: 0.1045,  9: 0.0680,  6: 0.0578,  3: 0.1028, 
[2711]   (0, 0, 0) ==> acc=0.26  best=0 OK      confidences:  0: 0.4520,  1: 0.1349,  2: 0.0914,  4: 0.1430,  5: 0.0700,  6: 0.0191,  3: 0.0896, 
[2750]   (1, 4, 5) ==> acc=0.27  best=5 OK      confidences:  0: 0.0899,  1: 0.1518,  2: 0.1397,  4: 0.1392,  5: 0.1621,  8: 0.0425,  6: 0.1325,  3: 0.1216, 
[2777]   (4, 1, 5) ==> acc=0.27  best=1 WRONG   confidences:  0: 0.1801,  1: 0.2078,  2: 0.0705,  4: 0.1835,  5: 0.0616,  7: 0.0022,  3: 0.1997,  6: 0.0947, 
[2792]   (5, 1, 6) ==> acc=0.26  best=1 WRONG   confidences:  0: 0.1470,  1: 0.2468,  2: 0.0617,  4: 0.1615,  5: 0.1048,  7: 0.0020,  3: 0.1810,  6: 0.0952, 
[3027]   (0, 1, 1) ==> acc=0.21  best=1 OK      confidences:  0: 0.2115,  1: 0.2204,  2: 0.0723,  4: 0.1392,  5: 0.1176,  7: 0.0034,  3: 0.1195,  6: 0.1160, 
[3030]   (0, 1, 1) ==> acc=0.21  best=0 WRONG   confidences:  0: 0.2207,  1: 0.1477,  2: 0.0608,  4: 0.1150,  5: 0.1217,  7: 0.0033,  3: 0.2201,  6: 0.1107, 
[3205]   (0, 1, 1) ==> acc=0.18  best=6 WRONG   confidences:  0: 0.1643,  1: 0.1946,  2: 0.0711,  4: 0.1420,  5: 0.0755,  7: 0.0073,  6: 0.2179,  3: 0.1273, 
[3248]   (0, 5, 5) ==> acc=0.16  best=3 WRONG   confidences:  1: 0.1268,  4: 0.0846,  5: 0.1033,  7: 0.0798,  9: 0.1003,  10: 0.0920,  3: 0.1341,  6: 0.1295, 

@rhyolight @chetan51 can you take a look please?

@breznak Why do you think an HTM layer can learn algebraic sequences? It will be able to memorize all sequences it has seen before, and it might be able to interpolate a bit between numbers it has already seen. It probably won't be able to extrapolate to new unseen numbers.

@subutai

It will be able to memorize all sequences it has seen before,

This is the core of my question: I'm seeing bad performance even for trained sequences (maybe there's a mistake in my model setup!).

and it might be able to interpolate a bit between numbers it has already seen.

This is what I wanted to do. I let it see sums equal to 1 (e.g. 1+0=1), 2 (1+1=2, and all variants), and 3 (3+0=3), but not 2+1=3 (the test case; a sketch of this split is below the list).

  • I tried to simplify the task for it, and maybe that's the problem - with natural & small numbers there is only a small number of combinations that produce (and teach) a given number.
    • e.g. for 3: 0+3, 1+2, and the flipped orders; when 1+2 is in the test set, there is only little exposure left.
    • using simple rational numbers might help (e.g. 1, 1.5, 2, 2.5, ...)
    • using a bigger range of numbers (1-100) would help too, but then the training space grows!
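Something like this is the split I have in mind (a simplified sketch with a hypothetical helper, not the code in the linked branch):

```python
# Hypothetical sketch of the train/test split described above: train on every
# a + b = c with a, b in 0..3 except the single held-out case 2 + 1 = 3.
from itertools import product

TEST_CASE = (2, 1, 3)

def make_triplets(max_operand=3):
    train, test = [], []
    for a, b in product(range(max_operand + 1), repeat=2):
        triplet = (a, b, a + b)
        (test if triplet == TEST_CASE else train).append(triplet)
    return train, test

train_seqs, test_seqs = make_triplets()
# train_seqs still contains 1+2=3 and 3+0=3, so "3" has been seen as a result;
# the model only has to interpolate to the unseen ordering 2+1 at test time.
```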

It probably won't be able to extrapolate to new unseen numbers.

I was wondering about this.

  • Definitely not classify them back from the SDR to a "number", e.g. 101.
  • But could it learn the pattern of a moving pattern in the SDR (a left shift)? Perhaps with a multilayer HTM and a feedback loop?
  • Can I pretrain the SP with all possible numbers to get stable SDR representations out of the SP? (Roughly what I mean is sketched after this list.)
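The pretraining idea, as a rough sketch (class/module paths and parameters as I remember them from ~2015 nupic; treat them as assumptions, not the experiment's actual code):

```python
# Sketch: pretrain the SP on the whole value range so its column mapping
# stabilizes before the TM sees any sequences.
import numpy
from nupic.encoders.random_distributed_scalar import RandomDistributedScalarEncoder
from nupic.research.spatial_pooler import SpatialPooler

encoder = RandomDistributedScalarEncoder(resolution=0.05, w=21, n=400)
sp = SpatialPooler(inputDimensions=(400,), columnDimensions=(2048,),
                   globalInhibition=True, numActiveColumnsPerInhArea=40)
active = numpy.zeros(2048, dtype=numpy.uint32)

for _ in range(300):                 # several hundred passes to stabilize
    for value in range(0, 11):       # every number the experiment can produce
        sp.compute(encoder.encode(value), True, active)
# afterwards run the SP with learn=False so the SDRs stay fixed for the TM
```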

Why do you think an HTM layer can learn algebraic sequences?

This is the core of my question: I'm seeing bad performance even for trained sequences (maybe there's a mistake in my model setup!).

Just looked through your code very briefly. You might try a smaller resolution in your encoder, like 0.05. A resolution of 0.25 means that the numbers 1 and 2 will have something like 31-4=27 bits in common. The SP and TP might treat them as almost identical.
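To make the overlap arithmetic concrete, here is a small standalone sketch (not the actual nupic encoder, just the same bucket arithmetic), assuming 31 active bits as in the numbers above:

```python
# Standalone illustration of how encoder resolution controls bit overlap:
# a value maps to a bucket, and the encoding is `w` consecutive active bits
# starting at that bucket, so nearby values share most of their bits.
W = 31  # active bits per encoding; assumed here to match the "31" above

def bucket(value, resolution):
    return int(round(value / resolution))

def overlap(a, b, resolution, w=W):
    # two encodings share all but |bucket(a) - bucket(b)| of their w active bits
    return max(0, w - abs(bucket(a, resolution) - bucket(b, resolution)))

print(overlap(1, 2, resolution=0.25))  # 31 - 4  = 27 bits in common
print(overlap(1, 2, resolution=0.05))  # 31 - 20 = 11 bits in common
```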

SP training: how large is your training set? The SP usually requires several hundred iterations before it stabilizes. In your situation it might not be necessary to train the SP - you might try inc/dec of 0. The TM also requires a few passes to learn high order sequences - one pass is not sufficient.
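Concretely, the "inc/dec of 0" suggestion would look roughly like this in a hotgym-style OPF params dict (a sketch under that assumption; MODEL_PARAMS stands for whatever params dict your experiment uses):

```python
# Sketch: disable SP learning by zeroing the permanence increment/decrement
# (key names as in the standard hotgym-style spParams).
MODEL_PARAMS['modelParams']['spParams'].update({
    'synPermActiveInc': 0.0,
    'synPermInactiveDec': 0.0,
})
# And present the whole training set several times (e.g. 5-10 passes) so the
# TM gets enough repetitions to learn the high order sequences.
```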

BTW I don't think the HTM will be able to extrapolate beyond the numbers it's trained with. It's very different from the "fox eats" example. Even with interpolation it probably won't provide mathematically correct answers.

You might try a smaller resolution in your encoder, like 0.05. A resolution of 0.25 means that the numbers 1 and 2 will have something like 31-4=27 bits in common. The SP and TP might treat them as almost identical.

Thanks, I did that - no noticeable change either way. Before, I had it set so that "1" and "2" would have no overlap, which was IMHO also wrong -> there would be no way to learn the similarity between them: ||1,2|| < ||1,3|| ...

SP training: how large is your training set? The SP usually requires several hundred iterations before it stabilizes. In your situation it might not be necessary to train the SP - you might try inc/dec of 0. The TM also requires a few passes to learn high order sequences - one pass is not sufficient.

I think it's large enough. I generate 10k random permutations of 2 arguments from [0..5], so that's 25 unique sequences. I have pamLength=1 and a reset after each triplet (1+2=3 -> 1,2,3), but that does not seem to make any difference either.
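Roughly what I mean (a simplified sketch, not the exact script; the OPF model creation, module path, and the field name 'number' are placeholders for what the linked experiment actually uses):

```python
# Simplified sketch of the training loop described above, assuming an OPF
# model created from hotgym-style MODEL_PARAMS.
import random
from nupic.frameworks.opf.modelfactory import ModelFactory

model = ModelFactory.create(MODEL_PARAMS)
model.enableInference({'predictedField': 'number'})

for _ in range(10000):
    a, b = random.randint(0, 5), random.randint(0, 5)
    for value in (a, b, a + b):      # present the triplet, e.g. 1, 2, 3
        model.run({'number': value})
    model.resetSequenceStates()      # reset so each triplet is an independent sequence
```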

OK, I'm not sure what else to suggest without spending a lot of debugging time on the scripts. Learning basic sequences should work just fine - there are lots of tests of this. Maybe you can try a category encoder first to ensure there are no overlap-related issues?

@subutai Got it working, by copying the hotgym example values!
numenta/nupic.research@4b144a0
The results (in this experiment in particular) tend to be very sensitive to the parameter settings.
This single change of numActiveColumnsPerInhArea (from 10 to 40) takes accuracy from 10% to 90% on the 0..5 x 0..5 sequences.
Another data point in support of numenta/nupic-legacy#1343.
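For reference, this is roughly the change (assuming the hotgym-style params dict, where the column sparsity lives in spParams; the actual diff is in the commit above):

```python
# The single change described above (hotgym-style params dict assumed).
MODEL_PARAMS['modelParams']['spParams']['numActiveColumnsPerInhArea'] = 40  # was 10
```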

This single change of numActiveColumnsPerInhArea (from 10 to 40) takes accuracy from 10% to 90% on the 0..5 x 0..5 sequences.

Yes, we need to have reasonable default values.