Issues with dimensionality off-by-one
GoogleCodeExporter commented
What steps will reproduce the problem?
1. Create this training file:
======= train.txt =======
1 1:1 2:.1 3:.1 200:1
1 1:1.2 2:.01 3:.01 200:1
1 1:3 2:.2 3:.41 200:1
-1 3:4 200:1
-1 2:3 200:1
-1 1:.1 2:3 3:2 200:1
====================
2. ./sofia-ml-read-only/sofia-ml --learner_type pegasos --loop_type stochastic
--lambda 0.1 --iterations 100000 --dimensionality 200 --training_file train.txt
--model_out debug-model.txt
3. debug-model.txt has:
-5.01486 -0.169397 -10.0628 -10.0518 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0\
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The model should spit out 201 terms, the first being the bias term. Instead
it spits out 200, and clips off the last weight. When I set dimensionality to
201, I get what I would expect:
0.263645 0.561799 -0.509116 -0.382012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 \
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 \
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0.263645
This was compiled from source a couple weeks ago. The program should probably
crash if you say dimensionality is 200 and there is a "200:x" term in the
sparse vector representation, unless the no-bias flag is set.
Original issue reported on code.google.com by justi...@gmail.com
on 26 Feb 2013 at 3:24
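The check suggested above (fail when a feature id does not fit the declared dimensionality) can be sketched outside of sofia-ml as a pre-flight script. This is only an illustration, not sofia-ml code: the function name `find_out_of_range_features` is hypothetical, and it assumes, based on the output in this report, that the weight vector has exactly `dimensionality` slots indexed from 0, with slot 0 holding the bias term.

```python
def find_out_of_range_features(lines, dimensionality):
    """Return (line_number, feature_id) pairs that do not fit the model.

    Assumption (taken from the behaviour described in this report, not
    from the sofia-ml source): valid ids are 0 .. dimensionality - 1.
    """
    bad = []
    for line_number, line in enumerate(lines, start=1):
        tokens = line.split()
        # tokens[0] is the label; the remaining tokens are "id:value" pairs.
        for token in tokens[1:]:
            feature_id = int(token.split(":", 1)[0])
            if feature_id >= dimensionality:
                bad.append((line_number, feature_id))
    return bad


# The training file from the report, one string per line.
TRAIN = [
    "1 1:1 2:.1 3:.1 200:1",
    "1 1:1.2 2:.01 3:.01 200:1",
    "1 1:3 2:.2 3:.41 200:1",
    "-1 3:4 200:1",
    "-1 2:3 200:1",
    "-1 1:.1 2:3 3:2 200:1",
]

if __name__ == "__main__":
    problems = find_out_of_range_features(TRAIN, 200)
    if problems:
        raise SystemExit(
            "feature ids exceed dimensionality on (line, id): %s" % problems
        )
```

Run against this training data with a dimensionality of 200, the script exits with an error because every line contains feature 200; with 201 it finishes silently.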
GoogleCodeExporter commented
When you set dimensionality to 200, it also counts the label, so sofia expects
1 label and 199 features. In your case dimensionality should therefore be 201.
I agree it's not very convenient and must be confusing at first sight.
Original comment by zhani...@myglam.com
on 7 May 2013 at 6:47
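One way to avoid the off-by-one is to derive the value for --dimensionality from the training file rather than typing it by hand. The sketch below is a rough illustration and not part of sofia-ml: the helper name `required_dimensionality` is hypothetical, and the "largest feature id plus one" rule is a generalization of the single case discussed in this thread (largest id 200, working dimensionality 201).

```python
def required_dimensionality(lines):
    """Return the largest feature id found in the data, plus one.

    Assumption: the extra slot accounts for index 0, which the report
    above identifies as the bias term.
    """
    max_id = 0
    for line in lines:
        # Skip the label (first token); parse the id of each "id:value" pair.
        for token in line.split()[1:]:
            max_id = max(max_id, int(token.split(":", 1)[0]))
    return max_id + 1


# The training file from the report, one string per line.
TRAIN_LINES = [
    "1 1:1 2:.1 3:.1 200:1",
    "1 1:1.2 2:.01 3:.01 200:1",
    "1 1:3 2:.2 3:.41 200:1",
    "-1 3:4 200:1",
    "-1 2:3 200:1",
    "-1 1:.1 2:3 3:2 200:1",
]

if __name__ == "__main__":
    print(required_dimensionality(TRAIN_LINES))  # prints 201
```

The printed value can then be passed to --dimensionality in the command from the original report.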