shockley/sofia-ml

Issues with dimensionality off-by-one

What steps will reproduce the problem?
1. Create this training file:

======= train.txt =======
1 1:1 2:.1 3:.1 200:1
1 1:1.2 2:.01 3:.01 200:1
1 1:3 2:.2 3:.41 200:1
-1 3:4 200:1
-1 2:3 200:1
-1 1:.1 2:3 3:2 200:1
====================
2. ./sofia-ml-read-only/sofia-ml --learner_type pegasos --loop_type stochastic \
   --lambda 0.1 --iterations 100000 --dimensionality 200 --training_file train.txt \
   --model_out debug-model.txt

3. debug-model.txt has:
-5.01486 -0.169397 -10.0628 -10.0518 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0\
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

The model should output 201 terms, the first being the bias term. Instead it
outputs 200 and clips off the last weight (the one for feature 200). When I set
--dimensionality to 201, I get what I would expect:

0.263645 0.561799 -0.509116 -0.382012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0.263645  
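To read a model file like the ones above by feature index, here is a minimal Python sketch. It assumes, as described in this report, that the model is one line of space-separated weights where position 0 is the bias and position i is the weight for feature i. `parse_model_line` is a hypothetical helper, not part of sofia-ml.

```python
# Hypothetical helper, not part of sofia-ml. Assumes the model file is one line
# of space-separated weights where position 0 is the bias term and position i is
# the weight for feature index i (as described in this report).


def parse_model_line(line: str) -> dict[int, float]:
    """Return {feature_index: weight} for every nonzero weight in a model line."""
    weights = [float(tok) for tok in line.split()]
    return {i: w for i, w in enumerate(weights) if w != 0.0}


if __name__ == "__main__":
    # Same shape as the --dimensionality 201 output above: bias, features 1-3,
    # zeros for features 4-199, then the weight for feature 200.
    line = "0.263645 0.561799 -0.509116 -0.382012 " + "0 " * 196 + "0.263645"
    print(len(line.split()))  # 201
    print(parse_model_line(line))
    # {0: 0.263645, 1: 0.561799, 2: -0.509116, 3: -0.382012, 200: 0.263645}
```

With a 200-value model, the highest index the helper can return is 199, so there is no slot for feature 200.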

This was compiled from source a couple of weeks ago. The program should probably
abort with an error if --dimensionality is 200 and the sparse vector
representation contains a "200:x" term, unless the no-bias flag is set.
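The check proposed above could look roughly like the following Python sketch. `check_dimensionality` is a hypothetical pre-flight helper, not sofia-ml's own code; it assumes plain `label index:value` lines like train.txt (no `qid:` tokens or `#` comments) and that valid indices are 0 through dimensionality - 1. It does not model the reporter's no-bias exception.

```python
# Hypothetical pre-flight check for the behavior proposed above; this is not
# sofia-ml's own code. Assumptions: each non-blank line is "label index:value ...",
# as in train.txt above (no qid: tokens or # comments), and a weight vector of
# size `dimensionality` holds indices 0 .. dimensionality - 1.


def check_dimensionality(lines, dimensionality):
    """Raise ValueError at the first feature index that is >= dimensionality."""
    for line_no, line in enumerate(lines, start=1):
        tokens = line.split()
        for tok in tokens[1:]:  # tokens[0] is the label; blank lines yield nothing
            index = int(tok.split(":", 1)[0])
            if index >= dimensionality:
                raise ValueError(
                    f"line {line_no}: feature index {index} does not fit "
                    f"--dimensionality {dimensionality} (need at least {index + 1})"
                )


if __name__ == "__main__":
    train = ["1 1:1 2:.1 3:.1 200:1", "-1 3:4 200:1"]
    check_dimensionality(train, 201)  # passes silently
    try:
        check_dimensionality(train, 200)
    except ValueError as err:
        print(err)
        # line 1: feature index 200 does not fit --dimensionality 200 (need at least 201)
```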

Original issue reported on code.google.com by justi...@gmail.com on 26 Feb 2013 at 3:24

When you set dimensionality to 200, it also includes the label, so sofia expects
1 label and 199 features. In your case, dimensionality should therefore be 201.
I agree it's not very convenient and can be confusing at first sight.
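Following this rule, the right value can be derived from the training data before running sofia-ml: take the largest feature index and add one. Below is a minimal Python sketch; `required_dimensionality` is a hypothetical helper, and the input is assumed to be plain `label index:value` lines like train.txt above.

```python
# Sketch: smallest --dimensionality that fits a training set, following the rule
# in the comment above (largest feature index + 1). Assumes plain
# "label index:value ..." lines; a file object also works, since iterating it
# yields lines.


def required_dimensionality(lines):
    """Return the largest feature index found in `lines`, plus one."""
    max_index = 0
    for line in lines:
        for tok in line.split()[1:]:  # skip the label
            max_index = max(max_index, int(tok.split(":", 1)[0]))
    return max_index + 1


if __name__ == "__main__":
    train = [
        "1 1:1 2:.1 3:.1 200:1",
        "1 1:1.2 2:.01 3:.01 200:1",
        "1 1:3 2:.2 3:.41 200:1",
        "-1 3:4 200:1",
        "-1 2:3 200:1",
        "-1 1:.1 2:3 3:2 200:1",
    ]
    print(required_dimensionality(train))  # 201
```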

Original comment by zhani...@myglam.com on 7 May 2013 at 6:47