cjlin1/liblinear

One-class SVM causes integer division by zero


Using v247 of liblinear-official from Python.
During one-class SVM training, an integer division by zero occurs. The issue is data dependent: it does not happen with every dataset.
It is likely caused by one of the quick_select_min_k calls:

quick_select_min_k(max_negG_of_Iup, 0, len_Iup-1, len_Iup-max_inner_iter);

Python code to reproduce (requires numpy, scikit-learn, and liblinear-official):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
import liblinear.liblinearutil

np.random.seed(123 + 1)
x, y = make_regression(10000, 2500, n_informative=250)
x = MinMaxScaler().fit_transform(x)
x += 0.1  # shift features into the positive range so the problem is solvable
y = np.ones(len(x))  # labels are unused by one-class SVM; any value works
print("liblinear start")
# problem() takes (y, x) in that order; this step is slow because the data is enumerated in a plain Python loop
liblinearProblem = liblinear.liblinearutil.problem(y, x)
# -s 21 selects one-class SVM, -n 0.01 sets nu
modelParams = liblinear.liblinearutil.parameter('-s 21 -n 0.01')
print("liblinear train")
liblinearModel = liblinear.liblinearutil.train(liblinearProblem, modelParams)
print("liblinear finished")
quit()

Outputs:

liblinear start
liblinear train
Traceback (most recent call last):
  File "liblinear_repro.py", line 19, in <module>
    liblinearModel = liblinear.liblinearutil.train(liblinearProblem, modelParams)
  File "E:\Program Files\Python\Python37\lib\site-packages\liblinear\liblinearutil.py", line 164, in train
    m = liblinear.train(prob, param)
OSError: exception: integer divide by zero

The OSError is just Python's way of reporting a division-by-zero exception raised in the underlying library. From a quick debugging session, the fault seems to be in the partition function:

static int partition(feature_node *nodes, int low, int high)

The division by zero comes from the modulo operator %. On the CPU, modulo is computed by the integer division instruction, hence the name of the exception.
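To illustrate how a modulo in a partition step can fail this way, here is a minimal Python sketch. It is not liblinear's code: it assumes the pivot is picked with the common C idiom low + rand() % (high - low + 1), and the function and variable names are made up for the example. Under that assumption, calling the function on an empty range (high == low - 1) makes the modulus zero.

```python
import random


def partition(values, low, high):
    """Lomuto partition of values[low..high] around a randomly chosen pivot."""
    # Assumed pivot choice, mimicking `low + rand() % (high - low + 1)` in C.
    # For an empty range (high == low - 1) span is 0 and the modulo fails.
    span = high - low + 1
    pivot_index = low + random.getrandbits(31) % span
    values[pivot_index], values[high] = values[high], values[pivot_index]
    store = low
    for i in range(low, high):
        if values[i] < values[high]:
            values[store], values[i] = values[i], values[store]
            store += 1
    values[high], values[store] = values[store], values[high]
    return store


data = [5, 3, 8, 1]
p = partition(data, 0, len(data) - 1)  # non-empty range: works

try:
    partition(data, 0, -1)  # empty range: span == 0
    empty_range_failed = False
except ZeroDivisionError:
    empty_range_failed = True
```

In Python the failure surfaces as ZeroDivisionError. In a native library the same operation traps at the CPU level, which would be consistent with the OSError shown in the traceback above. A guard that returns early when the range is empty would avoid the modulo entirely, though whether that matches the actual fix is not known from this thread.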
If the issue doesn't reproduce, try a few different values for np.random.seed in the script above.

Thank you so much for pointing out the issue.
Yes, there is a bug.
We have fixed it and changed our internal code. It will be available soon in
the next release. Thank you again.