microsoft/SparseSC

Unable to split the data in fit_fast() if my data has fewer than 5 countries.

Closed this issue · 2 comments

I get "ValueError: Cannot have number of splits n_splits=5 greater than the number of samples: n_samples=4." from fit_fast() when my data contains fewer than 5 countries.


I suspect that if you look at the trace, you will see that this error actually comes from sklearn's KFold class, which is used to split the records in the data into folds (subsets of rows) for cross-validation and k-fold gradient descent. The default number of folds (splits) is 5, and it can be set manually using the gradient_folds parameter of fit().

In short, this message is just saying that you can't divide a dataset with 4 rows into 5 non-empty subsets.
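
The error can be reproduced with sklearn alone, without SparseSC. The snippet below is a minimal sketch: the 4-row array is made-up stand-in data for 4 countries, and try_split is a hypothetical helper written for this illustration, not part of either library.

```python
import numpy as np
from sklearn.model_selection import KFold

# Made-up data: 4 rows standing in for 4 countries.
X = np.arange(8).reshape(4, 2)

def try_split(X, n_splits):
    """Hypothetical helper: return the number of folds, or the error text."""
    try:
        return len(list(KFold(n_splits=n_splits).split(X)))
    except ValueError as err:
        return str(err)

# Asking for 5 folds from 4 rows fails inside KFold.
too_many = try_split(X, 5)

# Capping the fold count at the number of rows succeeds.
capped = try_split(X, min(5, len(X)))

print(too_many)
print(capped)
```

By the same logic, passing a fold count no larger than the number of rows through gradient_folds should avoid the error. Whether fit_fast() accepts that parameter in the same way as fit() is an assumption worth checking against its signature.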