biolab/orange3

SVM and/or Preprocess with sparse data (BoW): no way to get rid of warning "Input data is sparse, default preprocessing is to scale it"

wvdvegte opened this issue · 2 comments

What's wrong?
When I try to classify text processed with Bag-of-Words using SVM, the SVM widget shows the warning "Input data is sparse, default preprocessing is to scale it" and does not perform the classification. I expected that placing Preprocess > Normalize Features > scale to σ^2 = 1 before SVM would scale the sparse BoW data, but the SVM widget still shows the same warning.
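For reference, "scale to σ^2 = 1" on a sparse column amounts to dividing each stored value by the column's standard deviation, with no centering. The sketch below shows only that arithmetic, not Orange's implementation. It assumes a hypothetical dict-of-nonzeros column layout (row index → value, absent rows are implicit zeros) and uses the population variance.

```python
import math

# Hypothetical sparse column layout: {row_index: value}; rows not listed are zeros.
def column_std(nonzeros, n_rows):
    """Population standard deviation of a sparse column, counting implicit zeros."""
    mean = sum(nonzeros.values()) / n_rows
    sq_mean = sum(v * v for v in nonzeros.values()) / n_rows
    # max() guards against a tiny negative value caused by floating-point rounding
    return math.sqrt(max(0.0, sq_mean - mean * mean))

def scale_to_unit_variance(nonzeros, n_rows):
    """Divide every stored value by the column's std; implicit zeros stay zero."""
    sd = column_std(nonzeros, n_rows)
    if sd == 0:
        return dict(nonzeros)  # constant column: leave it unchanged
    return {i: v / sd for i, v in nonzeros.items()}

counts = {0: 2.0, 3: 1.0, 7: 4.0}  # word counts in 3 of 10 documents
scaled = scale_to_unit_variance(counts, 10)
print(len(scaled), round(column_std(scaled, 10), 6))  # prints: 3 1.0
```

The scaled column still stores only three entries, which is consistent with scaling alone leaving BoW data sparse.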

How can we reproduce the problem?
Apply SVM to text processed with BoW, together with a categorical variable by which the text can be classified. Then insert Preprocess with Normalize Features > scale to σ^2 = 1 before SVM.

What's your environment?

  • Operating system: macOS 14.6.1
  • Orange version: 3.37.0
  • How you installed Orange: from DMG; updates through Add-ons menu

You are right. Scaling in Preprocess leaves the data sparse. As an alternative, you can use the same method (the only one that works on sparse data) in Continuize.
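Why only one scaling method works on sparse data: any option that centers a column (subtracts its mean) turns every implicit zero into a nonzero value, so the result is effectively dense. The sketch below compares full standardization with scaling alone. It uses a hypothetical dict-of-nonzeros layout and made-up counts, not Orange's internals.

```python
import math

# Hypothetical sparse column layout: {row_index: value}; rows not listed are zeros.
def column_std(nonzeros, n_rows):
    """Population standard deviation of a sparse column, counting implicit zeros."""
    mean = sum(nonzeros.values()) / n_rows
    sq_mean = sum(v * v for v in nonzeros.values()) / n_rows
    return math.sqrt(max(0.0, sq_mean - mean * mean))

def center_and_scale(nonzeros, n_rows):
    """Full standardization (x - mean) / std: each implicit zero becomes -mean / std."""
    mean = sum(nonzeros.values()) / n_rows
    sd = column_std(nonzeros, n_rows) or 1.0  # avoid division by zero
    dense = [(nonzeros.get(i, 0.0) - mean) / sd for i in range(n_rows)]
    # Keep only nonzero results, i.e. what a sparse container would have to store
    return {i: v for i, v in enumerate(dense) if v != 0.0}

def scale_only(nonzeros, n_rows):
    """Scaling without centering: only the already stored values change."""
    sd = column_std(nonzeros, n_rows) or 1.0
    return {i: v / sd for i, v in nonzeros.items()}

counts = {0: 2.0, 3: 1.0, 7: 4.0}  # word counts in 3 of 10 documents
print(len(center_and_scale(counts, 10)), len(scale_only(counts, 10)))  # prints: 10 3
```

With thousands of BoW columns, the centered variant would store every cell, which is why sparse inputs are limited to scaling without centering.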

I forgot to mention that my workflow (Windows 10, Orange 3.37.0) does produce predictions despite the warning, so I could not reproduce that part of the problem.