RubixML/Sentiment

Error trying to train, WordCountVectorizer missing parameter $maxDocumentFrequency

bavamont opened this issue · 3 comments

I am getting this error when trying to train using your train.php example (https://github.com/RubixML/Sentiment/blob/master/train.php):
Fatal error: Uncaught TypeError: Argument 3 passed to Rubix\ML\Transformers\WordCountVectorizer::__construct() must be of the type int, object given....

In your example on Line 44 you have:
new WordCountVectorizer(10000, 3, new NGram(1, 2)),

But the constructor for WordCountVectorizer expects this:
public function __construct(
    int $maxVocabulary = PHP_INT_MAX,
    int $minDocumentFrequency = 1,
    int $maxDocumentFrequency = PHP_INT_MAX,
    ?Tokenizer $tokenizer = null
)
What would be your recommended parameters for WordCountVectorizer for your example to work best?

Good catch! Did you upgrade versions recently? We added the $maxDocumentFrequency parameter in 0.1.0-rc5. Thanks for the reminder, I am going to update the train script!

Let's try a setting of 5000 for maxDocumentFrequency ... let me know if you get better results with a different setting
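
For anyone following along, here is a rough sketch of why the old call fails and what the adjusted call might look like. The classes below are stand-ins that only mirror the constructor signature quoted above so the snippet runs on its own. They are not the Rubix ML classes, and the stub NGram constructor is an assumption for illustration. The values 10000, 3 and NGram(1, 2) come from the original line in train.php, and 5000 is the suggested starting point for $maxDocumentFrequency, not a tuned value.

```php
<?php

// Stand-in types (hypothetical) that only copy the signature quoted in this issue.
// Replace them with the real Rubix ML classes in an actual project.
interface Tokenizer
{
}

class NGram implements Tokenizer
{
    public $min;
    public $max;

    public function __construct(int $min, int $max)
    {
        $this->min = $min;
        $this->max = $max;
    }
}

class WordCountVectorizer
{
    public $maxVocabulary;
    public $minDocumentFrequency;
    public $maxDocumentFrequency;
    public $tokenizer;

    public function __construct(
        int $maxVocabulary = PHP_INT_MAX,
        int $minDocumentFrequency = 1,
        int $maxDocumentFrequency = PHP_INT_MAX,
        ?Tokenizer $tokenizer = null
    ) {
        $this->maxVocabulary = $maxVocabulary;
        $this->minDocumentFrequency = $minDocumentFrequency;
        $this->maxDocumentFrequency = $maxDocumentFrequency;
        $this->tokenizer = $tokenizer;
    }
}

// Old three-argument call: the tokenizer lands in the int slot for
// $maxDocumentFrequency, so PHP raises a TypeError.
$oldCallFailed = false;

try {
    new WordCountVectorizer(10000, 3, new NGram(1, 2));
} catch (TypeError $e) {
    $oldCallFailed = true;
}

// Adjusted call: pass $maxDocumentFrequency explicitly and move the
// tokenizer to the fourth position.
$vectorizer = new WordCountVectorizer(10000, 3, 5000, new NGram(1, 2));
```

With the real library, only the last line should be needed in train.php, with the other arguments left as they were.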

Also, feel free to join our channel on Telegram: https://t.me/RubixML

Should be fixed in the latest update d076d86

Thanks again @bavamont!

Thank you @andrewdalpino !
I’ll try it with 5000 for maxDocumentFrequency.
Thanks again!