Clarify Redis statistics configuration since 1.7
hardware opened this issue · 6 comments
Hi,
Can you clarify the Redis statistics configuration and the changes required since the 1.7 release? Based on this commit, is the following configuration valid?
# local.d/statistic.conf
classifier "bayes" {
  tokenizer {
    name = "osb";
  }
  backend = "redis";
  min_tokens = 11;
  min_learns = 10;
  autolearn = true;
  # Use new schema (1.7+)
  new_schema = true;
  # Enable per user statistics
  per_user = true;
  # Expire bayes tokens
  expire = 100d;
  # Store not only probabilities, but full tokens, false by default
  #store_tokens = true;
  # Store bayes signatures
  #signatures = true;
  statfile {
    symbol = "BAYES_HAM";
    spam = false;
  }
  statfile {
    symbol = "BAYES_SPAM";
    spam = true;
  }
  learn_condition =<<EOD
return function(task, is_spam, is_unlearn)
  local prob = task:get_mempool():get_variable('bayes_prob', 'double')
  if prob then
    local in_class = false
    local cl
    if is_spam then
      cl = 'spam'
      in_class = prob >= 0.95
    else
      cl = 'ham'
      in_class = prob <= 0.05
    end
    if in_class then
      return false,string.format('already in class %s; probability %.2f%%',
        cl, math.abs((prob - 0.5) * 200.0))
    end
  end
  return true
end
EOD
}
My configuration prior to Rspamd 1.7 is available here.
Yes, it looks valid, though min_learns = 10; seems too low.
The only configuration changes you need to enable the new schema are
new_schema = true;
expire = 100d;
but you need to convert the database as well.
The rspamadm configwizard statistic command will do it for you.
Also you can shorten the configuration dramatically if you use local.d/classifier-bayes.conf instead of local.d/statistic.conf.
Also you can shorten the configuration dramatically if you use local.d/classifier-bayes.conf instead of local.d/statistic.conf.
Yes, that's what I wanted to do yesterday. This should work:
# local.d/classifier-bayes.conf
cache {
  backend = "redis";
}
backend = "redis";
min_learns = 50;
autolearn = true;
new_schema = true;
per_user = true;
expire = 100d;
statfile {
  symbol = "BAYES_HAM";
  spam = false;
}
statfile {
  symbol = "BAYES_SPAM";
  spam = true;
}
But you need to convert the database as well. The rspamadm configwizard statistic command will do it for you.
It does not work properly on my 2 mail servers; I always get this error:
rspamadm configwizard statistic
You have configured new schema for BAYES_SPAM/BAYES_HAM but your DB has old data
Do you wish to convert data to the new schema?[Y/n]:
Expire time for new tokens [default: 100d]:
converted OK elements from symbol BAYES_SPAM
converted 42386 elements from symbol BAYES_HAM
error converting metadata for symbol BAYES_SPAM
Conversion failed
No changes found, the wizard is finished now
I will open another issue on the rspamd main repository if I cannot fix this problem. In the meantime, I have asked whether other people have this problem in this issue: hardware/mailserver#228
I think this should be enough:
local.d/classifier-bayes.conf
backend = "redis";
min_learns = 50;
autolearn = true;
new_schema = true;
per_user = true;
expire = 100d;
Why don't you want to use the default min_learns (200)? It seems quite sane.
It's suspicious:
converted OK elements
Thank you :)
I think it might be useful to add this example to doc/configuration/statistic.md to clarify Redis statistics configuration with Rspamd 1.7.
Why don't you want to use the default min_learns (200)? It seems quite sane.
Isn't the default min_learns too high with per_user enabled on small or medium-sized mail servers?
It's suspicious:
converted OK elements
Yeah, very strange. How can I debug this?
Isn't the default min_learns too high with per_user enabled on small or medium-sized mail servers?
IMO it's better to keep the classifier disabled while it is underlearned. If you need it working immediately, you can train it on spam corpora.
Regarding per-user statistics, I'd consider using two classifiers: one per-user and one common.
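A rough sketch of what the two-classifier idea could look like, assuming full classifier blocks in local.d/statistic.conf as in the original question. The classifier names (bayes_common, bayes_user) and the *_USER symbols are hypothetical, and the new symbols would presumably also need scores assigned. Whether two new_schema classifiers can share one Redis database without key collisions is not confirmed anywhere in this thread, so treat this as a starting point to verify rather than a working setup:

```
# local.d/statistic.conf (sketch only; names and *_USER symbols are hypothetical)

# Common classifier: one shared model for all users
classifier "bayes" {
  name = "bayes_common";
  tokenizer {
    name = "osb";
  }
  backend = "redis";
  new_schema = true;
  expire = 100d;
  min_learns = 200;   # default value mentioned in this thread
  statfile {
    symbol = "BAYES_HAM";
    spam = false;
  }
  statfile {
    symbol = "BAYES_SPAM";
    spam = true;
  }
}

# Per-user classifier: separate statistics for each recipient
classifier "bayes" {
  name = "bayes_user";
  tokenizer {
    name = "osb";
  }
  backend = "redis";
  new_schema = true;
  expire = 100d;
  per_user = true;
  min_learns = 50;    # lower threshold, as used earlier in this thread
  statfile {
    symbol = "BAYES_HAM_USER";
    spam = false;
  }
  statfile {
    symbol = "BAYES_SPAM_USER";
    spam = true;
  }
}
```

The point of the split is that the common classifier can reach its learn threshold from all traffic, while the per-user one stays inactive for a given user until that user has enough learned messages.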
Yeah, very strange. How can I debug this?
Maybe you'll find something interesting in the Redis log?
Or in the database itself? Ham elements have been converted, but spam elements not. There should be some difference between them.
Do you have enough RAM, btw? For the conversion process you need about 3x more RAM than Redis uses for the old statistics.
DB should be equal to a string not a number.