Spamassassin training - maximum messages?

Jllynch · May 14, 2007

Our Plesk SpamAssassin interface has stopped letting us train on more messages. Is there a maximum recommended number of learned messages?

When we click on Training, we are shown:
Messages learned: 58818 as spam, 20955 as non-spam, 79773 total.

Then, when we select one message and click "it's spam!", the page comes back almost instantly with:
Messages learned: 0 as spam, 0 as non-spam, 0 total.

and no email messages listed.

Could we have learned too many messages? Should we limit this with the bayes_expiry_max_db_size setting in /var/qmail/mailnames/name.com/name/.spamassassin? Or is something else going on?

atomicturtle · May 15, 2007

Sort of. The documentation is here:
http://spamassassin.apache.org/full/3.2.x/doc/sa-learn.html

There are conditions you can run into, based on the total number of tokens in the database, when it runs a sync event. I've had far more messages than that in a shared Bayes DB, which is to be expected. For a single user's mailbox, though, that count is suspiciously high. What could be happening is that you're running into a safety check that SA performs during a sync, when it expires old tokens: if memory serves, it will not expire if it detects that it needs to purge more than 10,000. You'd need to look at the output of sa-learn --dump magic on that Bayes DB and then calculate the age of the oldest token. If that age is greater than 90 days, you're not expiring your old tokens. That can happen if you train a huge batch of mail all at once.
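A rough Python sketch of that age check, assuming the sa-learn --dump magic line layout shown later in this thread (the value sits in the third column, the label follows "non-token data:"). The function names and the 90-day constant are just for illustration, and measuring the age against the DB's own "newest atime" instead of the current clock is my assumption, not something sa-learn does for you.

```python
from typing import Dict

SECONDS_PER_DAY = 86400
MAX_AGE_DAYS = 90  # threshold suggested in the post above


def parse_magic(dump: str) -> Dict[str, int]:
    """Map each 'non-token data' label to its integer value (third column)."""
    values: Dict[str, int] = {}
    for line in dump.splitlines():
        if "non-token data:" not in line:
            continue  # skip anything that is not a magic line
        fields, label = line.split("non-token data:", 1)
        values[label.strip()] = int(fields.split()[2])
    return values


def oldest_token_age_days(magic: Dict[str, int]) -> float:
    """Age of the oldest token, measured against the newest atime in the DB."""
    return (magic["newest atime"] - magic["oldest atime"]) / SECONDS_PER_DAY


if __name__ == "__main__":
    # Two lines copied from the dump posted in this thread.
    dump = (
        "0.000 0 1174963284 0 non-token data: oldest atime\n"
        "0.000 0 1179281285 0 non-token data: newest atime\n"
    )
    age = oldest_token_age_days(parse_magic(dump))
    print(f"oldest token age: {age:.1f} days")  # prints: oldest token age: 50.0 days
    print("expiry looks stalled" if age > MAX_AGE_DAYS else "expiry looks OK")
```

With the numbers from this thread the oldest token is about 50 days old, which is under the 90-day mark.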

Jllynch · May 15, 2007

Thanks for that art. Here is the output:

0.000 0 3 0 non-token data: bayes db version
0.000 0 58984 0 non-token data: nspam
0.000 0 20955 0 non-token data: nham
0.000 0 130367 0 non-token data: ntokens
0.000 0 1174963284 0 non-token data: oldest atime
0.000 0 1179281285 0 non-token data: newest atime
0.000 0 1179274134 0 non-token data: last journal sync atime
0.000 0 1179190443 0 non-token data: last expiry atime
0.000 0 2764800 0 non-token data: last expire atime delta
0.000 0 136281 0 non-token data: last expire reduction count

Which translates to:
oldest atime: 27 Mar 2007
newest atime: 16 May 2007
last journal sync atime: 16 May 2007
last expiry atime: 15 May 2007
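For anyone who wants to reproduce that translation, here is a small Python sketch that converts the raw atime values from the dump (seconds since the Unix epoch) into dates. It renders them in UTC, which is my choice; in a local timezone a value close to midnight could land on a neighbouring date.

```python
from datetime import datetime, timezone

# atime values copied from the dump posted above (seconds since the Unix epoch).
ATIMES = {
    "oldest atime": 1174963284,
    "newest atime": 1179281285,
    "last journal sync atime": 1179274134,
    "last expiry atime": 1179190443,
}
LAST_EXPIRE_ATIME_DELTA = 2764800  # seconds, also from the dump


def to_utc(epoch: int) -> str:
    """Render a Unix timestamp as a UTC date and time."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%d %b %Y %H:%M UTC")


for label, epoch in ATIMES.items():
    print(f"{label}: {to_utc(epoch)}")
print(f"last expire atime delta: {LAST_EXPIRE_ATIME_DELTA / 86400:g} days")

# Expected output:
# oldest atime: 27 Mar 2007 02:41 UTC
# newest atime: 16 May 2007 02:08 UTC
# last journal sync atime: 16 May 2007 00:08 UTC
# last expiry atime: 15 May 2007 00:54 UTC
# last expire atime delta: 32 days
```

The dates agree with the translation above, and the "last expire atime delta" of 2764800 seconds works out to exactly 32 days.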

Does that look right to you?

(Also, we are getting the following message. Should we have a system-wide DB there, or is it only looking there because we are running as root?
[29052] dbg: bayes: no dbs present, cannot tie DB R/O: /root/.spamassassin/bayes_toks)
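That dbg line shows sa-learn falling back to the default Bayes location under the invoking user's home directory, which is /root when you run it as root. Below is a small Python sketch that builds a command pointing sa-learn at one mailbox's DB instead, using its --dbpath option (a command-line override for bayes_path). The directory layout and the default "bayes" file prefix are assumptions based on the path mentioned earlier and on the bayes_toks name in the dbg line; check what actually exists in that .spamassassin directory first. The helper names are hypothetical, and the script only prints the command; pass the list to subprocess.run if you want to execute it.

```python
import shlex
from pathlib import PurePosixPath
from typing import List

# Assumption: each mailbox keeps its Bayes files in
# /var/qmail/mailnames/<domain>/<mailbox>/.spamassassin under the default
# "bayes" prefix (bayes_toks, bayes_seen, ...).
MAILNAMES_ROOT = PurePosixPath("/var/qmail/mailnames")


def mailbox_bayes_prefix(domain: str, mailbox: str) -> PurePosixPath:
    """Path prefix for one mailbox's Bayes DB (SpamAssassin appends _toks etc.)."""
    return MAILNAMES_ROOT / domain / mailbox / ".spamassassin" / "bayes"


def dump_magic_argv(domain: str, mailbox: str) -> List[str]:
    """argv that dumps this mailbox's magic values instead of root's DB."""
    return ["sa-learn", "--dbpath", str(mailbox_bayes_prefix(domain, mailbox)),
            "--dump", "magic"]


# "name.com" and "name" are the placeholders used in this thread, not real values.
print(shlex.join(dump_magic_argv("name.com", "name")))
```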

Jllynch · Jun 18, 2007

It turns out that spam training via the control panel doesn't work for any user on the machine mentioned above. For every user, the page comes back almost instantly with
Messages learned: 0 as spam, 0 as non-spam, 0 total.

after they try to train on a message.

Does anyone have a clue why this might be?
