
Spamassassin training - maximum messages?

Discussion in 'Plesk for Linux - 8.x and Older' started by Jllynch, May 14, 2007.

  1. Jllynch

    Jllynch Regular Pleskian

    28
     
    Joined:
    Nov 11, 2003
    Messages:
    242
    Likes Received:
    0
    Our Plesk SpamAssassin interface has stopped letting us learn more messages. Is there a maximum recommended number of learned messages?

    When we click on training we are shown
    Messages learned: 58818 as spam, 20955 as non-spam, 79773 total.

    Then when we select one message and click on "it's spam!" the page comes back almost instantly with
    Messages learned: 0 as spam, 0 as non-spam, 0 total.

    and no email messages listed.

    Might we have too many messages learned? Should we limit this with the bayes_expiry_max_db_size setting in /var/qmail/mailnames/name.com/name/.spamassassin? Or is something else going on?
     
  2. atomicturtle

    atomicturtle Golden Pleskian

    29
     
    Joined:
    Nov 20, 2002
    Messages:
    2,110
    Likes Received:
    7
    Location:
    Washington, DC
    Sort of, documentation is here:
    http://spamassassin.apache.org/full/3.2.x/doc/sa-learn.html

    There are conditions you can run into based on the total number of tokens in the database when it runs a sync event. I've had far more messages than that on a common bayes DB, which is to be expected. On a single user's mailbox, though, that is suspiciously high. What could be happening is that you're running into a safety check that SA performs on a sync, which is used to expire old tokens. If memory serves, it will not expire if it detects that it needs to purge more than 10,000.

    You'd need to look at the output of sa-learn --dump magic on that bayes DB, and then calculate the age of the oldest token. If that age is greater than 90 days, then you're not expiring your old tokens. That can happen if you train a huge batch of mail all at once.
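
    The age check described above could be sketched roughly as follows. This is only an illustration: the sample values are made up, the helper names (parse_dump_magic, oldest_token_age_days) are hypothetical, and the line layout is assumed to match what sa-learn --dump magic usually prints. The 90-day threshold is the one mentioned in this post.

```python
import re
from datetime import datetime, timezone

# Matches lines such as "0.000 0 1174963284 0 non-token data: oldest atime".
MAGIC_LINE = re.compile(r"^\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+?)\s*$")


def parse_dump_magic(text):
    """Return {field name: integer value} for the non-token lines."""
    fields = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            fields[match.group(2)] = int(match.group(1))
    return fields


def oldest_token_age_days(fields, now=None):
    """Age of the oldest token in days, measured against `now` (epoch seconds)."""
    if now is None:
        now = datetime.now(timezone.utc).timestamp()
    return (now - fields["oldest atime"]) / 86400.0


# Hypothetical sample, not taken from a real database.
SAMPLE = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 1000000000 0 non-token data: oldest atime
0.000 0 1008640000 0 non-token data: newest atime
"""

fields = parse_dump_magic(SAMPLE)
# Measure against the newest atime here so the result is reproducible.
age = oldest_token_age_days(fields, now=fields["newest atime"])
print(f"oldest token is {age:.1f} days old")
print("expiry looks stalled" if age > 90 else "expiry looks healthy")
```

    On a live system you would feed in the real sa-learn --dump magic output and leave now unset, so the age is measured against the current time.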
     
  3. Jllynch

    Jllynch Regular Pleskian

    28
     
    Joined:
    Nov 11, 2003
    Messages:
    242
    Likes Received:
    0
    Thanks for that art. Here is the output;

    0.000 0 3 0 non-token data: bayes db version
    0.000 0 58984 0 non-token data: nspam
    0.000 0 20955 0 non-token data: nham
    0.000 0 130367 0 non-token data: ntokens
    0.000 0 1174963284 0 non-token data: oldest atime
    0.000 0 1179281285 0 non-token data: newest atime
    0.000 0 1179274134 0 non-token data: last journal sync atime
    0.000 0 1179190443 0 non-token data: last expiry atime
    0.000 0 2764800 0 non-token data: last expire atime delta
    0.000 0 136281 0 non-token data: last expire reduction count

    Which translates to;
    oldest time 27 Mar 2007
    newest time 16 May 2007
    last journal sync atime 16 May 2007
    last expiry atime 15 May 2007
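
    For anyone wanting to reproduce the conversion, here is a minimal sketch. It assumes the atime values are Unix epoch seconds and formats them in UTC; the helper name to_utc_date is just for illustration.

```python
from datetime import datetime, timezone

# atime values copied from the dump output above (assumed to be epoch seconds).
atimes = {
    "oldest atime": 1174963284,
    "newest atime": 1179281285,
    "last journal sync atime": 1179274134,
    "last expiry atime": 1179190443,
}


def to_utc_date(epoch_seconds):
    """Format an epoch timestamp as a UTC calendar date, e.g. '27 Mar 2007'."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%d %b %Y")


for name, value in atimes.items():
    print(f"{name}: {to_utc_date(value)}")

# Distance between the oldest and newest token access times, in days.
span_days = (atimes["newest atime"] - atimes["oldest atime"]) / 86400
print(f"oldest to newest: {span_days:.1f} days")
```

    By this calculation the oldest and newest atimes are about 50 days apart, which is under the 90 days mentioned earlier in the thread.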

    Does that look right to you?


    (Also, we are getting the following error. Should we have a system-wide DB here, or is it just looking there because we are running as root?
    [29052] dbg: bayes: no dbs present, cannot tie DB R/O: /root/.spamassassin/bayes_toks )
     
  4. Jllynch

    Jllynch Regular Pleskian

    28
     
    Joined:
    Nov 11, 2003
    Messages:
    242
    Likes Received:
    0
    It turns out that spam training via the control panel won't work for any user on the above-mentioned machine. For all of them, the page comes back almost instantly with
    Messages learned: 0 as spam, 0 as non-spam, 0 total.

    after they try to train a message.

    Anyone have a clue why this might be?
     