Automatic SPAM and HAM learning

Discussion in 'Plesk for Linux - 8.x and Older' started by saschahb, Jul 11, 2005.

  1. saschahb

    saschahb Guest

    hi all,

    I've written a script that lets all IMAP users use SpamAssassin's sa-learn function automatically.

    It works like this:
    Users create a SPAM and/or HAM folder within their root INBOX directory (via their email client).
    Users then move spam that was not correctly identified as spam into the SPAM folder,
    and move mail that was wrongly identified as spam into the HAM folder.
    The script runs, for example, every hour or once a day. It checks every SPAM and HAM folder for mail and uses sa-learn to learn the content.
    SPAM mails are deleted automatically after learning (this can be turned off within the script).
    HAM mails are moved back to the INBOX after learning, so the user can sort them into another folder.
    I hope you'll like this script. Suggestions are welcome, of course.
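
The workflow above can be sketched roughly as follows. This is a minimal Python illustration, not the actual qauto-salearn Perl code. It assumes a Courier-style Maildir where subfolders are named `.SPAM` and `.HAM`, and the function names (`folder_messages`, `process_maildir`) are made up for this example. Passing message files to `sa-learn --spam` / `sa-learn --ham` is standard sa-learn usage.

```python
import shutil
import subprocess
from pathlib import Path


def folder_messages(maildir, folder):
    """Return the message files in Maildir/.<folder>/new and .../cur."""
    messages = []
    for sub in ("new", "cur"):
        directory = Path(maildir) / f".{folder}" / sub
        if directory.is_dir():
            messages.extend(sorted(p for p in directory.iterdir() if p.is_file()))
    return messages


def process_maildir(maildir, run=subprocess.run, delete_spam=True):
    """Learn one mailbox's SPAM and HAM folders; return (spam_count, ham_count)."""
    maildir = Path(maildir)

    spam = folder_messages(maildir, "SPAM")
    if spam:
        run(["sa-learn", "--spam", *map(str, spam)], check=True)
        if delete_spam:  # deletion can be switched off, as in the script
            for msg in spam:
                msg.unlink()

    ham = folder_messages(maildir, "HAM")
    if ham:
        run(["sa-learn", "--ham", *map(str, ham)], check=True)
        for msg in ham:
            # Move back to the INBOX, keeping the new/cur subdirectory.
            dest_dir = maildir / msg.parent.name
            dest_dir.mkdir(parents=True, exist_ok=True)
            shutil.move(str(msg), str(dest_dir / msg.name))

    return len(spam), len(ham)
```

The `run` parameter exists only so the sketch can be tried without SpamAssassin installed; a real run would loop over every mailbox on the server and call `process_maildir` for each.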

    You just have to add this script to root's crontab,
    for example (at every full hour):
    0 * * * * /usr/local/sbin/qauto-salearn

    qauto-salearn perlscript

    PS: There are some config parameters within the script. I use the script with Debian 3.1 and Plesk 7.5.3.
  2. saschahb

    saschahb Guest

    bugfix and a new feature!
    Version 1.1

    - Fixed a small permission problem with mail directories where email is disabled within Plesk.
    - The script now uses sudo to run sa-learn on new mails and to create the Bayes databases.
    In the previous version the Bayes databases were owned by root, so SpamAssassin's own autolearn function did not work correctly; only learning through qauto-salearn did.

    Within the script you have to define the path to sudo and the user that should own the Bayes databases (it's "popuser" on Debian 3.1 with Plesk 7.5.3).
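
To illustrate the sudo change, here is a hedged sketch of how such a command line could be assembled. The paths `/usr/bin/sudo` and `/usr/bin/sa-learn` and the helper name `sa_learn_command` are assumptions for this example; the real script uses its own config variables. `sudo -u <user>` and `sudo -H` (set HOME to the target user's home) are standard sudo options; whether `-H` is needed depends on the sudo configuration.

```python
SUDO = "/usr/bin/sudo"          # assumed path, adjust per system
SA_LEARN = "/usr/bin/sa-learn"  # assumed path, adjust per system
BAYES_USER = "popuser"          # user named in the post above


def sa_learn_command(kind, paths, user=BAYES_USER, set_home=False):
    """Build the argument list for running sa-learn as another user."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    cmd = [SUDO, "-u", user]
    if set_home:
        cmd.append("-H")  # make HOME point at the target user's home
    cmd += [SA_LEARN, f"--{kind}", *paths]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; building it as a list avoids shell quoting problems with odd Maildir file names.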

    Have fun...

    Download V1.1 of qauto-salearn
  3. jamesyeeoc

    jamesyeeoc Guest

    Thank you saschahb. I am sure many people will make use of your script; I know there have been a number of posts asking how to do this.
  4. saschahb

    saschahb Guest


    I'll keep on developing this script because I (and perhaps many more people) need this function daily... :D

    If you've got any suggestions don't hesitate to tell me...
  5. jamesyeeoc

    jamesyeeoc Guest

    I believe the paths and popuser should be fine for RH boxes as well, though I'm not too sure whether that holds on a VPS/Virtuozzo-type server.

    If you ever get additional OS input from others and want to put in a check for OS type, I can give you a list of what files to check for to determine OS and version. Gave the same to lvalics for powertoys. Then maybe auto-set the paths/options per detected OS?
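
A rough idea of what such an OS check could look like is sketched below. The marker files used here (`/etc/debian_version`, `/etc/redhat-release`, `/etc/SuSE-release`) are commonly present on those distributions, but they are an assumption for this example and not the list offered above. The function name `detect_os` is hypothetical.

```python
from pathlib import Path

# Assumed marker files, checked in order; extend as needed.
RELEASE_FILES = [
    ("debian", "etc/debian_version"),
    ("redhat", "etc/redhat-release"),
    ("suse", "etc/SuSE-release"),
]


def detect_os(root="/"):
    """Return (family, release_text) based on which marker file exists."""
    for family, relative in RELEASE_FILES:
        marker = Path(root) / relative
        if marker.is_file():
            return family, marker.read_text(errors="replace").strip()
    return "unknown", ""
```

The detected family could then key a small table of default paths and user names; the `root` argument is only there so the check can be tried against a test directory.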
  6. Herby

    Herby Guest


    running on a RHEL box (Whitebox in fact)

    Apart from these lines, it seemed to work:

    bayes expire_old_tokens: lock: 7278 cannot create tmp lockfile /root/.spamassassin/bayes.lock.<FQDN>.7278 for /root/.spamassassin/bayes.lock: Keine Berechtigung [German locale: "Permission denied"]

  7. saschahb

    saschahb Guest

    Hm, I don't know why the script tries to update root's Bayes databases...

    Are you running the script as root? In fact, you have to. Perhaps you can give me some more information about this problem. I can't see what the cause is :(
  8. jamesyeeoc

    jamesyeeoc Guest

    In general, this Permission Denied error would usually indicate a problem with the bayes database path.

    Make sure your /etc/sysconfig/spamassassin file has a -H /var/qmail on the end of it.

    cat /etc/sysconfig/spamassassin
    SPAMDOPTIONS="-d -u qmailq -q -x -c -H /var/qmail "
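
A quick way to check for that option programmatically is sketched below. It only inspects the `SPAMDOPTIONS` line in the format shown above; the helper name `spamd_home_dir` is made up for this example, and the sketch makes no claim about what the option does on a given setup.

```python
import shlex


def spamd_home_dir(sysconfig_text):
    """Return the directory given after -H in SPAMDOPTIONS, or None."""
    for line in sysconfig_text.splitlines():
        line = line.strip()
        if not line.startswith("SPAMDOPTIONS="):
            continue
        value = line.split("=", 1)[1]
        # shlex removes the surrounding quotes; then split into single options.
        options = " ".join(shlex.split(value)).split()
        if "-H" in options:
            index = options.index("-H")
            if index + 1 < len(options) and not options[index + 1].startswith("-"):
                return options[index + 1]
            return ""  # -H present but without a directory
        return None
    return None
```

For example, `spamd_home_dir(open("/etc/sysconfig/spamassassin").read())` would report which directory, if any, is configured.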

    (Note: found this on ART's forum)
    [Edit] - ah, but would this be proper for a user by user training vs. systemwide...?
  9. atomicturtle

    atomicturtle Golden Pleskian

    It depends on whether you're storing bayes/awl data in SQL or not. The -q flag means "use mysql"; if you are using mysql, then your training system would need to be modified to get sa-learn to use the correct syntax. That is unfortunately extremely ugly, since sa-learn doesn't let you specify the user on the command line. You've got to do some messy stuff with local.cf to get it into the right place.

    It's something I've been working out in atomic-psa; I'll have something together in another week or so, I reckon.
  10. saschahb

    saschahb Guest

    Bugfixes and a new feature!
    Version 1.2

    - Some permissions fixed. All Bayes etc. files are now set to owner popuser.popuser.

    - If a special directory is configured, the script changes the user's default .qmail file so that every detected spam mail is moved to this directory.
    For example: if a user creates a directory "Perhapsspam" in his IMAP account and the script is configured to move all detected spam to "Perhapsspam", detected spam will never reach his INBOX again; it always goes to this directory.

    Requirements: safecat has to be installed on the server.

    PLEASE! Read the first lines of this script. You have to configure some things.

    I've tested this script under RHEL4. Debian should work, too.

    Have fun...

    Download V1.2 of qauto-salearn

    Next feature: Retention time for detected SPAM in the special directory
  11. saschahb

    saschahb Guest

    Version 1.3 is out...

    New feature (due to several requests):

    + SPAM retention time on special folder
    (SPAM is automatically deleted after a given number of days)
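
For illustration, a retention rule like this could be implemented along the following lines. This is a sketch under assumptions, not the code from V1.3: it treats a message's file modification time as its age, and the function name `expire_old_spam` is hypothetical.

```python
import time
from pathlib import Path


def expire_old_spam(folder, max_age_days, now=None):
    """Delete messages older than max_age_days from <folder>/new and <folder>/cur.

    Returns the sorted names of the deleted files.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for sub in ("new", "cur"):
        directory = Path(folder) / sub
        if not directory.is_dir():
            continue
        for msg in directory.iterdir():
            # Age is judged by mtime (an assumption of this sketch).
            if msg.is_file() and msg.stat().st_mtime < cutoff:
                msg.unlink()
                removed.append(msg.name)
    return sorted(removed)
```

Called once per run on each user's special spam folder, this keeps only mail newer than the configured number of days.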

    Download V1.3 of qauto-salearn
  12. beam

    beam Guest

    Thanks a lot for that script.
    You should make the folder names configurable.

  13. saschahb

    saschahb Guest

    You mean the SPAM and HAM folder names?
  14. beam

    beam Guest

    Yes, I mean SPAM and HAM foldernames.

  15. beam

    beam Guest

    I have a problem with it on my server. sa-learn tries to learn into the Bayes database of the running user. sudo doesn't change the home directory, so it tries /root/.spamassassin.
    If I use sudo's -H option, it learns into /var/qmail/.spamassassin.
    I found the problem in sa-learn: the db-path option only works for dump and import, not for spam and ham.
    Maybe a solution would be to symlink the Bayes files of the current user into the home of popuser...
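
The symlink idea could look something like the sketch below. It is untested against a real Plesk box and only illustrates the mechanism. The file names `bayes_toks`, `bayes_seen` and `bayes_journal` are the usual SpamAssassin Bayes database files under the default bayes_path; the helper name `link_bayes_files` and the choice of which directory holds the real files are assumptions.

```python
from pathlib import Path

# Usual SpamAssassin Bayes database file names (default bayes_path).
BAYES_FILES = ("bayes_toks", "bayes_seen", "bayes_journal")


def link_bayes_files(source_dir, target_dir):
    """Symlink existing Bayes files from source_dir into target_dir.

    Existing files or links in target_dir are left alone.
    Returns the names that were newly linked.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for name in BAYES_FILES:
        src, dst = source_dir / name, target_dir / name
        if src.exists() and not dst.exists() and not dst.is_symlink():
            dst.symlink_to(src)
            linked.append(name)
    return linked
```

Here `target_dir` would be the `.spamassassin` directory sa-learn actually looks in (for example the one under popuser's home), and `source_dir` the place where the databases really live.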
  16. MMaverick

    MMaverick Guest

  17. MMaverick

    MMaverick Guest

    Error creating lockfile


    I got the following error when running the script:

    bayes expire_old_tokens: lock: 21979 cannot create tmp lockfile /root/.spamassassin/bayes.lock.dedicational.com.21979 for /root/.spamassassin/bayes.lock: Permission denied

    EDIT: Problem solved: Only had to change the $sluser value in the script. (Changed it to 'root' since the crontab was being run as root.)
  18. dl2rbi

    dl2rbi Guest

    tested with PLESK 8.0.1

    Looks very interesting ...
    Has anyone tested this script with PLESK 8.0.1?
    I think some changes are necessary ...

    Werner (using PLESK 8.0.1 / SuSE 10/64bit)
  19. MMaverick

    MMaverick Guest

