Question Migrating SpamAssassin database

Fabian H · Mar 23, 2022

Hi there,
I am migrating my Plesk server to another Plesk server running a different OS.
I have used the sa-learn utility many times to learn spam, and my customers regularly mark mails as spam to train SpamAssassin, so I want to migrate the SpamAssassin database to the new server as well.
How can I do that?

Maarten · Mar 23, 2022

As far as I know, the trained data is stored in bayes_seen, bayes_journal, and bayes_toks.
These files can be found in the .spamassassin folder:

/var/qmail/mailnames/domain/account/.spamassassin

So, when you migrate to a new server, my assumption is these files will be migrated too.
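
If the files turn out not to be carried over, or the database format differs on the new OS, sa-learn's own --backup and --restore options are one way to move a single mailbox's Bayes data by hand. This is only a minimal sketch: the function names and the SA_LEARN variable are made up for illustration, and it assumes --dbpath is honoured together with --backup/--restore on your version, so check that before relying on it.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Plesk or SpamAssassin.
# SA_LEARN can be overridden, e.g. to point at a different sa-learn binary.
SA_LEARN="${SA_LEARN:-sa-learn}"

# Run on the OLD server: export one mailbox's Bayes data to a text file.
# $1 = mailbox directory (e.g. /var/qmail/mailnames/example.com/user)
# $2 = output file
backup_bayes() {
    bk_mailbox="$1"
    bk_out="$2"
    "$SA_LEARN" --dbpath "$bk_mailbox/.spamassassin" --backup > "$bk_out"
}

# Run on the NEW server: import the text file into the mailbox's database.
# $1 = mailbox directory, $2 = file produced by backup_bayes
restore_bayes() {
    rs_mailbox="$1"
    rs_in="$2"
    "$SA_LEARN" --dbpath "$rs_mailbox/.spamassassin" --restore "$rs_in"
}

# Example (old server):
#   backup_bayes /var/qmail/mailnames/example.com/user /root/user-bayes.txt
# Example (new server, after copying the file over):
#   restore_bayes /var/qmail/mailnames/example.com/user /root/user-bayes.txt
```

When this is run as root, it is worth checking afterwards that the bayes_* files still have the same owner as the rest of the mailbox, since the mail user needs to be able to read and write them.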

Fabian H · Mar 24, 2022

Alright, thanks.

Another question about this:
When using the sa-learn utility, which .spamassassin folder is the updated database written to? Or is it written to every account that has a .spamassassin folder?

Maarten · Mar 24, 2022

Every mail account that has the spam filter enabled has this .spamassassin folder.

https://support.plesk.com/hc/en-us/articles/115003117405-How-to-train-SpamAssassin-on-Plesk-server-

It is updated daily by the Plesk daily task:

https://support.plesk.com/hc/en-us/articles/213950065-How-to-rerun-daily-maintenance-tasks-separately-for-Plesk-on-Linux-

Fabian H · Apr 13, 2022

maartenv said:
Every mail account that has the spam filter enabled has this .spamassassin folder.

https://support.plesk.com/hc/en-us/articles/115003117405-How-to-train-SpamAssassin-on-Plesk-server-

It is updated daily by the Plesk daily task:

https://support.plesk.com/hc/en-us/articles/213950065-How-to-rerun-daily-maintenance-tasks-separately-for-Plesk-on-Linux-

Since the migration, a lot more spam is reaching my mail accounts.
I checked with the sa-learn utility, and this is what I found:

On the old (source) server:

Code:

[root@host-old:~]$sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       1025          0  non-token data: nspam
0.000          0       1323          0  non-token data: nham
0.000          0     122009          0  non-token data: ntokens
0.000          0 1453152776          0  non-token data: oldest atime
0.000          0 1648325086          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

On the new (target) server:

Code:

[root@host:~]$sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        100          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0      10144          0  non-token data: ntokens
0.000          0 1648416734          0  non-token data: oldest atime
0.000          0 1649728411          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

So, it seems like the trained data isn't migrated with the Plesk migrator.
Any ideas on how I can migrate this too?

danami · Apr 13, 2022

Fabian H said:
So, it seems like the trained data isn't migrated with the Plesk migrator.
Any ideas on how I can migrate this too?

The data is migrated. You just need to point sa-learn at the mailbox's .spamassassin folder. Remember that by default Bayes data is stored per mailbox; you were only checking the global database. You need to use:

Code:

sa-learn --dbpath /var/qmail/mailnames/example.com/user/.spamassassin --dump magic
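
To compare many mailboxes at once, the same command can be wrapped in a loop. The following is a rough sketch rather than anything Plesk ships: summarize_magic and dump_all_mailboxes are hypothetical names, and it assumes the usual /var/qmail/mailnames/domain/account layout and the --dump magic column layout shown earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.
SA_LEARN="${SA_LEARN:-sa-learn}"
MAILNAMES_ROOT="${MAILNAMES_ROOT:-/var/qmail/mailnames}"

# Read `sa-learn --dump magic` output on stdin and print one summary line.
# The value sits in the third column; the last word names the counter.
summarize_magic() {
    awk '
        $NF == "nspam"   { nspam = $3 }
        $NF == "nham"    { nham = $3 }
        $NF == "ntokens" { ntokens = $3 }
        END { printf "nspam=%d nham=%d ntokens=%d\n", nspam, nham, ntokens }
    '
}

# Print a summary for every mailbox under the root that has a
# .spamassassin folder. $1 = mailnames root (optional).
dump_all_mailboxes() {
    da_root="${1:-$MAILNAMES_ROOT}"
    for da_dir in "$da_root"/*/*/.spamassassin; do
        [ -d "$da_dir" ] || continue
        printf '%s: ' "$da_dir"
        "$SA_LEARN" --dbpath "$da_dir" --dump magic 2>/dev/null | summarize_magic
    done
}

# Example: dump_all_mailboxes /var/qmail/mailnames
```

Running it on both servers and comparing the nspam/nham values per mailbox gives a quick check of whether the per-mailbox data arrived.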

Fabian H · Apr 13, 2022

danami said:
The data is migrated. You just need to point sa-learn at the mailbox's .spamassassin folder. Remember that by default Bayes data is stored per mailbox; you were only checking the global database. You need to use:

Code:

sa-learn --dbpath /var/qmail/mailnames/example.com/user/.spamassassin --dump magic

Thanks!
That looks correct: the nspam and nham values in the databases are nearly the same.
But which path is used when the --dbpath parameter is omitted?
Also, some obvious spam mails only get a score of 2.0, even though my customers and I train the filter every time we receive spam.

For some of my accounts, I am running /var/qmail/mailnames/$domain/$account/Maildir/.Spam/cur/ to train manually (within a script).

Fabian H · Apr 13, 2022

EDIT: I meant I am using

Code:

 sa-learn --spam /var/qmail/mailnames/$domain/$account/Maildir/.Spam/cur/

to train manually.
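
Without --dbpath, sa-learn normally falls back to the bayes_path default, which is the .spamassassin folder in the home directory of the user running it (unless a site-wide setting overrides that), so a script run as root may be training root's database instead of the mailbox's. A sketch of the same training call pointed at the mailbox database, assuming the paths used in this thread; train_mailbox, SA_LEARN and MAILNAMES_ROOT are illustrative names only:

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
SA_LEARN="${SA_LEARN:-sa-learn}"
MAILNAMES_ROOT="${MAILNAMES_ROOT:-/var/qmail/mailnames}"

# Learn the account's Spam folder into that account's own Bayes database.
# $1 = domain, $2 = account
train_mailbox() {
    tm_base="$MAILNAMES_ROOT/$1/$2"
    "$SA_LEARN" --dbpath "$tm_base/.spamassassin" \
                --spam "$tm_base/Maildir/.Spam/cur/"
}

# Example: train_mailbox example.com user
```

If this runs as root, check the ownership of the bayes_* files afterwards so that the mail user can still read and update them.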

danami · Apr 13, 2022

Unfortunately, configuring SpamAssassin properly takes much more than just Bayes training, which is why your spam scores are at 2.0. SpamAssassin is extremely powerful but has a very steep learning curve. I've been using it myself for almost 15 years and I'm still learning new things. When it is properly configured, most spam should score well above 10.0 (many messages score above 20.0).

Fabian H · Apr 13, 2022

danami said:
Unfortunately, configuring SpamAssassin properly takes much more than just Bayes training, which is why your spam scores are at 2.0. SpamAssassin is extremely powerful but has a very steep learning curve. I've been using it myself for almost 15 years and I'm still learning new things. When it is properly configured, most spam should score well above 10.0 (many messages score above 20.0).

Okay, I see!
So, can you give some recommendations on how to configure it?
