Issue Migration: mdb.c assertion failed with Plesk 18.0.76

nelteren

New Pleskian
A large proportion of domain migrations between servers with a sizeable number of domains (>100) fail with the following error:

Code:
stderr: [2026-02-24 16:37:15.295] 917424:699dc59ea1a8b ERR [util_exec] proc_close() failed ['/opt/psa/admin/bin/mailmng-core' '--add-domain' '--domain-name=$domain' '--disk-quota=104857600'] with exit code [1]
An error occurred during domain creation: mailmng-core failed: Fatal error: plesk::mail::postfix::PostfixConfigurationError(postmap: fatal: lmdb:/var/spool/postfix/plesk/virtual: internal error: mdb.c:2156: Assertion 'rc == 0' failed in mdb_page_dirty()

There's an article about it, which suggests updating to version 18.0.75: https://support.plesk.com/hc/en-us/...ap-fatal-lmdb-var-spool-postfix-plesk-virtual

However, the issue doesn't appear to be completely fixed, as it recurs in my migration between two Plesk hosts that are both on version 18.0.76. Looking at what could cause it: mdb_page_dirty() calls either mdb_mid2l_insert() or mdb_mid2l_append(), and the dirty-page list these maintain apparently has some really low limits (on the order of 64K page entries). It might be possible that this list is being overflowed.

However, to really find out what's going on, a backtrace is needed, and I can't tell what's being fed to mailmng-core: that executable expects a JSON document on stdin, and I don't know what that JSON is or where it comes from. So I can't say what the value of `rc` is, which of the two branches is taken, or what the stack trace inside the OpenLDAP LMDB code looks like, all of which would tell me more about the error and perhaps which bug I'm dealing with.
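One way to capture what the migrator actually feeds mailmng-core (both argv and the JSON on stdin) is to slide a logging wrapper in front of the binary. A sketch, to be used only on a disposable box; the `.real` rename, the wrapper path, and the log location are my own convention, not anything Plesk ships:

```shell
# Generate a wrapper that logs argv, tees a copy of stdin, and then calls
# the real binary. Manual installation afterwards:
#   mv /opt/psa/admin/bin/mailmng-core /opt/psa/admin/bin/mailmng-core.real
#   install -m 755 /tmp/mailmng-core.wrapper /opt/psa/admin/bin/mailmng-core
cat > /tmp/mailmng-core.wrapper <<'EOF'
#!/bin/sh
log=/var/log/mailmng-core-capture.$$.log
printf 'argv: %s\n' "$*" >> "$log"
# tee keeps a copy of the stdin JSON while still feeding it to the binary
tee -a "$log" | /opt/psa/admin/bin/mailmng-core.real "$@"
EOF
chmod +x /tmp/mailmng-core.wrapper
```

With that in place, the captured log would show the `--add-domain` arguments and the JSON for each failing domain.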

There have been a number of instances of this problem with different programs in the past; with Debian stable being as old as it is, I wouldn't be too surprised if I'm dealing with some 2017-2020 era bug such as the mailing-list thread "Re: LMDB: issue with mdb_cursor_del".

Code:
apt-get -y install lmdb-utils

This installs tools that can inspect LMDB databases for corruption, but the tool itself crashes here:

Code:
$ mdb_stat -narrfff /var/spool/postfix/plesk/virtual.lmdb

Reader Table Status
(no active readers)
  0 stale readers cleared.
(no active readers)
Freelist Status
  Tree depth: 1
  Branch pages: 0
  Leaf pages: 1
  Overflow pages: 0
  Entries: 9
    Transaction 357, 2 pages, maxspan 1
             5
            29
    Transaction 358, 3 pages, maxspan 1
            10
            23
            30
    Transaction 359, 3 pages, maxspan 1
             5
            26
            28
    Transaction 360, 3 pages, maxspan 1
             8
            17
            31
Bus error

Note that this is happening on a freshly installed server.
On the old server,
Code:
mdb_stat -V
reports 0.9.24; on the new server, it reports 0.9.31.
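Since mdb_stat reliably reproduces a crash, getting the backtrace I wanted may be as simple as running that same invocation under gdb. A sketch; the helper path under /tmp is my own choice, and it assumes gdb (and ideally LMDB debug symbols) can be installed on the affected host:

```shell
# Write a small helper that runs the crashing mdb_stat invocation under gdb
# and prints a full backtrace; run it later as root on the affected host.
cat > /tmp/lmdb-backtrace.sh <<'EOF'
#!/bin/sh
# Requires gdb; install LMDB debug symbols too if available.
exec gdb --batch -ex run -ex 'bt full' \
    --args mdb_stat -narrfff /var/spool/postfix/plesk/virtual.lmdb
EOF
chmod +x /tmp/lmdb-backtrace.sh
```

The `bt full` at the point of the bus error should show which of the two mdb_page_dirty() branches is involved.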
 
There's probably also something else I should note about this strange issue: it's not consistent. It happens roughly 53.2% of the time (direct measurement).

However, given that this isn't too many samples, that figure isn't very accurate. It's fairly likely the real chance is just 50/50 (the simpler hypothesis): half the time, the assertion fails.
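For what it's worth, a quick normal-approximation confidence interval shows why a figure like 53.2% is statistically indistinguishable from a fair coin. The counts below are hypothetical stand-ins (133 failures out of 250 domains gives exactly 53.2%); plug in the real ones:

```shell
k=133 n=250   # hypothetical failure count / total; 133/250 = 53.2%
awk -v k="$k" -v n="$n" 'BEGIN {
  p  = k / n
  se = sqrt(p * (1 - p) / n)          # binomial standard error
  printf "p=%.3f  95%% CI: [%.3f, %.3f]\n", p, p - 1.96*se, p + 1.96*se
}'
# -> p=0.532  95% CI: [0.470, 0.594]  (comfortably contains 0.5)
```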

I haven't found any pattern in the domains that fail versus the ones that pass the migration. It doesn't depend on actual usage; even domains that are just redirects can fail to be added to the new server's virtual.lmdb. It looks like an essentially random sample.

I tried sorting the list of failed/successful domains in various ways to see if I could find a clear pattern, but could not see any at first glance.

If I knew what postmap is being called with, I could investigate further.
 
I've used the following to work around the issue. Note that I don't know if this could lead to some data loss.

The trick is to dump, then re-load the mdb database.

Code:
dir=/var/spool/postfix/plesk
db=virtual

# Timestamp for the backup filenames; note there is no trailing \n in the
# format, or the newline would end up inside the filenames.
printf -v date '%(%Y-%m-%d-%H-%M-%S)T' -1
mdb_dump -n "${dir}/${db}.lmdb" > "${dir}/${db}.dump"
mdb_load -n -f "${dir}/${db}.dump" "${dir}/${db}2.lmdb"
mv "${dir}/${db}.lmdb" "${dir}/${db}.lmdb.${date}.bak"
mv "${dir}/${db}.lmdb-lock" "${dir}/${db}.lmdb-lock.${date}.bak"
mv "${dir}/${db}2.lmdb" "${dir}/${db}.lmdb"
mv "${dir}/${db}2.lmdb-lock" "${dir}/${db}.lmdb-lock"
 
One of the things I'm also dealing with, which is a bit of a bummer, is that when the migration does fail in this way, the destination server is left in a big mess.

It partially creates a bunch of objects, so if you ask the migrator to re-do or re-sync the failed domains, it errors out in a variety of ways, because it only got partway through the first time and isn't smart enough to re-check whether things actually exist before trying to use them.

For example, it doesn't check whether the Linux user exists before trying to chown a directory to it, leading to failures. This one is especially bad because Linux usernames aren't perfectly consistent with the domain/subscription names; there's a hidden algorithm creating them. So I have to manually fix a list of hundreds of usernames for the roughly half that are missing.
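To at least enumerate the affected directories, a quick check is to look for vhost directories whose owning UID no longer resolves to a passwd entry. A sketch; the function name is mine, and /var/www/vhosts is the default Plesk vhost root (adjust if yours differs):

```shell
# List directories under a given root whose owning UID has no passwd entry,
# i.e. a chown target that was never (re)created on this server.
orphaned_owners() {
  for d in "$1"/*/; do
    [ -d "$d" ] || continue
    uid=$(stat -c '%u' "$d")          # numeric owner UID (GNU stat)
    getent passwd "$uid" > /dev/null \
      || printf 'uid %s has no user: %s\n' "$uid" "$d"
  done
}
# e.g.: orphaned_owners /var/www/vhosts
```

The output would be the list of subscriptions whose system user still has to be recreated before a re-sync can succeed.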

It also doesn't check whether parent directories exist before creating ones below them, again leading to failures that have to be checked manually...

And of course, there's no guarantee that later steps won't error out even when the earlier ones succeed.

The whole process seems quite fragile in the face of errors.
 