
Issue Migration: mdb.c assertion failed with Plesk 18.0.76

nelteren

New Pleskian
A large proportion of migrations fail when migrating between servers with a sizeable number of domains (>100), with the following error:

Code:
stderr: [2026-02-24 16:37:15.295] 917424:699dc59ea1a8b ERR [util_exec] proc_close() failed ['/opt/psa/admin/bin/mailmng-core' '--add-domain' '--domain-name=$domain' '--disk-quota=104857600'] with exit code [1]
An error occurred during domain creation: mailmng-core failed: Fatal error: plesk::mail::postfix::PostfixConfigurationError(postmap: fatal: lmdb:/var/spool/postfix/plesk/virtual: internal error: mdb.c:2156: Assertion 'rc == 0' failed in mdb_page_dirty()

There's an article about it, which suggests patching up to version 18.0.75 @ https://support.plesk.com/hc/en-us/...ap-fatal-lmdb-var-spool-postfix-plesk-virtual

However, the issue doesn't appear to be completely fixed: it returns in my migration between two Plesk hosts that are both on version 18.0.76. Checking what could cause it: mdb_page_dirty calls either mdb_mid2l_insert or mdb_mid2l_append, and the dirty-page list it maintains apparently has quite a low limit (64K page entries). It might be possible that this list is being overflowed.

However, to really find out what's going on, a backtrace is needed. The trouble is that I can't tell what's actually being fed to mailmng-core: that executable expects a JSON document on stdin, and I don't know what that document contains or where it comes from. Without it, I can't determine the value of `rc`, which of the two branches is taken, or the stack trace inside the LMDB (OpenLDAP) library, all of which would tell me more about the error and perhaps which bug I'm dealing with.

There have been a number of instances of this kind of problem with different programs in the past; with Debian stable being as old as it is, I wouldn't be too surprised if I'm dealing with some 2017-2020 era bug, such as the mailing-list thread "Re: LMDB: issue with mdb_cursor_del".

Code:
apt-get -y install lmdb-utils

This allows one to inspect the LMDB databases for corruption, but the tool itself also crashes:

Code:
$ mdb_stat -narrfff /var/spool/postfix/plesk/virtual.lmdb

Reader Table Status
(no active readers)
  0 stale readers cleared.
(no active readers)
Freelist Status
  Tree depth: 1
  Branch pages: 0
  Leaf pages: 1
  Overflow pages: 0
  Entries: 9
    Transaction 357, 2 pages, maxspan 1
             5
            29
    Transaction 358, 3 pages, maxspan 1
            10
            23
            30
    Transaction 359, 3 pages, maxspan 1
             5
            26
            28
    Transaction 360, 3 pages, maxspan 1
             8
            17
            31
Bus error

Note that this is happening on a freshly installed server.
On the old server,
Code:
mdb_stat -V
produces 0.9.24; on the new server, it reports 0.9.31.
 
There's something else I should note about this strange issue: it's not consistent. Measured directly, it happens roughly 53.2% of the time.

However, given that this is based on not too many samples, that figure isn't very accurate. It's quite plausible the real chance is just 50/50 (the simpler hypothesis): half the time, this assertion fails.
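A quick back-of-the-envelope check shows why 53.2% is statistically indistinguishable from 50% here. The sample size `n=250` below is an assumed illustration, not my exact count:

```shell
# Approximate 95% confidence interval for an observed failure rate,
# using the normal approximation to the binomial.
# n (number of domains observed) is an assumed example value.
p=0.532
n=250
awk -v p="$p" -v n="$n" 'BEGIN {
  se = sqrt(p * (1 - p) / n)          # standard error of a proportion
  lo = p - 1.96 * se
  hi = p + 1.96 * se
  printf "95%% CI: %.3f .. %.3f\n", lo, hi
}'
# prints: 95% CI: 0.470 .. 0.594
```

Since 0.5 sits comfortably inside that interval, the data is consistent with a plain coin flip per domain.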

I haven't found any pattern in which domains fail versus which pass the migration. It doesn't correlate with actual usage; even domains that are just redirects can fail to be added to the new server's virtual.lmdb. It looks like an almost entirely random sample.

I tried sorting the list of failed and successful domains in various ways to look for a clear pattern, but couldn't see any at first glance.

If I knew what postmap is being called with, I could investigate further.
 
I've used the following to work around the issue. Note that I don't know if this could lead to some data loss.

The trick is to dump, then re-load the mdb database.

Code:
dir=/var/spool/postfix/plesk
db=virtual

# Timestamp for the backup copies (no trailing newline).
printf -v date '%(%Y-%m-%d-%H-%M-%S)T' -1
mdb_dump -n ${dir}/${db}.lmdb > ${dir}/${db}.dump
mdb_load -n -f ${dir}/${db}.dump ${dir}/${db}2.lmdb
mv ${dir}/${db}.lmdb ${dir}/${db}.lmdb.${date}.bak
mv ${dir}/${db}.lmdb-lock ${dir}/${db}.lmdb-lock.${date}.bak
mv ${dir}/${db}2.lmdb ${dir}/${db}.lmdb
mv ${dir}/${db}2.lmdb-lock ${dir}/${db}.lmdb-lock
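Since I don't know for certain that this round-trip is lossless, a sanity check worth running before swapping the files is comparing entry counts (mdb_stat ships with lmdb-utils; the paths match the script above):

```shell
# Compare LMDB entry counts before and after the dump/reload round-trip.
# Paths match the workaround script above; mdb_stat comes with lmdb-utils.
dir=/var/spool/postfix/plesk
db=virtual
old_entries=$(mdb_stat -n "${dir}/${db}.lmdb"  | awk '/Entries:/ {print $2}')
new_entries=$(mdb_stat -n "${dir}/${db}2.lmdb" | awk '/Entries:/ {print $2}')
if [ "$old_entries" = "$new_entries" ]; then
    echo "OK: $old_entries entries preserved"
else
    echo "MISMATCH: $old_entries vs $new_entries entries; do not swap!" >&2
fi
```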
 
One of the things I'm also dealing with, which is a bit of a bummer, is that when the migration fails in this way, the destination server is left in a big mess.

It partially creates a bunch of things, so if you ask the migrator to re-do or re-sync the failed domains, it errors out in a bunch of different ways, because it only got partway through the first time and isn't smart enough to re-check whether things actually exist before trying to use them.

For example, it doesn't check whether the Linux user exists before trying to chown a directory to it, leading to failures. This one is especially bad because Linux usernames aren't perfectly consistent with the domain/subscription names; a hidden algorithm derives them. So I have to manually fix a list of hundreds of usernames for the roughly half that are missing.

It also doesn't check whether parent directories exist before creating lower ones, again leading to failures, and again requiring manual checks.

And of course, there's no guarantee later steps won't error just because earlier ones didn't.

The whole process seems quite fragile.
 
Unfortunately for me, while reloading the mdb database does remove some of the issues the mdb tools have with this file, it doesn't actually resolve the migration failures.

If I remove the subscriptions and re-try the migration for just the offending ones, they still error out. That is, about half of them fail to migrate at all.
 
I'm also seeing this other (rarer) failure from the same library:

Code:
[2026-02-26 13:09:05.431] 2710126:69a037da9a63b ERR [util_exec] proc_close() failed ['/opt/psa/admin/bin/mailmng-core' '--add-domain' '--domain-name=buses.nl' '--disk-quota=0'] with exit code [1]
An error occurred during domain creation: mailmng-core failed: Fatal error: plesk::mail::postfix::PostfixConfigurationError(postmap: fatal: lmdb:/var/spool/postfix/plesk/virtual: internal error: mdb.c:3184: Assertion 'pglast <= env->me_pglast' failed in mdb_freelist_save()
)
 
Did some more digging into the cause of the bug; I briefly replaced the executable with a shell script that logs whatever arrives on stdin, so I could capture the JSON I needed to send (rather than trying to understand all the Python code behind the migration tool backend).

This allowed me to call the mailmng-core tool directly with the same arguments it's called with by the migration tool.
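For reference, the wrapper was along these lines. The `.real` suffix and the log file paths are my own arbitrary choices; the sbin path is the binary that the bin stub execs:

```shell
# Install a logging wrapper in place of the real mailmng-core binary.
# The .real suffix and log paths are arbitrary; restore the original
# binary when done debugging.
cd /opt/psa/admin/sbin
mv mailmng-core mailmng-core.real
cat > mailmng-core <<'EOF'
#!/bin/sh
# Record the arguments and a copy of the JSON arriving on stdin,
# then forward both to the real binary unchanged.
printf 'ARGS: %s\n' "$*" >> /root/mailmng-core.args.log
tee -a /root/mailmng-core.stdin.json | /opt/psa/admin/sbin/mailmng-core.real "$@"
EOF
chmod 755 mailmng-core
```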

An strace reveals that it looks like a postmap call is causing the issue.

Code:
echo '{"domain":{"name":"acme.com","dom_aliases":[],"maillists":[],"mails":[]},"certificates":[]}' | strace -fv -s 512 -e execve /opt/psa/admin/bin/mailmng-core --add-domain --domain-name=acme.com --disk-quota=0
execve("/opt/psa/admin/bin/mailmng-core", ["/opt/psa/admin/bin/mailmng-core", "--add-domain", "--domain-name=acme.com", "--disk-quota=0"], ["SHELL=/bin/bash", "LANGUAGE=en_US:en", "PWD=/root/working", "LOGNAME=root", "HOME=/root", "LANG=en_US.UTF-8", "TERM=xterm", "USER=root", "SHLVL=1", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "MAIL=/var/mail/root", "OLDPWD=/opt/psa/admin/bin", "_=/usr/bin/strace"]) = 0
execve("/opt/psa/admin/sbin/mailmng-core", ["/opt/psa/admin/bin/mailmng-core", "--add-domain", "--domain-name=acme.com", "--disk-quota=0"], ["SHELL=/bin/bash", "LANGUAGE=en_US:en", "PWD=/root/working", "LOGNAME=root", "HOME=/root", "LANG=en_US.UTF-8", "TERM=xterm", "USER=root", "SHLVL=1", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "MAIL=/var/mail/root", "OLDPWD=/opt/psa/admin/bin", "_=/usr/bin/strace"]) = 0
strace: Process 3031331 attached
[pid 3031331] execve("/usr/sbin/postmap", ["/usr/sbin/postmap", "-s", "lmdb:/var/spool/postfix/plesk/virtual"], ["SHELL=/bin/bash", "LANGUAGE=en_US:en", "PWD=/root/working", "LOGNAME=root", "HOME=/root", "LANG=en_US.UTF-8", "TERM=xterm", "USER=root", "SHLVL=1", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "MAIL=/var/mail/root", "OLDPWD=/opt/psa/admin/bin", "_=/usr/bin/strace"]) = 0
[pid 3031331] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3031331, si_uid=0, si_status=0, si_utime=0, si_stime=4 /* 0.04 s */} ---
strace: Process 3031332 attached
[pid 3031332] execve("/usr/sbin/postmap", ["/usr/sbin/postmap", "lmdb:/var/spool/postfix/plesk/virtual"], ["SHELL=/bin/bash", "LANGUAGE=en_US:en", "PWD=/root/working", "LOGNAME=root", "HOME=/root", "LANG=en_US.UTF-8", "TERM=xterm", "USER=root", "SHLVL=1", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "MAIL=/var/mail/root", "OLDPWD=/opt/psa/admin/bin", "_=/usr/bin/strace"]) = 0
[pid 3031332] +++ exited with 1 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3031332, si_uid=0, si_status=1, si_utime=1 /* 0.01 s */, si_stime=2 /* 0.02 s */} ---
Fatal error: plesk::mail::postfix::PostfixConfigurationError(postmap: fatal: lmdb:/var/spool/postfix/plesk/virtual: internal error: mdb.c:2156: Assertion 'rc == 0' failed in mdb_page_dirty()
)
Fatal error: plesk::mail::postfix::PostfixConfigurationError(postmap: fatal: lmdb:/var/spool/postfix/plesk/virtual: internal error: mdb.c:2156: Assertion 'rc == 0' failed in mdb_page_dirty()
)
+++ exited with 1 +++

Curiously, I can't seem to reproduce this mdb_page_dirty assertion failure by running either /usr/sbin/postmap or /usr/sbin/postmap -s on its own, so there must be something more required.

However, I found a way to finally reproduce the error!

First, export the postmap file:

Code:
cd /var/spool/postfix/plesk/
postmap -s lmdb:virtual > virtual
nano virtual

## Add an entry for a new domain manually.
# Add entries for postmaster, mailer-daemon, root, drweb

postmap lmdb:virtual

postmap: fatal: lmdb:virtual: internal error: mdb.c:2156: Assertion 'rc == 0' failed in mdb_page_dirty()

I think postmap has some trouble handling lmdb above a certain size!
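If size really is the trigger, it should be reproducible synthetically. A sketch along these lines (the entry count of 20000 is only a guess at "above a certain size", and the scratch directory and domain names are made up):

```shell
# Build a synthetic virtual-map input with many entries in a scratch
# directory, then ask postmap to compile it into an LMDB map.
# 20000 entries is an assumed threshold, not a known one.
workdir=$(mktemp -d)
awk 'BEGIN {
    for (i = 1; i <= 20000; i++)
        printf "user%d@example%d.test\tdest%d@example%d.test\n", i, i, i, i
}' > "$workdir/virtual"
postmap lmdb:"$workdir/virtual"    # watch for the mdb.c assertion here
```

Bisecting the entry count (if the failure is size-dependent rather than random) would then narrow down the threshold.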
 
After installing some debug packages and doing some more digging, I've managed to reproduce a readable backtrace;

Code:
Breakpoint 2.14, 0x00007ffff7e2d920 in write () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff7e2d920 in write () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7f5000d in timed_write (fd=2, buf=0x7ffff7f6f900 <vstream_fstd_buf>, len=105, timeout=0, unused_context=<optimized out>) at /build/reproducible-path/postfix-3.10.5/src/util/timed_write.c:81
#2  0x00007ffff7f4cdf0 in vstream_fflush_some (stream=0x7ffff7f6f540 <vstream_fstd+672>, to_flush=105) at /build/reproducible-path/postfix-3.10.5/src/util/vstream.c:811
#3  0x00007ffff7f3e8a2 in msg_vprintf (level=level@entry=3, format=0x7ffff57d6000 "%s:%s: internal error: %s", ap=ap@entry=0x7fffffffda70) at /build/reproducible-path/postfix-3.10.5/src/util/msg_output.c:170
#4  0x00007ffff7f3fdf0 in vmsg_fatal (fmt=<optimized out>, ap=ap@entry=0x7fffffffda70) at /build/reproducible-path/postfix-3.10.5/src/util/msg.c:261
#5  0x00007ffff7f3fe9f in msg_fatal (fmt=fmt@entry=0x7ffff57d6000 "%s:%s: internal error: %s") at /build/reproducible-path/postfix-3.10.5/src/util/msg.c:254
#6  0x00007ffff57d34f2 in dict_lmdb_assert (context=<optimized out>, text=<optimized out>) at /build/reproducible-path/postfix-3.10.5/src/util/dict_lmdb.c:539
#7  0x00007ffff57bde20 in mdb_assert_fail (env=0x55555557b300, expr_txt=expr_txt@entry=0x7ffff57cc01f "rc == 0", func=func@entry=0x7ffff57cc978 <__func__.15> "mdb_page_dirty", line=line@entry=2156, file=0x7ffff57cc000 "mdb.c")
    at ./libraries/liblmdb/mdb.c:1569
#8  0x00007ffff57bdea2 in mdb_page_dirty (txn=<optimized out>, mp=<optimized out>) at ./libraries/liblmdb/mdb.c:2143
#9  mdb_page_dirty (txn=0x55555557b460, mp=<optimized out>) at ./libraries/liblmdb/mdb.c:2143
#10 0x00007ffff57c2e25 in mdb_page_alloc (num=num@entry=1, mp=mp@entry=0x7fffffffe030, mc=<optimized out>) at ./libraries/liblmdb/mdb.c:2337
#11 0x00007ffff57c5726 in mdb_page_new (mc=0x7fffffffe4e0, flags=1, num=1, mp=<synthetic pointer>) at ./libraries/liblmdb/mdb.c:7223
#12 mdb_page_split (mc=mc@entry=0x7fffffffe4e0, newkey=newkey@entry=0x7fffffffe900, newdata=0x7fffffffe8f0, newpgno=newpgno@entry=18446744073709551615, nflags=nflags@entry=0) at ./libraries/liblmdb/mdb.c:8679
#13 0x00007ffff57c8424 in mdb_cursor_put (mc=mc@entry=0x7fffffffe4e0, key=key@entry=0x7fffffffe900, data=data@entry=0x7fffffffe8f0, flags=flags@entry=16) at ./libraries/liblmdb/mdb.c:6987
#14 0x00007ffff57cae77 in mdb_put (txn=0x55555557b460, dbi=1, key=key@entry=0x7fffffffe900, data=data@entry=0x7fffffffe8f0, flags=flags@entry=16) at ./libraries/liblmdb/mdb.c:9079
#15 0x00007ffff57d3dde in slmdb_put (slmdb=slmdb@entry=0x55555557dca8, mdb_key=mdb_key@entry=0x7fffffffe900, mdb_value=mdb_value@entry=0x7fffffffe8f0, flags=16) at /build/reproducible-path/postfix-3.10.5/src/util/slmdb.c:612
#16 0x00007ffff57d3fb0 in dict_lmdb_update (dict=0x55555557dc10, name=0x55555557fe80 "[email protected]", value=0x5555555785d0 "[email protected]")
    at /build/reproducible-path/postfix-3.10.5/src/util/dict_lmdb.c:278
#17 0x00005555555572b6 in postmap (map_type=<optimized out>, path_name=<optimized out>, postmap_flags=<optimized out>, open_flags=<optimized out>, dict_flags=671745) at ./src/postmap/postmap.c:569
#18 0x0000555555556c51 in main (argc=2, argv=0x7fffffffec88) at ./src/postmap/postmap.c:1166
 
As far as fixing this bug is concerned: I would suggest reverting to an older postfix map type such as hash (Berkeley DB) rather than relying on this newfangled buggy lmdb thing, or even moving to SQL. Perhaps lmdb itself isn't buggy but postmap is using it wrongly, or perhaps Plesk is mis-formatting the input and postmap handles it badly (infinite loops and assertion failures rather than useful error messages). Either way, whichever program has the bug, that code path is currently evidently unreliable.
 
Alternate (slow) workaround: do not migrate more than 10 subscriptions at once. After each batch of 10, run `plesk repair mail -y` on the command line. This still goes wrong a fair amount of the time, so delete and re-try the failing subscriptions. It may take a few days instead of a few hours, but eventually everything gets migrated.

Possible way Plesk code could cause the problem: deleting the lock file. KDE had a similar bug (389848 – baloo_file crashes in mdb_put() in LMDB), where the cause appeared to be deleting the lock file of an LMDB database.

Does Plesk touch or remove the file "/var/spool/postfix/plesk/virtual.lmdb-lock" directly in any way? If so, that may lead to unexpected corruption, because LMDB uses that file to ensure ACID compliance; without it, race conditions can cause silent data corruption. The fact that

plesk repair mail -y

fixes the issue (it allows one to migrate another 10-ish domains), the fact that domains are migrated 10 at a time, and the fact that the failures occur semi-randomly (in the first batch I could migrate 200-ish domains before it cropped up, see the OP) all point in this direction. If looking for a cause, that would be my educated guess as of right now.

So I turned on inotify to watch for DELETE events on the lock file, but saw none. It's possible Plesk touches or edits it rather than deleting it, but I don't consider that very likely. I do see DELETE events for the file /var/spool/postfix/plesk/virtual a number of times.
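For reference, the monitoring was roughly this (using the inotify-tools package; the event list is my own choice):

```shell
# Watch the postfix plesk spool for events touching the LMDB lock file.
# Requires inotify-tools (apt-get -y install inotify-tools).
inotifywait -m -e delete,delete_self,moved_from,attrib \
    --timefmt '%F %T' --format '%T %w%f %e' \
    /var/spool/postfix/plesk/ \
  | grep --line-buffered 'lmdb-lock'
```

Dropping the final grep shows all events in the directory, which is how the repeated DELETEs of the plain `virtual` file show up.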
 