
Resolved Dovecot freezes the system

jonB.

New Pleskian
Hello everybody,

After I updated my servers (the hypervisor and the VM where Plesk runs), our web server stopped working. We use Ubuntu 16.04 both on the VM host and inside the VM. The VM that runs our web services keeps freezing at 100% CPU usage.

I managed to get the VM half working by mounting the VM image and deleting the Dovecot systemd start script under:

/etc/systemd/system/multi-user.target.wants/
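
For anyone in the same spot, here is a minimal sketch of that offline step. It assumes the VM image is mounted at a hypothetical mount point such as /mnt/vm and that the unit is called dovecot.service; both are assumptions, not details from this post. Removing the symlink only stops the service from starting at boot, it does not delete the unit file itself.

```shell
#!/bin/sh
# Sketch: stop a unit from auto-starting inside a mounted VM image.
# "root" is the mount point of the image (hypothetical, e.g. /mnt/vm);
# "unit" is the unit file name (assumed to be dovecot.service).

wants_link() {
    # Print the path of the boot-time symlink for a unit below a given root.
    printf '%s/etc/systemd/system/multi-user.target.wants/%s\n' "$1" "$2"
}

disable_unit_offline() {
    link=$(wants_link "$1" "$2")
    if [ -L "$link" ]; then
        # Only the symlink is removed; the unit file it points to stays.
        rm -- "$link" && echo "removed $link"
    else
        echo "no boot-time link at $link"
    fi
}

# Example (run as root, with the image mounted):
# disable_unit_offline /mnt/vm dovecot.service
```

systemctl also has a --root= option for running enable/disable against an alternate root directory, which should have the same effect if systemd tooling is available on the host.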

But when I start Dovecot again by hand, the system freezes immediately. I cannot even see a change in htop; only the monitor in my VM management window shows the high CPU usage. I also get no useful information in the log files (syslog, kern.log, mail.err).

I also restored a backup from last night, but that did not work either.

I even thought that updating to the current Plesk version 17.8.11 could help, but sadly it did not.

Do you have any idea what is happening here and what I can do?

Edit: I also tried different repair options from the Plesk CLI, such as:
# plesk repair installation
# plesk installer --select-release-current --reinstall-patch --upgrade-installed-components

I also tried removing and reinstalling Dovecot. The installation process then hangs too, as soon as it starts the service.
 
Hi, jonB.!

Are there any messages about a corrupted index cache file, multiple "waiting for auth" entries, or any timeouts in the Dovecot mail logs?

What happens if you try to connect manually via:
Code:
telnet localhost 143
?
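
A rough sketch of how to look for those messages, assuming the mail log is at a path like /var/log/mail.log (adjust for your system). The search patterns are only an assumption about the wording, not exact Dovecot message strings, so treat a count of zero with some caution.

```shell
#!/bin/sh
# Count log lines that look like the Dovecot problems asked about above.
# The patterns are assumptions about the wording, not exact Dovecot strings.

count_dovecot_hints() {
    # $1 = path to a mail log file
    # -a: treat the file as text even if it contains binary bytes
    # -i: ignore case, -c: print the number of matching lines
    grep -a -i -c -E 'dovecot.*(corrupted index cache|auth.*(waiting|not responding)|timed out|timeout)' "$1"
}

# Example:
# count_dovecot_hints /var/log/mail.log
```

If the VM freezes as soon as Dovecot starts, the same function can be pointed at a copy of the log taken from the mounted VM image instead.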
 
Hi Aleksandr,
thank you for your reply! Telnet connections are not possible at all, because the server does not respond anymore. But I get these messages:

mail.err:
Code:
Apr  9 10:16:07 www dmarc[1920]: DMARC: PASS message for [email protected]
Apr  9 10:16:07 www dk_check[1921]: DKIM verify result: DKIM verification (d=example.org, 1024-bit key) succeeded
Apr  9 10:16:07 www dovecot: service=lda, [email protected], ip=[]. Error: userdb lookup: connect(/var/run/dovecot/auth-userdb) failed: No such file or directory
Apr  9 10:16:07 www dovecot: lda: Fatal: Internal error occurred. Refer to server log for more information.
Apr  9 10:17:57 www spf[2624]: Error code: (2) Could not find a valid SPF record
Apr  9 10:17:57 www spf[2624]: Failed to query MAIL-FROM: No DNS data for 'tendenzen.de'.
Apr  9 10:17:58 www dmarc[2640]: DKIM record was not found in Authentication-Results:
Apr  9 10:17:58 www dmarc[2640]: DMARC: PASS message for [email protected]
Apr  9 10:17:58 www dk_check[2643]: DKIM verify result: Message is not signed
Apr  9 10:17:58 www dovecot: service=lda, [email protected], ip=[]. Error: userdb lookup: connect(/var/run/dovecot/auth-userdb) failed: No such file or directory
Apr  9 10:17:58 www dovecot: lda: Fatal: Internal error occurred. Refer to server log for more information.
Apr  9 10:21:30 www dmarc[2446]: DKIM record was not found in Authentication-Results:
Apr  9 10:21:30 www dmarc[2446]: DMARC: PASS message for [email protected]
Apr  9 10:21:30 www dk_check[2447]: DKIM verify result: DKIM verification (d=example.org, 1024-bit key) succeeded
Apr  9 10:21:31 www dovecot: service=lda, [email protected], ip=[]. Error: userdb lookup: connect(/var/run/dovecot/auth-userdb) failed: No such file or directory
Apr  9 10:21:31 www dovecot: lda: Fatal: Internal error occurred. Refer to server log for more information.

...


Apr  9 11:14:46 www plesk_saslauthd[8634]: Failed to initialize password cipher context
Apr  9 11:16:31 www dovecot: service=lda, [email protected], ip=[]. Error: userdb lookup: connect(/var/run/dovecot/auth-userdb) failed: No such file or directory
Apr  9 11:16:31 www dovecot: lda: Fatal: Internal error occurred. Refer to server log for more information.
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Apr  9 11:21:06 www dovecot:
service=lda, [email protected], ip=[]. Error: userdb lookup: connect(/var/run/dovecot/auth-userdb) failed: No such file or directory

I think the missing auth-userdb only shows up because Dovecot was off at that time.

In the meantime I have installed a new server and migrated all the data, so 85% is running again, but I would really like to know what is happening, so that I have a solution if it happens again.

What puzzles me is: if it is really just this db file, why did it work before, and why did even my fully backed-up VM have the same problem?

 
> I think the missing auth-userdb comes only because dovecot was off in that time.
The socket file /var/run/dovecot/auth-userdb is recreated each time Dovecot starts.

> after I update my servers (hypervisor and VM where plesk runs), our webserver did not work anymore

This is a multifaceted question and there is no single correct answer. It looks like there is a heisenbug in Dovecot or in the hypervisor.
My advice:
- Check your virtualization settings in the BIOS of the hardware host.
- Test this mail configuration on a VM from another vendor or another hypervisor type, or even on a bare-metal node. If needed, try to get help from the VM vendor.
- If possible, roll back the hypervisor update and test the mail configuration again.

> even my full backed up VM had the same problem?
Among other things, it could be a file system integrity issue inside the VM that was backed up, or in the VM image file itself. If I were you, I would check this too.
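
One cheap hint in that direction is already in the mail.err excerpt: the long ^@ runs are how editors and pagers usually display NUL bytes, and blocks of NULs in a text log often appear after a crash or an unclean shutdown. Below is a small sketch to count them, assuming the log path from this thread. A non-zero count is only a hint, not proof of file system damage; an offline file system check of the unmounted image is still the real test.

```shell
#!/bin/sh
# Count NUL bytes in a file. Long NUL runs in a text log can be a sign
# that the file was not written out cleanly (an assumption, not proof).

count_nul_bytes() {
    # $1 = path to the file
    # tr deletes everything except NUL bytes, wc counts what is left,
    # and the final tr strips padding some wc implementations add.
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Example:
# count_nul_bytes /var/log/mail.err
```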
 
Sorry for my very late reply! I was not really able to figure out what happened. I migrated the old server to a new Debian VM. But the strange thing was that a day later the old VM ran smoothly without any issues...
 