Issue Roundcube stops working intermittently

PatrickvanLier · Oct 11, 2020

Short story
Roundcube keeps stopping after a while. I'm not sure whether the problem lies with IMAP or with Roundcube itself. Restarting apache2 seems to be a working workaround.

Long story
I have Plesk Obsidian 18.0.30 Update 2 running on Ubuntu 18.04.5 LTS. About a year ago I migrated from a previous Plesk server and all seemed wonderful.

Out of the blue some customers started complaining that webmail was down. I have Roundcube installed. I found out that if I disabled and re-enabled the web application firewall, Roundcube would start working again. I figured it was a bad ruleset update, hoped it was a one-time error and let it be.

After a while I got the same problem. After searching the forum I added an exception for the location "/roundcube/" and added a line "SecResponseBodyLimit 546870912". I believe that was because of another error, but it could have been a suggested solution for this one. I'm not sure anymore.

When it happened again I turned off WAF completely, because my customers were getting very annoyed: they rely on Roundcube.

After a while Roundcube still stopped working. I dove into several logfiles and found some errors that pointed to fpm-cgi. After I issued the command 'service apache2 restart', Roundcube started working again.

Mail keeps working as far as I know. This is a fairly small server with about 20 active domains, so it should be able to do its job easily. It is a VPS hosted at Strato.

I was hit with Corona, so it has been a fix-and-run situation lately. I just tried to find the errors again in the logfiles. I believe they had something to do with fpm, but I cannot find them anymore. I did find the following errors in /var/log/mail.log.1 from the previous time Roundcube stopped working:

Oct 4 00:25:16 h2856220 dovecot: auth: Error: sh: 1: Cannot fork
Oct 4 00:25:16 h2856220 dovecot_authdb_plesk[30299]: Unable to determine mail server type from ``mailmng-server --features''
Oct 4 00:25:16 h2856220 dovecot: auth: Fatal: authdb plesk: initialization failed - unable to detect current mail authentication DB
...
Oct 4 00:33:15 h2856220 dovecot: auth: Error: sh: 1: Cannot fork
Oct 4 00:33:15 h2856220 dovecot_authdb_plesk[30528]: Unable to determine mail server type from ``mailmng-server --features''
Oct 4 00:33:15 h2856220 dovecot: auth: Fatal: authdb plesk: initialization failed - unable to detect current mail authentication DB
Oct 4 00:33:15 h2856220 dovecot: master: Error: service(auth): command startup failed, throttling for 60.000 secs
Oct 4 00:33:15 h2856220 dovecot: imap-login: Disconnected: Auth process broken (disconnected before auth was ready, waited 0 secs): user=<>, rip=X.X.X.X, lip=X.X.X.X, TLS handshaking, session=<DTvW1cuwJ+2Ps1ta>
Oct 4 00:33:27 h2856220 dovecot: imap-login: Warning: Auth process not responding, delayed sending initial response (greeting): user=<>, rip=X.X.X.X, lip=X.X.X.X, TLS, session=<DKaD1suwKO2Ps1ta>
Oct 4 00:33:47 h2856220 dovecot: imap-login: Error: auth-client: conn unix:login (pid=24065,uid=0): Timeout waiting for handshake from auth server. my pid=30531, input bytes=0
Oct 4 00:33:47 h2856220 dovecot: imap-login: Disconnected: Auth process broken (disconnected before auth was ready, waited 30 secs): user=<>, rip=X.X.X.X, lip=X.X.X.X, TLS, session=<DKaD1suwKO2Ps1ta>
Oct 4 00:33:57 h2856220 dovecot: imap-login: Warning: Auth process not responding, delayed sending initial response (greeting): user=<>, rip=X.X.X.X, lip=X.X.X.X, TLS, session=<D7lQ2MuwKe2Ps1ta>
...
Oct 4 15:57:53 h2856220 dovecot: master: Error: service(imap): fork() failed: Resource temporarily unavailable (ulimit -u 62987 reached?)
Oct 4 15:57:53 h2856220 dovecot: master: Error: service(imap): command startup failed, throttling for 2.000 secs
Oct 4 15:57:55 h2856220 dovecot: master: Error: service(imap): fork() failed: Resource temporarily unavailable (ulimit -u 62987 reached?)
Oct 4 15:57:55 h2856220 dovecot: master: Error: service(imap): command startup failed, throttling for 4.000 secs
Oct 4 15:57:59 h2856220 dovecot: master: Error: service(imap): fork() failed: Resource temporarily unavailable (ulimit -u 62987 reached?)
Oct 4 15:57:59 h2856220 dovecot: master: Error: service(imap): command startup failed, throttling for 8.000 secs
Oct 4 15:58:07 h2856220 dovecot: master: Error: service(imap): fork() failed: Resource temporarily unavailable (ulimit -u 62987 reached?)
Oct 4 15:58:07 h2856220 dovecot: master: Error: service(imap): command startup failed, throttling for 16.000 secs
Oct 4 15:58:22 h2856220 dovecot: master: Error: service(imap-login): fork() failed: Resource temporarily unavailable (ulimit -u 62987 reached?)
Oct 4 15:58:22 h2856220 dovecot: master: Error: service(imap-login): command startup failed, throttling for 2.000 secs
Oct 4 15:58:23 h2856220 dovecot: master: Error: service(imap): fork() failed: Resource temporarily unavailable (ulimit -u 62987 reached?)
Oct 4 15:58:23 h2856220 dovecot: master: Error: service(imap): command startup failed, throttling for 32.000 secs
Oct 4 15:58:24 h2856220 dovecot: master: Error: service(imap-login): fork() failed: Resource temporarily unavailable (ulimit -u 62987 reached?)
Oct 4 15:58:24 h2856220 dovecot: master: Error: service(imap-login): command startup failed, throttling for 4.000 secs
Oct 4 15:58:28 h2856220 dovecot: master: Error: service(imap-login): fork() failed: Resource temporarily unavailable (ulimit -u 62987 reached?)
Oct 4 15:58:28 h2856220 dovecot: master: Error: service(imap-login): command startup failed, throttling for 8.000 secs
Oct 4 15:58:36 h2856220 dovecot: master: Error: service(imap-login): fork() failed: Resource temporarily unavailable (ulimit -u 62987 reached?)
Oct 4 15:58:36 h2856220 dovecot: master: Error: service(imap-login): command startup failed, throttling for 16.000 secs
Oct 4 15:58:52 h2856220 dovecot: master: Error: service(imap-login): fork() failed: Resource temporarily unavailable (ulimit -u 62987 reached?)
Oct 4 15:58:52 h2856220 dovecot: master: Error: service(imap-login): command startup failed, throttling for 32.000 secs
Oct 4 15:58:55 h2856220 dovecot: master: Error: service(imap): fork() failed: Resource temporarily unavailable (ulimit -u 62987 reached?)
Oct 4 15:58:55 h2856220 dovecot: master: Error: service(imap): command startup failed, throttling for 60.000 secs
...
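A log excerpt like the one above can be condensed with a small script. The following is only a sketch: the regular expressions are modelled on the lines quoted in this post, and the name summarize_fork_failures is made up for illustration. It counts fork() failures per Dovecot service and records the longest throttle reported for each service.

```python
import re
from collections import Counter

# Patterns modelled on the Dovecot lines quoted in this thread (an assumption:
# other Dovecot versions may word these messages differently).
FORK_RE = re.compile(r"service\((?P<svc>[\w-]+)\): fork\(\) failed")
THROTTLE_RE = re.compile(
    r"service\((?P<svc>[\w-]+)\): command startup failed, "
    r"throttling for (?P<secs>[\d.]+) secs"
)

# A few lines copied from the excerpt above.
SAMPLE = [
    "Oct 4 00:33:15 h2856220 dovecot: master: Error: service(auth): command startup failed, throttling for 60.000 secs",
    "Oct 4 15:57:53 h2856220 dovecot: master: Error: service(imap): fork() failed: Resource temporarily unavailable (ulimit -u 62987 reached?)",
    "Oct 4 15:57:53 h2856220 dovecot: master: Error: service(imap): command startup failed, throttling for 2.000 secs",
    "Oct 4 15:58:22 h2856220 dovecot: master: Error: service(imap-login): fork() failed: Resource temporarily unavailable (ulimit -u 62987 reached?)",
    "Oct 4 15:58:52 h2856220 dovecot: master: Error: service(imap-login): command startup failed, throttling for 32.000 secs",
]


def summarize_fork_failures(lines):
    """Return (fork failures per service, longest throttle in seconds per service)."""
    fork_failures = Counter()
    max_throttle = {}
    for line in lines:
        match = FORK_RE.search(line)
        if match:
            fork_failures[match.group("svc")] += 1
            continue
        match = THROTTLE_RE.search(line)
        if match:
            svc = match.group("svc")
            secs = float(match.group("secs"))
            max_throttle[svc] = max(secs, max_throttle.get(svc, 0.0))
    return fork_failures, max_throttle


if __name__ == "__main__":
    forks, throttles = summarize_fork_failures(SAMPLE)
    for svc in sorted(set(forks) | set(throttles)):
        print(f"{svc}: {forks[svc]} fork failures, longest throttle {throttles.get(svc, 0.0)} s")
```

To run it against a real log, pass an open file instead of SAMPLE, for example summarize_fork_failures(open("/var/log/mail.log.1", errors="replace")).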

IgorG · Oct 11, 2020

Issue - Dovecot IMAP Mail error - No SSL-Connection over Port 993

Hello, I an error with my dovecot mailserver on Plesk. Even if I want to load the Emails over the Apple Mail or Outlook Client, I'm getting the following error: "Mail could not establish an SSL connection to the server "example.tld" via the standard ports. Make sure that this server supports...

talk.plesk.com

PatrickvanLier · Oct 12, 2020

I already saw that post, yes, but the strange thing is that mail keeps working as far as I know (I only got complaints from the Roundcube users; maybe the others haven't noticed or are all using POP3).

But I've sent a message to Strato support and also set up some monitoring for the Roundcube login page and IMAP login. Let's see what happens and I'll report back.

PatrickvanLier · Oct 12, 2020

No reply yet from Strato (makes me wonder if the low price vs. support trade-off is still OK for me), and I just had the problem again, although it seems to have fixed itself for now. Anyway, I'm sharing my findings; maybe it has another cause and I don't want to wait for nothing:

My monitoring software just told me IMAP was down. I could see the login page of Roundcube, but there was an error 'connecting to storageserver' or something like that. When I look at /var/log/maillog, these are some of the errors:

Oct 12 19:53:37 h2856220 dovecot: master: Error: service(imap): fork() failed: Resource temporarily unavailable (ulimit -u 62987 reached?)
Oct 12 19:53:37 h2856220 dovecot: master: Error: service(imap): command startup failed, throttling for 2.000 secs
....
Oct 12 19:55:07 h2856220 dovecot: imap-login: Error: master(imap): Auth request timed out (client-pid=23113, client-id=1, rip=x.x.x.x created 90089 msecs ago, received 0/4 bytes)
Oct 12 19:55:07 h2856220 dovecot: imap-login: Internal login failure (pid=23113 id=1): user=<[email protected]>, method=PLAIN, rip=x.x.x.x, lip=85.214.207.70, TLS, session=<EWtN+nyxgPxNoW9E>
...
Oct 12 19:57:23 h2856220 dovecot: auth: Error: Fatal error: plesk::ExSystemError<11>(forkExecvPipes: fork() failed: Resource temporarily unavailable)
...
Oct 12 19:58:07 h2856220 dovecot: auth: Error: sh: 1: Cannot fork
Oct 12 19:58:07 h2856220 dovecot_authdb_plesk[23272]: Unable to determine mail server type from ``mailmng-server --features''
Oct 12 19:58:07 h2856220 dovecot: auth: Fatal: authdb plesk: initialization failed - unable to detect current mail authentication DB
...
Oct 12 19:58:39 h2856220 dovecot: imap-login: Error: auth-client: conn unix:login (pid=309,uid=0): Timeout waiting for handshake from auth server. my pid=23289, input bytes=0

I'm seeing mail.error, maillog, mail.log, mail.log.1 etc. I'm not sure what the differences are, but the errors in these files seem alike.

Also worth noting: on a previous occasion the Roundcube login page would not load; this time it did. So I've got 2 different problem situations:
1) Roundcube not loading (that's why I reconfigured and later turned off WAF)
2) IMAP not working (that's when the fork errors show)

Anywhere else I need to check when it happens again?

-- edit --

cat /proc/user_beancounters gives a table with all failcnt 0, ulimit -n gives 1024

PatrickvanLier · Oct 13, 2020

And now the roundcube problem is back. /var/log/apache2 is full of the following:

[Tue Oct 13 10:05:13.118533 2020] [fcgid:error] [pid 7993:tid 140139477298112] (11)Resource temporarily unavailable: mod_fcgid: can't run /var/www/cgi-bin/cgi_wrapper/cgi_wrapper
[Tue Oct 13 10:05:13.119321 2020] [fcgid:warn] [pid 7993:tid 140139477298112] (11)Resource temporarily unavailable: mod_fcgid: spawn process /var/www/cgi-bin/cgi_wrapper/cgi_wrapper error

cat /proc/user_beancounters: failcnt still all 0. ps -aux for dovecot gives 50.

Monitoring is giving me repeating 'down' and 'up' notifications, so it does seem to be banging against a limit... I just did a 'service apache2 restart' and no notifications yet...

PatrickvanLier · Oct 16, 2020

So, after waiting several days(!) I got a short reply saying that I am root on my machine and should be able to change settings myself, and that they don't have access to my installation. Ticket closed...

Is there a way for me to determine whether the problem really is at their end? Can I see some evidence of limits being reached? I want to reply to them, but I will probably have to wait a few days again. Any proof I can supply that the problem is at their end might save me from waiting more days for a reply in which they repeat that it isn't.

Bitpalast · Oct 17, 2020

I think the key here is this line:
Oct 4 00:33:15 h2856220 dovecot: auth: Error: sh: 1: Cannot fork
It means that your resources are insufficient to handle the request. The next step to figure out which resources is to check:
- RAM usage
- CPU usage
- files and inode usage, especially the number of open files against the number of maximum allowed open files
This is nothing that your provider does for you, you'll need to do these checks yourself if it is a root server.

I think the
# grep -v " 0$" /proc/user_beancounters
# ps -aux | grep dovecot | grep -v grep | wc -l
# ulimit -n
that @Arashi mentioned in his post are a great way to start.

Also check the content of /etc/sysctl.conf, /etc/security/limits.conf and /etc/xinetd.conf and check with the actual system use if you hit any of the limits set there.
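One detail worth keeping apart when running these checks: the Dovecot messages quoted earlier mention "ulimit -u" (maximum user processes), whereas "ulimit -n" reports the open-file limit, so the two commands answer different questions. On Linux the process limit also counts threads, which a plain process count does not show. The sketch below is an illustration only (the function names are made up, and it reads the limits of the shell user running it, not those of the dovecot or apache2 services). It counts processes and threads via /proc and prints both limits.

```python
import os
import resource


def count_tasks(proc_root="/proc"):
    """Return (processes, threads) visible under proc_root."""
    processes = threads = 0
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return 0, 0  # no procfs available at this path
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            tasks = os.listdir(os.path.join(proc_root, entry, "task"))
        except OSError:
            continue  # the process exited while we were scanning
        processes += 1
        threads += len(tasks)
    return processes, threads


def format_limit(value):
    """Render a resource limit the way the ulimit builtin would."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def describe_limit(which):
    """Return the soft and hard value of one limit for the current process."""
    soft, hard = resource.getrlimit(which)
    return f"soft={format_limit(soft)} hard={format_limit(hard)}"


if __name__ == "__main__":
    procs, threads = count_tasks()
    print(f"processes: {procs}, threads: {threads}")
    print("max user processes (ulimit -u):", describe_limit(resource.RLIMIT_NPROC))
    print("max open files (ulimit -n):", describe_limit(resource.RLIMIT_NOFILE))
```

The counts cover every process visible on the system, while "ulimit -u" applies per user, so the numbers are a rough indication rather than a direct comparison. Inside a container, a provider-side limit such as the numproc beancounter can also make fork() fail with the same error.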

PatrickvanLier · Oct 20, 2020

I'm having the Roundcube 503 error again. The output of the requested items:

grep -v " 0$" /proc/user_beancounters

>> No output, this is the complete listing:

Version: 2.5
     uid  resource           held    maxheld              barrier                limit  failcnt
2856220:  kmemsize      376369152  501702656  9223372036854775807  9223372036854775807        0
          lockedpages           0         32  9223372036854775807  9223372036854775807        0
          privvmpages      652636     918348  9223372036854775807  9223372036854775807        0
          shmpages          99163     230235  9223372036854775807  9223372036854775807        0
          dummy                 0          0  9223372036854775807  9223372036854775807        0
          numproc             198        198                  400                  400        0
          physpages        586174    1021175              1048576              1048576        0
          vmguarpages           0          0  9223372036854775807  9223372036854775807        0
          oomguarpages     637493    1048576                    0                    0        0
          numtcpsock            0          0  9223372036854775807  9223372036854775807        0
          numflock              0          0  9223372036854775807  9223372036854775807        0
          numpty                1          2  9223372036854775807  9223372036854775807        0
          numsiginfo            0         72  9223372036854775807  9223372036854775807        0
          tcpsndbuf             0          0  9223372036854775807  9223372036854775807        0
          tcprcvbuf             0          0  9223372036854775807  9223372036854775807        0
          othersockbuf          0          0  9223372036854775807  9223372036854775807        0
          dgramrcvbuf           0          0  9223372036854775807  9223372036854775807        0
          numothersock          0          0  9223372036854775807  9223372036854775807        0
          dcachesize    336404480  453439488  9223372036854775807  9223372036854775807        0
          numfile            4080       6090  9223372036854775807  9223372036854775807        0
          dummy                 0          0  9223372036854775807  9223372036854775807        0

ps -aux | grep dovecot | grep -v grep | wc -l

>> The output is: 14

ulimit -n

>> As always, its output is: 1024

/etc/sysctl.conf and /etc/security/limits.conf

>> All lines start with # and are therefore comments, so there is no active config in these files

/etc/xinetd.conf

>> There is only 1 line, which includes all files in /etc/xinetd.d. None of the files there seem to imply a limit. These are the files:

root@h2856220:/etc/xinetd.d# ls
chargen      daytime      discard      echo      ftp_psa       servers   time
chargen-udp  daytime-udp  discard-udp  echo-udp  poppassd_psa  services  time-udp

---
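The /proc/user_beancounters listing above can also be checked mechanically instead of by eye. The sketch below is an illustration only: the helper names and the 0.9 threshold are arbitrary choices, and the sample text is a four-row excerpt of the listing. It flags a resource when failcnt is non-zero or when maxheld has come close to a finite limit.

```python
from typing import NamedTuple

UNLIMITED = 2**63 - 1  # 9223372036854775807, shown in the listing where no limit is set

# Four rows copied from the listing above.
SAMPLE = """\
Version: 2.5
uid resource held maxheld barrier limit failcnt
2856220: kmemsize 376369152 501702656 9223372036854775807 9223372036854775807 0
numproc 198 198 400 400 0
physpages 586174 1021175 1048576 1048576 0
oomguarpages 637493 1048576 0 0 0
"""


class Beancounter(NamedTuple):
    name: str
    held: int
    maxheld: int
    barrier: int
    limit: int
    failcnt: int


def parse_beancounters(text):
    """Parse the listing into Beancounter rows, skipping version and header lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue
        # The first data row carries a leading "uid:" field, so read from the right.
        name, *numbers = fields[-6:]
        try:
            held, maxheld, barrier, limit, failcnt = (int(n) for n in numbers)
        except ValueError:
            continue  # the column header line
        rows.append(Beancounter(name, held, maxheld, barrier, limit, failcnt))
    return rows


def near_limit(rows, threshold=0.9):
    """Names of resources that failed at least once or peaked near a finite limit."""
    flagged = []
    for row in rows:
        if row.failcnt > 0:
            flagged.append(row.name)
        elif row.limit not in (0, UNLIMITED) and row.maxheld >= threshold * row.limit:
            flagged.append(row.name)
    return flagged


if __name__ == "__main__":
    print(near_limit(parse_beancounters(SAMPLE)))
```

With these four rows and the 0.9 threshold only physpages is flagged, and numproc peaked at 198 of 400. That is a pointer for further checking, not proof of the cause.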

The file /var/log/apache2/error.log is full of these lines:
[fcgid:warn] [pid 3691:tid 140518811655104] (11)Resource temporarily unavailable: mod_fcgid: spawn process /var/www/cgi-bin/cgi_wrapper/cgi_wrapper error

Memory is as follows:

root@h2856220:/var/log/apache2# free -m
              total        used        free      shared  buff/cache   available
Mem:           4096         687        1650          93        1758        3315
Swap:             0           0           0

The Plesk GUI is still working, but it is slow too. I thought maybe fail2ban was taking up too much resources because of attacks and large logfiles, but stopping the service didn't make webmail run again. I also tried restarting PHP-FPM to see if that helped, but no go. According to GUI -> Processlist I have 2.35% CPU and 25.5% memory in use. Disk is 0 kB/s.

Opening GUI -> Tools and services -> Firewall takes a long while, but it eventually loads after pressing reload a few times. Disabling it doesn't bring webmail back to life, so I turned it on again.

I saw an old attempt to fix this in the config of the web application firewall. Under custom directives I had <location /roundcube>Secengine off</location> or something like that. I removed it, and now I only have 1 line left, which was also put there because of an earlier problem: SecResponseBodyLimit 546870912

After changing the settings of the web-app-firewall I guess apache2 service was restarted because now roundcube is working again... Waiting for the next crash..

I'm no Linux guru. I got some experience by running a server and getting info from the internet... It's just a small server to give a couple of friends who work for themselves a cheap hosting option, but this is getting them irritated and I would hate to lose their trust and business... So please be patient with me and keep helping me find the source of these problems! Thank you!

PatrickvanLier · Jan 9, 2021

Hey guys, happy new year and have a healthy one everyone!

The problem still exists, no fix so far. Anyone with good ideas? If not, is this something that can be addressed by Plesk support, as it doesn't seem to be a user problem?

Bitpalast · Jan 9, 2021

Sure it's a user problem.

Have you tried to expand your resources? For example:

Websites and ActiveSync services hosted in Plesk are loading slowly or fail to load with a 50x error: mod_fcgid: can't apply process slot for /var/www/cgi-bin/cgi_wrapper/cgi_wrapper

Applicable to: Plesk for Linux Symptoms Websites are loading very slow or loading continuously and eventually fail with a 50x error in a web-browser: 500 Internal Server 502 Bad Gateway 503 ...

support.plesk.com

PatrickvanLier · Feb 2, 2021

Hello Peter, it seems I missed your response, my apologies. The server has 4 GB of memory and only 15-20 sites are live, so that should be enough, I think. I adjusted the values to:

FcgidMaxProcesses 75
FcgidMaxProcessesPerClass 40

As stated in the comment, one should take 10 per GB of memory available. Do these values sound good to you? After a restart of apache2, Server information -> memory usage is constant around 16%; CPU usage is 0.25 over the last 1 minute, 0.51 over the last 5 minutes and 1.29 over the last 15 minutes.
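The rule of thumb mentioned here is simple arithmetic and can be checked against the configured values. This is only a sketch: the function name is made up, and which of the two directives the "10 per GB" comment is meant for is an assumption, so both are compared.

```python
def suggested_max_processes(ram_gb, per_gb=10):
    """Rule of thumb quoted in the post: roughly 10 processes per GB of RAM."""
    return int(ram_gb * per_gb)


if __name__ == "__main__":
    ram_gb = 4  # memory of the server as stated in the post
    configured = {"FcgidMaxProcesses": 75, "FcgidMaxProcessesPerClass": 40}
    suggestion = suggested_max_processes(ram_gb)
    for directive, value in configured.items():
        verdict = "above" if value > suggestion else "within"
        print(f"{directive} {value}: {verdict} the rule-of-thumb value of {suggestion}")
```

For 4 GB the rule gives 40, so FcgidMaxProcessesPerClass 40 matches it, while FcgidMaxProcesses 75 is higher than the rule would suggest.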

GabrielleC · May 10, 2021

Why is this error still an issue, and why is Plesk not interested in fixing it? I use webmail for my customers and now it is not working, and none of the fixes here work.

Bitpalast · May 10, 2021

What fix do you suggest? The software is o.k., if the user does not provide enough resources, how can Plesk fix it?
