
Issue Random Connection Timeouts on all Websites

MerlinWoff

Hello,

We are running Plesk Obsidian (version 18.0.34, the latest) on CentOS 7.9, together with two uptime monitoring services (StatusCake, UptimeRobot). For the past few days, all the websites we monitor (~60 domains) have been going down at the same time for a few seconds every 5-15 minutes, each time with a connection timeout.

After restarting the httpd service, the problem disappears for 8-48 hours and then comes back.

Debugging:

I have already searched the forum and tried or checked the following:
- re-enabled nginx and ran reconfigure-all
- the public IP of the Plesk server is not blacklisted in fail2ban
- server resource usage (RAM, HDD, CPUs, network) never exceeds 65%
- the WAF configuration has not been changed in the last six months
- the Diagnose & Repair tool does not find any issues

Installing a cron job that restarts the httpd service every 2 hours does not seem like a good solution to us.


Does anyone have debugging techniques or a solution?

Thank you very much!
 
Does anyone have an idea what could be causing this problem?

nginx shows a 502 error on screen, and the logs contain:
plesk connect() failed (111: Connection refused) while connecting to upstream
 
Check the Apache logs, and the PHP-FPM logs if PHP-FPM is being used.

Does it affect only dynamic PHP pages, or all content?

Ideally, you can configure monit or a similar service (even a cron job) to check the status of a page served by Apache (e.g., if ! curl -I <URL> | grep -q "200 OK"; then service httpd restart; fi)
 
Thank you very much for your answer. I am sorry for replying only now, but we had no problems for several days, so I could not test your hypothesis. You were right: static files load instantly, and only PHP scripts run into a connection timeout. Restarting nginx neither fixes the problem nor speeds up loading. As soon as we restart httpd, everything works again.

The log file /var/log/php-fpm/error.log is empty, and the other log file, /var/log/plesk-php73-fpm/error.log, contains only assorted PHP errors caused by badly coded websites.

Is your approach of setting up monitoring that restarts the service automatically a good workaround?

Thank you very much for your help.

Best
Merlin
 
Then, have you checked the httpd error logs?
I checked /var/log/httpd/error_log and found the following errors:

a lot of:
[Tue Mar 23 15:19:09.446591 2021] [mpm_event:error] [pid 25999:tid 140642862286976] AH00485: scoreboard is full, not at MaxRequestWorkers

a lot of:
[Tue Mar 23 19:46:21.006493 2021] [core:warn] [pid 25999:tid 140642862286976] AH00045: child process 25727 still did not exit, sending a SIGTERM

some:
[Tue Mar 23 15:19:05.044862 2021] [lbmethod_heartbeat:notice] [pid 25999:tid 140642862286976] AH02282: No slotmem from mod_heartmonitor


and during the restart:
[ N 2021-03-23 19:46:27.3169 12196/T8 age/Cor/CoreMain.cpp:671 ]: Signal received. Gracefully shutting down... (send signal 2 more time(s) to force shutdown)
[ N 2021-03-23 19:46:27.3170 12196/T1 age/Cor/CoreMain.cpp:1246 ]: Received command to shutdown gracefully. Waiting until all clients have disconnected...
[ N 2021-03-23 19:46:27.3172 12196/T8 Ser/Server.h:902 ]: [ServerThr.1] Freed 0 spare client objects
[ N 2021-03-23 19:46:27.3172 12196/Ta Ser/Server.h:902 ]: [ServerThr.2] Freed 0 spare client objects
[ N 2021-03-23 19:46:27.3172 12196/T8 Ser/Server.h:558 ]: [ServerThr.1] Shutdown finished
[ N 2021-03-23 19:46:27.3172 12196/Ta Ser/Server.h:558 ]: [ServerThr.2] Shutdown finished
[ N 2021-03-23 19:46:27.3172 12196/Ti Ser/Server.h:902 ]: [ServerThr.6] Freed 0 spare client objects
[ N 2021-03-23 19:46:27.3173 12196/Ti Ser/Server.h:558 ]: [ServerThr.6] Shutdown finished
[ N 2021-03-23 19:46:27.3173 12196/Te Ser/Server.h:902 ]: [ServerThr.4] Freed 0 spare client objects
[ N 2021-03-23 19:46:27.3173 12196/Te Ser/Server.h:558 ]: [ServerThr.4] Shutdown finished
[ N 2021-03-23 19:46:27.3173 12196/Tk Ser/Server.h:902 ]: [ServerThr.7] Freed 0 spare client objects
[ N 2021-03-23 19:46:27.3173 12196/Tk Ser/Server.h:558 ]: [ServerThr.7] Shutdown finished
[ N 2021-03-23 19:46:27.3173 12196/Tm Ser/Server.h:902 ]: [ServerThr.8] Freed 0 spare client objects
[ N 2021-03-23 19:46:27.3173 12196/Tm Ser/Server.h:558 ]: [ServerThr.8] Shutdown finished
[ N 2021-03-23 19:46:27.3174 12196/Tg Ser/Server.h:902 ]: [ServerThr.5] Freed 0 spare client objects
[ N 2021-03-23 19:46:27.3174 12196/Tg Ser/Server.h:558 ]: [ServerThr.5] Shutdown finished
[ N 2021-03-23 19:46:27.3175 12196/Tc Ser/Server.h:902 ]: [ServerThr.3] Freed 0 spare client objects
[ N 2021-03-23 19:46:27.3175 12196/Tc Ser/Server.h:558 ]: [ServerThr.3] Shutdown finished
[ N 2021-03-23 19:46:27.3177 12196/To Ser/Server.h:902 ]: [ApiServer] Freed 0 spare client objects
[ N 2021-03-23 19:46:27.3177 12196/To Ser/Server.h:558 ]: [ApiServer] Shutdown finished
[ N 2021-03-23 19:46:27.5022 12196/T1 age/Cor/CoreMain.cpp:1325 ]: Passenger core shutdown finished
[Tue Mar 23 19:46:28.221457 2021] [core:notice] [pid 24635:tid 139855003469952] SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0


Could any of these be related to our connection timeout problem?

Thank you for your answer!
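
To see whether the AH00485 errors line up with the times the monitoring services report downtime, one option is to count them per hour. The following is only a minimal sketch: it assumes the error_log timestamp format shown in the excerpts above, and the function name count_scoreboard_full is made up for illustration.

```shell
#!/bin/sh
# Count "scoreboard is full" (AH00485) entries per hour in an Apache error log.
# Assumes lines start like: [Tue Mar 23 15:19:09.446591 2021] ...
# Output: one line per hour, "<count> <month> <day> <hour>:00".
# Note: the sort is textual, so month names are ordered alphabetically.
count_scoreboard_full() {
    awk '/AH00485/ {
             split($4, t, ":")                  # $4 is HH:MM:SS.micros
             n[$2 " " $3 " " t[1] ":00"]++      # key: month day hour
         }
         END { for (k in n) print n[k], k }' "$1" | sort -k2,4
}
```

Usage would be something like count_scoreboard_full /var/log/httpd/error_log. Hours with high counts can then be compared against the downtime windows reported by the uptime monitors.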
 
See the Plesk article "Websites periodically are not available in Plesk: scoreboard is full, not at MaxRequestWorkers".

Also, I suppose you have already configured the graceful Apache restart together with a restart interval of, e.g., 15 minutes (see "How to enable/disable Apache graceful restart in Plesk")? It will not solve the problem, but it should reduce the timeouts. I ask because you mentioned in your first post that everything works for up to 48 hours after a restart (maybe nobody touched the Apache-related Plesk configuration during those 48 hours).

cheers Michael

Hi Michael,
Thank you very much for your post!

We will now try setting the MPM mode of httpd to prefork instead of event, as proposed in the article "Websites periodically are not available in Plesk: scoreboard is full, not at MaxRequestWorkers".

Graceful Apache restart is not configured in our case; the interval is set to 0 (the default). Should we change something there?

Thank you very much!

Best,
Merlin
 
If you would like to reduce long HTTP waiting times, my suggestion is to set the interval to 1800 seconds and enable graceful restart, independently of your MaxRequestWorkers problem.

cheers Bernes
 
Hi Bernes
Thank you for your post!

We have raised the HTTP waiting time to 1800 seconds and activated graceful restart.

Since the memory usage of httpd increased considerably under the prefork MPM, we switched back to event.
In addition, we have created the following cron job, since the timeouts were still happening from time to time:
Code:
# Workaround httpd bug:
* * * * * if ! curl --max-time 5 -I 'https://example.com/icon.png' | grep -q "200 OK"; then systemctl restart httpd; fi

We know this solution is very nasty. Does anybody have a better idea?
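
A somewhat more defensive variant of the cron check above could look like the sketch below. It compares the numeric HTTP status instead of grepping for "200 OK" (responses served over HTTP/2 carry no "OK" reason phrase, so that grep can fail on a healthy site), and it restarts httpd only after several consecutive failed checks. The state file path, the threshold of 3 and all function names are assumptions made up for illustration, not something from Plesk or from this thread.

```shell
#!/bin/sh
# Watchdog sketch. URL, STATE_FILE and MAX_FAILURES are illustrative assumptions.
URL="${URL:-https://example.com/icon.png}"
STATE_FILE="${STATE_FILE:-/var/run/httpd-watchdog.count}"
MAX_FAILURES="${MAX_FAILURES:-3}"

# Print the HTTP status code for a URL; curl prints 000 on timeout or refusal.
probe_status() {
    curl --silent --output /dev/null --max-time 5 --write-out '%{http_code}' "$1"
}

# Succeed only if the status code is exactly 200.
is_healthy() {
    [ "$1" = "200" ]
}

# Args: previous failure count, latest status. Prints the new failure count.
next_failure_count() {
    if is_healthy "$2"; then
        echo 0
    else
        echo $(( $1 + 1 ))
    fi
}

# One check cycle: probe, update the counter, restart httpd at the threshold.
run_watchdog() {
    wd_previous=0
    if [ -f "$STATE_FILE" ]; then wd_previous="$(cat "$STATE_FILE")"; fi
    wd_status="$(probe_status "$URL" || true)"
    wd_count="$(next_failure_count "$wd_previous" "$wd_status")"
    if [ "$wd_count" -ge "$MAX_FAILURES" ]; then
        logger -t httpd-watchdog "status $wd_status, $wd_count failed checks in a row; restarting httpd"
        systemctl restart httpd
        wd_count=0
    fi
    echo "$wd_count" > "$STATE_FILE"
}

# Act only when called with --run, so the file can be sourced without side effects.
if [ "${1:-}" = "--run" ]; then
    run_watchdog
fi
```

Saved under a hypothetical path such as /usr/local/sbin/httpd-watchdog.sh, it could be called from cron every minute with the --run argument. This is still a workaround: it hides the scoreboard problem rather than fixing it, but the logger entries at least record how often a restart was needed.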

Our httpd MPM settings are:
Code:
 cat /etc/httpd/conf.modules.d/01-cgi.conf
# This configuration file loads a CGI module appropriate to the MPM
# which has been configured in 00-mpm.conf.  mod_cgid should be used
# with a threaded MPM; mod_cgi with the prefork MPM.

<IfModule mpm_worker_module>
   LoadModule cgid_module modules/mod_cgid.so
</IfModule>
<IfModule mpm_event_module>
   LoadModule cgid_module modules/mod_cgid.so

   # Increase max workers
   MaxRequestWorkers 400
   ServerLimit 16
</IfModule>
<IfModule mpm_prefork_module>
   LoadModule cgi_module modules/mod_cgi.so

   # Increase max workers
   MaxRequestWorkers 400
   ServerLimit 400
</IfModule>

Should we switch back to MPM mode prefork with lower MaxRequestWorkers values?
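
For reference, the relationship between the two directives in the mpm_event block above can be checked with a little arithmetic. This sketch assumes ThreadsPerChild is left at the Apache default of 25, which the configuration shown does not confirm; min_server_limit is a hypothetical helper name.

```shell
#!/bin/sh
# Smallest ServerLimit that can hold MaxRequestWorkers threads:
# ceil(MaxRequestWorkers / ThreadsPerChild), using integer arithmetic.
# Args: MaxRequestWorkers, ThreadsPerChild.
min_server_limit() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

max_request_workers=400      # from the mpm_event block above
threads_per_child=25         # assumption: Apache default, not shown in the config
configured_server_limit=16   # from the mpm_event block above

needed="$(min_server_limit "$max_request_workers" "$threads_per_child")"
echo "ServerLimit needed: $needed, configured: $configured_server_limit"
echo "Spare process slots: $(( configured_server_limit - needed ))"
```

Under that assumption the configured ServerLimit exactly matches what MaxRequestWorkers requires, so no process slots are left over for children that are still finishing a graceful shutdown. As far as I know, the Apache event MPM documentation describes extra ServerLimit headroom as a remedy for the "scoreboard is full" error only from httpd 2.4.24 onward, so whether raising it helps on the httpd version shipped with CentOS 7 would need to be tested.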
 
Regarding the nginx 502 error and the log entry "connect() failed (111: Connection refused) while connecting to upstream": that does not look like an nginx problem but a PHP one (no process available to handle the request).
Consider raising the number of PHP children too.
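
One way to check whether PHP-FPM is actually running out of children is to look for its "server reached pm.max_children setting" warning in the FPM logs. A minimal sketch, assuming the log files are readable by the current user; the function name count_max_children_warnings is made up for illustration.

```shell
#!/bin/sh
# For each readable log file given, print "<count> <file>", where <count> is
# the number of PHP-FPM "reached pm.max_children" warnings found in it.
# Unreadable or missing files are skipped silently.
count_max_children_warnings() {
    for fpm_log in "$@"; do
        [ -r "$fpm_log" ] || continue
        printf '%s %s\n' "$(grep -cF 'reached pm.max_children' "$fpm_log")" "$fpm_log"
    done
}
```

For example, it could be run against /var/log/plesk-php73-fpm/error.log, the log mentioned earlier in this thread. A non-zero count would support raising pm.max_children for the affected pool; a count of zero suggests the bottleneck is on the Apache side instead.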
 