• The Horde webmail has been deprecated. Its complete removal is scheduled for April 2025. For details and recommended actions, see the Feature and Deprecation Plan.
  • We’re working on enhancing the Monitoring feature in Plesk, and we could really use your expertise! If you’re open to sharing your experiences with server and website monitoring or providing feedback, we’d love to have a one-hour online meeting with you.

Issue After Onyx -> Obsidian upgrade we've started getting random http 502 errors on multiple servers

burnley

Regular Pleskian
All servers CentOS 7.7.1908 (Core)‬ running Plesk Version 18.0.23 Update #3
How much (and what) has changed in the Plesk logic for restarting / reloading Apache, nginx & php-fpm? Previously, with Onyx, it was all smooth sailing even on the busiest of the servers, very rarely we'd see 502s. But in the last 2-3 days, since the upgrade, we've been bombarded with 502s and we're literally getting dozens of support requests from customers about these 502 errors. More cronjobs running wild? Other changes? Can someone from Plesk chip in? Happy to provide more information if needed.
 
Thanks Igor. Have a look at /var/log/plesk/apache_restart_stat.log pre & post upgrade:
Onyx:
2020-02-03T04:57:55.855Z,6883,"graceful-req",true,6.11954
2020-02-03T07:32:24.209Z,28017,"graceful-req",true,10.4599
2020-02-03T07:59:57.270Z,18756,"graceful-req",true,5.23641
2020-02-03T08:57:54.698Z,3222,"graceful-req",true,9.07493
2020-02-03T15:30:36.677Z,20793,"graceful-req",true,10.3177
2020-02-03T19:40:53.950Z,25415,"graceful-req",true,9.67908
2020-02-03T21:58:12.655Z,8616,"graceful-req",true,8.2042
2020-02-04T01:49:44.257Z,17991,"graceful-req",true,8.52452
2020-02-04T01:54:56.958Z,18741,"graceful-req",true,8.16627
2020-02-04T09:27:04.962Z,2627,"graceful-req",true,9.76969
2020-02-04T09:32:11.763Z,3596,"graceful-req",true,4.03538
2020-02-04T10:15:33.975Z,11881,"graceful-req",true,8.17584
2020-02-04T15:31:31.511Z,30918,"graceful-req",true,11.0243
2020-02-04T23:57:58.483Z,7985,"graceful-req",true,9.13726
2020-02-05T01:48:28.392Z,22396,"graceful-req",true,5.85421
2020-02-05T02:57:59.850Z,28245,"graceful-req",true,7.1734
2020-02-05T03:57:55.411Z,24478,"graceful-req",true,9.15592
2020-02-05T04:41:03.104Z,1993,"graceful-req",true,10.7638

Obsidian:
2020-02-05T10:37:17.232Z,25109,"graceful-req",false,5.46511
2020-02-05T10:38:52.120Z,25109,"stop-start",true,94.8702
2020-02-05T22:25:12.716Z,27675,"graceful-req",false,8.6504
2020-02-05T22:26:47.754Z,27675,"stop-start",true,95.0165
2020-02-06T00:25:06.569Z,27577,"graceful-req",false,11.0113
2020-02-06T00:26:41.570Z,27577,"stop-start",true,94.9788
2020-02-06T01:48:34.879Z,20650,"graceful-req",false,8.62033
2020-02-06T01:50:09.771Z,20650,"stop-start",true,94.8759
2020-02-06T01:55:21.121Z,22426,"graceful-req",false,7.8849
2020-02-06T01:56:56.067Z,22426,"stop-start",true,94.9395
2020-02-06T22:38:35.925Z,18882,"graceful-req",false,10.3857
2020-02-06T22:40:10.957Z,18882,"stop-start",true,95.0126
2020-02-07T01:19:28.342Z,13437,"graceful-req",false,10.9966
2020-02-07T01:21:03.391Z,13437,"stop-start",true,95.0419
2020-02-07T01:26:12.907Z,20158,"graceful-req",false,6.68214
2020-02-07T01:27:47.789Z,20158,"stop-start",true,94.8762
2020-02-07T02:00:53.252Z,26794,"graceful-req",false,6.76953
2020-02-07T02:02:28.309Z,26794,"stop-start",true,95.0501
2020-02-07T02:24:52.122Z,19188,"graceful-req",false,11.0279
2020-02-07T02:26:27.195Z,19188,"stop-start",true,95.0651
2020-02-07T03:25:23.553Z,16484,"graceful-req",false,9.1991
2020-02-07T03:26:58.726Z,16484,"stop-start",true,95.1648

Why do we see stop-start after graceful on Obsidian?
 
@IgorG the problem here are not the 502, but the graceful-req followed by stop-start actions. The 502 & 503 could be the effects.
 
@IgorG I think there are 2 separate issues here, both related to reloading httpd:
1. The graceful-req followed by stop-start actions. We weren't seeing this pattern in Onyx. Is this behaviour normal and expected?
2. The 502 errors on graceful, which appear to be caused by mod_fcgid. Before the Obsidian upgrade we used to run all Apache servers with mpm_event. We've now switched to mpm_prefork in an attempt to improve the stability, which ever since upgrading to Obsidian went downhill.
I see there's been a psa-mod_fcgid package update as part of the upgrade, previous build was psa-mod_fcgid-2.3.9.5-centos7.19061314.x86_64, now running psa-mod_fcgid-2.3.9.5-2.centos.7+p18.0.23.3+t200130.1239.x86_64.
 
@IgorG anyone looking into this issue for us? I've enabled debugging in panel.ini and here's what I see when reconfiguring the php handler on a subscription:

/var/log/plesk/panel.log
[...]
[2020-02-11 14:43:01.969] DEBUG [panel] Service_Driver_Web_Server_Apache::logRestartForStatistics
[2020-02-11 14:43:01.988] DEBUG [util_exec] [a902567b03d442d98dd4fab154b7e854-0] Starting: apache_control_adapter --restart --restart-interval 300 --http-port 7080 --https-port 7081 --graceful
[2020-02-11 14:43:10.348] DEBUG [util_exec] [a902567b03d442d98dd4fab154b7e854-0] Finished in 8.35917s, Error code: 0
[...]

/var/log/plesk/apache_restart_stat.log
[...]
2020-02-11T03:43:08.430Z,2442,"graceful-req",false,5.26458
2020-02-11T03:43:10.347Z,2442,"stop-start",true,1.91696
[...]

/var/log/httpd/error_log
[...]
[Tue Feb 11 14:43:02.645363 2020] [mpm_prefork:emerg] [pid 1864] (43)Identifier removed: AH00144: couldn't grab the accept mutex
[Tue Feb 11 14:43:02.645364 2020] [mpm_prefork:emerg] [pid 1863] (43)Identifier removed: AH00144: couldn't grab the accept mutex
[Tue Feb 11 14:43:02.649094 2020] [mpm_prefork:emerg] [pid 1866] (43)Identifier removed: AH00144: couldn't grab the accept mutex
[Tue Feb 11 14:43:02.650167 2020] [mpm_prefork:emerg] [pid 1867] (43)Identifier removed: AH00144: couldn't grab the accept mutex
[Tue Feb 11 14:43:03.091486 2020] [mpm_prefork:notice] [pid 1328] AH00171: Graceful restart requested, doing restart
[Tue Feb 11 14:43:06.112080 2020] [fcgid:emerg] [pid 1839] (22)Invalid argument: mod_fcgid: can't lock process table in PM, pid 1839
[Tue Feb 11 14:43:06.589487 2020] [lbmethod_heartbeat:notice] [pid 1328] AH02282: No slotmem from mod_heartmonitor
[Tue Feb 11 14:43:06.684209 2020] [ssl:warn] [pid 1328] AH02292: Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)
[Tue Feb 11 14:43:06.684618 2020] [fcgid:emerg] [pid 1328] (22)Invalid argument: mod_fcgid: Can't lock process table for initialization
[Tue Feb 11 14:43:09.414668 2020] [suexec:notice] [pid 2720] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Tue Feb 11 14:43:09.505669 2020] [ssl:warn] [pid 2720] AH02292: Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)
[Tue Feb 11 14:43:09.505689 2020] [:notice] [pid 2720] ModSecurity for Apache/2.9.3 (ModSecurity: Open Source Web Application Firewall) configured.
[Tue Feb 11 14:43:09.505692 2020] [:notice] [pid 2720] ModSecurity: APR compiled version="1.4.8"; loaded version="1.6.5"
[Tue Feb 11 14:43:09.505693 2020] [:warn] [pid 2720] ModSecurity: Loaded APR do not match with compiled!
[Tue Feb 11 14:43:09.505695 2020] [:notice] [pid 2720] ModSecurity: PCRE compiled version="8.32 "; loaded version="8.32 2012-11-30"
[Tue Feb 11 14:43:09.505697 2020] [:notice] [pid 2720] ModSecurity: LUA compiled version="Lua 5.1"
[Tue Feb 11 14:43:09.505698 2020] [:notice] [pid 2720] ModSecurity: LIBXML compiled version="2.9.1"
[Tue Feb 11 14:43:09.505700 2020] [:notice] [pid 2720] ModSecurity: Status engine is currently disabled, enable it by set SecStatusEngine to On.
[Tue Feb 11 14:43:10.008977 2020] [lbmethod_heartbeat:notice] [pid 2720] AH02282: No slotmem from mod_heartmonitor
[Tue Feb 11 14:43:10.123720 2020] [ssl:warn] [pid 2720] AH02292: Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)
[Tue Feb 11 14:43:10.160255 2020] [core:warn] [pid 2720] AH00098: pid file /run/httpd/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
[Tue Feb 11 14:43:10.168939 2020] [mpm_prefork:notice] [pid 2720] AH00163: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips mod_fcgid/2.3.9 Phusion_Passenger/6.0.2 configured -- resuming normal operations
[Tue Feb 11 14:43:10.168977 2020] [core:notice] [pid 2720] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[...]
 
Please submit Report with all necessary details. I will forward it to developers.
 
@IgorG I've updated the issue:
Suddenly, after the upgrade, mod_fcgid behaviour has changed and now "httpd -k graceful" doesn't work properly if the semaphores owned by apache user are removed beforehand.
 
Back
Top