• If you are still using CentOS 7.9, it's time to convert to Alma 8 with the free centos2alma tool by Plesk or Plesk Migrator. Please let us know your experiences or concerns in this thread:
    CentOS2Alma discussion

Forwarded to devs After Onyx -> Obsidian upgrade we see "graceful-req",false followed by "stop-start",true in apache_restart_stat.log

burnley

Regular Pleskian
User name: burnley

TITLE

After Onyx -> Obsidian upgrade we see "graceful-req",false followed by "stop-start",true in apache_restart_stat.log

PRODUCT, VERSION, OPERATING SYSTEM, ARCHITECTURE

‪Plesk Obsidian Version 18.0.23 Update #3, last updated on Feb 10, 2020 04:37 PM CentOS Linux 7.7.1908 (Core)‬

PROBLEM DESCRIPTION

Initially reported here:
Issue - After Onyx -> Obsidian upgrade we've started getting random http 502 errors on multiple servers
Have a look at /var/log/plesk/apache_restart_stat.log pre & post upgrade:
Onyx:
2020-02-03T04:57:55.855Z,6883,"graceful-req",true,6.11954
2020-02-03T07:32:24.209Z,28017,"graceful-req",true,10.4599
2020-02-03T07:59:57.270Z,18756,"graceful-req",true,5.23641
2020-02-03T08:57:54.698Z,3222,"graceful-req",true,9.07493
2020-02-03T15:30:36.677Z,20793,"graceful-req",true,10.3177
2020-02-03T19:40:53.950Z,25415,"graceful-req",true,9.67908
2020-02-03T21:58:12.655Z,8616,"graceful-req",true,8.2042
2020-02-04T01:49:44.257Z,17991,"graceful-req",true,8.52452
2020-02-04T01:54:56.958Z,18741,"graceful-req",true,8.16627
2020-02-04T09:27:04.962Z,2627,"graceful-req",true,9.76969
2020-02-04T09:32:11.763Z,3596,"graceful-req",true,4.03538
2020-02-04T10:15:33.975Z,11881,"graceful-req",true,8.17584
2020-02-04T15:31:31.511Z,30918,"graceful-req",true,11.0243
2020-02-04T23:57:58.483Z,7985,"graceful-req",true,9.13726
2020-02-05T01:48:28.392Z,22396,"graceful-req",true,5.85421
2020-02-05T02:57:59.850Z,28245,"graceful-req",true,7.1734
2020-02-05T03:57:55.411Z,24478,"graceful-req",true,9.15592
2020-02-05T04:41:03.104Z,1993,"graceful-req",true,10.7638

Obsidian:
2020-02-05T10:37:17.232Z,25109,"graceful-req",false,5.46511
2020-02-05T10:38:52.120Z,25109,"stop-start",true,94.8702
2020-02-05T22:25:12.716Z,27675,"graceful-req",false,8.6504
2020-02-05T22:26:47.754Z,27675,"stop-start",true,95.0165
2020-02-06T00:25:06.569Z,27577,"graceful-req",false,11.0113
2020-02-06T00:26:41.570Z,27577,"stop-start",true,94.9788
2020-02-06T01:48:34.879Z,20650,"graceful-req",false,8.62033
2020-02-06T01:50:09.771Z,20650,"stop-start",true,94.8759
2020-02-06T01:55:21.121Z,22426,"graceful-req",false,7.8849
2020-02-06T01:56:56.067Z,22426,"stop-start",true,94.9395
2020-02-06T22:38:35.925Z,18882,"graceful-req",false,10.3857
2020-02-06T22:40:10.957Z,18882,"stop-start",true,95.0126
2020-02-07T01:19:28.342Z,13437,"graceful-req",false,10.9966
2020-02-07T01:21:03.391Z,13437,"stop-start",true,95.0419
2020-02-07T01:26:12.907Z,20158,"graceful-req",false,6.68214
2020-02-07T01:27:47.789Z,20158,"stop-start",true,94.8762
2020-02-07T02:00:53.252Z,26794,"graceful-req",false,6.76953
2020-02-07T02:02:28.309Z,26794,"stop-start",true,95.0501
2020-02-07T02:24:52.122Z,19188,"graceful-req",false,11.0279
2020-02-07T02:26:27.195Z,19188,"stop-start",true,95.0651
2020-02-07T03:25:23.553Z,16484,"graceful-req",false,9.1991
2020-02-07T03:26:58.726Z,16484,"stop-start",true,95.1648


One of the undesired side effects is that now on all the upgraded servers *except for one* we're seeing batches of http 502 errors, all resulted after a graceful-req / stop-start action.

More log data:

/var/log/plesk/panel.log
[...]
[2020-02-11 14:43:01.969] DEBUG [panel] Service_Driver_Web_Server_Apache::logRestartForStatistics
[2020-02-11 14:43:01.988] DEBUG [util_exec] [a902567b03d442d98dd4fab154b7e854-0] Starting: apache_control_adapter --restart --restart-interval 300 --http-port 7080 --https-port 7081 --graceful
[2020-02-11 14:43:10.348] DEBUG [util_exec] [a902567b03d442d98dd4fab154b7e854-0] Finished in 8.35917s, Error code: 0
[...]

/var/log/plesk/apache_restart_stat.log
[...]
2020-02-11T03:43:08.430Z,2442,"graceful-req",false,5.26458
2020-02-11T03:43:10.347Z,2442,"stop-start",true,1.91696
[...]

/var/log/httpd/error_log
[...]
[Tue Feb 11 14:43:02.645363 2020] [mpm_prefork:emerg] [pid 1864] (43)Identifier removed: AH00144: couldn't grab the accept mutex
[Tue Feb 11 14:43:02.645364 2020] [mpm_prefork:emerg] [pid 1863] (43)Identifier removed: AH00144: couldn't grab the accept mutex
[Tue Feb 11 14:43:02.649094 2020] [mpm_prefork:emerg] [pid 1866] (43)Identifier removed: AH00144: couldn't grab the accept mutex
[Tue Feb 11 14:43:02.650167 2020] [mpm_prefork:emerg] [pid 1867] (43)Identifier removed: AH00144: couldn't grab the accept mutex
[Tue Feb 11 14:43:03.091486 2020] [mpm_prefork:notice] [pid 1328] AH00171: Graceful restart requested, doing restart
[Tue Feb 11 14:43:06.112080 2020] [fcgid:emerg] [pid 1839] (22)Invalid argument: mod_fcgid: can't lock process table in PM, pid 1839
[Tue Feb 11 14:43:06.589487 2020] [lbmethod_heartbeat:notice] [pid 1328] AH02282: No slotmem from mod_heartmonitor
[Tue Feb 11 14:43:06.684209 2020] [ssl:warn] [pid 1328] AH02292: Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)
[Tue Feb 11 14:43:06.684618 2020] [fcgid:emerg] [pid 1328] (22)Invalid argument: mod_fcgid: Can't lock process table for initialization
[Tue Feb 11 14:43:09.414668 2020] [suexec:notice] [pid 2720] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Tue Feb 11 14:43:09.505669 2020] [ssl:warn] [pid 2720] AH02292: Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)
[Tue Feb 11 14:43:09.505689 2020] [:notice] [pid 2720] ModSecurity for Apache/2.9.3 (ModSecurity: Open Source Web Application Firewall) configured.
[Tue Feb 11 14:43:09.505692 2020] [:notice] [pid 2720] ModSecurity: APR compiled version="1.4.8"; loaded version="1.6.5"
[Tue Feb 11 14:43:09.505693 2020] [:warn] [pid 2720] ModSecurity: Loaded APR do not match with compiled!
[Tue Feb 11 14:43:09.505695 2020] [:notice] [pid 2720] ModSecurity: PCRE compiled version="8.32 "; loaded version="8.32 2012-11-30"
[Tue Feb 11 14:43:09.505697 2020] [:notice] [pid 2720] ModSecurity: LUA compiled version="Lua 5.1"
[Tue Feb 11 14:43:09.505698 2020] [:notice] [pid 2720] ModSecurity: LIBXML compiled version="2.9.1"
[Tue Feb 11 14:43:09.505700 2020] [:notice] [pid 2720] ModSecurity: Status engine is currently disabled, enable it by set SecStatusEngine to On.
[Tue Feb 11 14:43:10.008977 2020] [lbmethod_heartbeat:notice] [pid 2720] AH02282: No slotmem from mod_heartmonitor
[Tue Feb 11 14:43:10.123720 2020] [ssl:warn] [pid 2720] AH02292: Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)
[Tue Feb 11 14:43:10.160255 2020] [core:warn] [pid 2720] AH00098: pid file /run/httpd/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
[Tue Feb 11 14:43:10.168939 2020] [mpm_prefork:notice] [pid 2720] AH00163: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips mod_fcgid/2.3.9 Phusion_Passenger/6.0.2 configured -- resuming normal operations
[Tue Feb 11 14:43:10.168977 2020] [core:notice] [pid 2720] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[...]

STEPS TO REPRODUCE

Can't tell. In our case it's an Onyx -> Obsidian upgrade

ACTUAL RESULT

Websites returning 502

EXPECTED RESULT

Successful graceful actions, no http 502

ANY ADDITIONAL INFORMATION



YOUR EXPECTATIONS FROM PLESK SERVICE TEAM

Confirm bug
 
I think there might be an issue with /usr/local/psa/admin/sbin/apache_control_adapter tool. On an affected test server I see this:
/usr/local/psa/admin/sbin/apache_control_adapter --restart --http-port 7080 --https-port 7081 --graceful -d
apache ready test: request with 40 sec timeout finished in 0
apache ready test: request error 56: Failure when receiving data from the peer
apache ready test: error details: Recv failure: Connection reset by peer
apache ready test: pause of 4995621 usec before retry
apache ready test: failed, no daemon is running
graceful restart failed, perform full restart
Does it every time.

And in /var/log/messages:
Feb 11 15:55:31 plesk12 kill: kill: cannot find process ""
 
Think I might have found the reason. It's not in Plesk's cli tool, it's in the systemd unit file using for httpd. Because of the Apache semaphores leak issue we were using a custom httpd.service file that would remove the Apache semaphores on service reload as a workaround. However, after upgrading Plesk (including psa-mod_fcgid) package, we can't run ipcrm anymore on the Apache semaphores, because this is tripping httpd:

[root@plesk12 ~]# ipcs -s | awk '/apache/ {print $2}'
7995392
8028161
2850818
8060931
8093700
8126469
8159238
3080199
3112968
8192009

[root@plesk12 ~]# systemctl status httpd
● httpd.service - The Apache HTTP Server
Loaded: loaded (/etc/systemd/system/httpd.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/httpd.service.d
└─limit_nofile.conf
Active: active (running) (Result: exit-code) since Tue 2020-02-11 16:13:22 AEDT; 39min ago
Docs: man:httpd(8)
man:apachectl(8)
Process: 8421 ExecReload=/bin/sh -c /usr/sbin/httpd -t && /usr/sbin/httpd $OPTIONS -k graceful (code=exited, status=0/SUCCESS)
Main PID: 3697 (httpd)
Status: "Total requests: 0; Current requests/sec: 0; Current traffic: 0 B/sec"
CGroup: /system.slice/httpd.service
├─3697 /usr/sbin/httpd -DFOREGROUND
├─8437 /usr/local/psa/admin/sbin/pipelog 7081
├─8439 /usr/sbin/httpd -DFOREGROUND
├─8441 Passenger watchdog
├─8444 Passenger core
├─8461 /usr/sbin/httpd -DFOREGROUND
├─8462 /usr/sbin/httpd -DFOREGROUND
├─8463 /usr/sbin/httpd -DFOREGROUND
├─8464 /usr/sbin/httpd -DFOREGROUND
└─8465 /usr/sbin/httpd -DFOREGROUND

Feb 11 16:46:53 plesk12.ddns.local systemd[1]: Reloaded The Apache HTTP Server.
Feb 11 16:46:58 plesk12.ddns.local systemd[1]: Reloading The Apache HTTP Server.
Feb 11 16:46:58 plesk12.ddns.local sh[8302]: Syntax OK
Feb 11 16:46:59 plesk12.ddns.local systemd[1]: Reloaded The Apache HTTP Server.
Feb 11 16:47:04 plesk12.ddns.local systemd[1]: Reloading The Apache HTTP Server.
Feb 11 16:47:04 plesk12.ddns.local sh[8365]: Syntax OK
Feb 11 16:47:04 plesk12.ddns.local systemd[1]: Reloaded The Apache HTTP Server.
Feb 11 16:47:10 plesk12.ddns.local systemd[1]: Reloading The Apache HTTP Server.
Feb 11 16:47:10 plesk12.ddns.local sh[8421]: Syntax OK
Feb 11 16:47:10 plesk12.ddns.local systemd[1]: Reloaded The Apache HTTP Server.

[root@plesk12 ~]# for i in `ipcs -s | awk '/apache/ {print $2}'`; do (ipcrm -s $i); done
[root@plesk12 ~]# ipcs -s | awk '/apache/ {print $2}'
[root@plesk12 ~]# httpd -t
Syntax OK

[root@plesk12 ~]# systemctl status httpd
● httpd.service - The Apache HTTP Server
Loaded: loaded (/etc/systemd/system/httpd.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/httpd.service.d
└─limit_nofile.conf
Active: deactivating (stop-sigterm) (Result: exit-code) since Tue 2020-02-11 16:53:46 AEDT; 26s ago
Docs: man:httpd(8)
man:apachectl(8)
Process: 8918 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=1/FAILURE) <<<<<===== OUCH, not good!
Process: 8421 ExecReload=/bin/sh -c /usr/sbin/httpd -t && /usr/sbin/httpd $OPTIONS -k graceful (code=exited, status=0/SUCCESS)
Process: 3697 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=0/SUCCESS)
Main PID: 3697 (code=exited, status=0/SUCCESS)
Status: "Total requests: 0; Current requests/sec: 0; Current traffic: 0 B/sec"
CGroup: /system.slice/httpd.service
└─8461 /usr/sbin/httpd -DFOREGROUND

So, I've updated /etc/systemd/system/httpd.service and removed the ipcrm command for ExecReload. After the change "systemctl reload httpd" is working fine. Now we need to monitor the Apache semaphores numbers.

Question: is this new behaviour caused by the updated psa-mod_fcgid package?
 
Confirm bug
Not a bug in Plesk, as you've already figured out. Judging by the provided logs it looks like (due to IPCs cleanup) Apache/mod_fcgid crashed during reload. Plesk apache_control_adapter detected this and has fallen back to restarting the service.
There's also suspicious ModSecurity: APR compiled version="1.4.8"; loaded version="1.6.5" and ModSecurity: Loaded APR do not match with compiled! messages in the log. You are probably using a third-party mod_security build, which might not be entirely compatible with the OS Apache version.
Now we need to monitor the Apache semaphores numbers.
Please do. In fact, in Plesk 18.0.19 we've updated mod_fcgid patches with changes from 48769 – [PATCH] Processes in the busy list should not be killed during graceful restarts, which resolved semaphore leakage in our test environment and might resolve your original issue as well. Please check. Unfortunately, this change was not mentioned in Plesk release notes.
Question: is this new behaviour caused by the updated psa-mod_fcgid package?
Possibly, but it's likely a side effect of other changes in mod_fcgid. Removing semaphores from a running process is not a good idea in any case, so you were probably just lucky that it didn't cause issues in the past.
 
Great, thanks for confirming the mod_fcgid changes, it now all makes sense.

On the mod_security version mismatch, I'm using the rpm shipped by Plesk:
[root@plesk12 ~]# rpm -qa | grep mod_security
mod_security-2.9.3-2.centos.7+p18.0.23.3+t200130.1239.x86_64
[root@plesk12 ~]# updatedb
[root@plesk12 ~]# locate mod_security
/etc/logrotate.d/mod_security
/usr/lib64/httpd/modules/mod_security2.so
/usr/share/doc/mod_security-2.9.3
/usr/share/doc/mod_security-2.9.3/CHANGES
/usr/share/doc/mod_security-2.9.3/LICENSE
/usr/share/doc/mod_security-2.9.3/NOTICE
/usr/share/doc/mod_security-2.9.3/README.md
/usr/share/doc/mod_security-2.9.3/authors.txt
/usr/share/doc/mod_security-2.9.3/doc
/usr/share/doc/mod_security-2.9.3/mlogc-default.conf
/usr/share/doc/mod_security-2.9.3/doc/README.txt
[root@plesk12 ~]# rpm -qf /usr/lib64/httpd/modules/mod_security2.so
mod_security-2.9.3-2.centos.7+p18.0.23.3+t200130.1239.x86_64
 
Looks like you run an Apache with a higher version of APR. On my Plesk server I see:

[Thu Feb 13 05:00:54.081063 2020] [:notice] [pid 23800:tid 139699230054528] ModSecurity: APR compiled version="1.4.8"; loaded version="1.4.8"
[Thu Feb 13 05:00:54.081073 2020] [:notice] [pid 23800:tid 139699230054528] ModSecurity: PCRE compiled version="8.32 "; loaded version="8.32 2012-11-30"

What is the output of

# rpm -qa | grep httpd

?
 
That's right @IgorG we're running higher apr version on this test server, so not an issue:
rpm -q apr
apr-1.6.5-1.x86_64
In live we're running apr-1.4.8-5.el7.x86_64
 
Back
Top