Yes, the graceful restart checkbox has been enabled since day 1.
When the site was unreachable, I went to the /admin/services/list page in Plesk and saw the service listed as running, so I assumed it was "hanging". I now suspect it was wrongly shown as running when it had actually crashed. This looks like an issue in Plesk: it only shows a service as "stopped" if it was stopped from this page, and it doesn't check whether services are actually running when the page loads.
I've installed the Watchdog extension and found that these crashes happen more often than I had noticed. Apache has also been going down and coming back up during the day, but I can only see that since installing Watchdog, so I'm not sure whether such restarts happened before too. I only know that earlier there were some early-morning crashes without a restart.
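To double-check the state outside of Plesk, one can ask systemd directly. A minimal sketch, assuming the unit is called apache2.service as on my server; check_service is just a helper name I made up:

```shell
# Ask systemd (not Plesk) whether a unit is really running.
# check_service is a hypothetical helper name; the default unit name is an assumption.
check_service() {
    unit="${1:-apache2.service}"
    # "is-active --quiet" prints nothing and exits 0 only if the unit is active
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is running"
    else
        echo "$unit is NOT running"
    fi
}
```

Usage would be "check_service apache2.service". Note that this only reports what systemd believes: a process that is alive but hanging still counts as running.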
In the access/error log I see:
2021-06-27 11:02:01 | Access | | 200 | GET /wp-cron.php?doing_wp_cron HTTP/1.1 |
Soon afterwards I received by email: "Apache is down. The problem was discovered on Jun 28, 2021 11:04 AM."
Then it went down again, but without a corresponding wp-cron call: "The problem was discovered on Jun 28, 2021 11:26 AM."
Then there is a wp-cron call without a corresponding crash...
2021-06-27 12:02:01 | Access | 85.214.71.120 | 200 | GET /wp-cron.php?doing_wp_cron HTTP/1.1 |
So there is no strict correlation between the two. I had a
define('DISABLE_WP_CRON', true); line in config.php, but it didn't stop the wp-cron.php requests... Now I've changed it to
define('DISABLE_WP_CRON', 'true'); and I hope this finally disables wp-cron.
Running "journalctl -u apache2.service --since today --no-pager" returned the following (by the way, is there a way to see this from within Plesk itself?):
Jun 28 03:26:32 XXX.stratoserver.net systemd[1]: apache2.service: Control process exited, code=exited status=2
Jun 28 03:26:32 XXX.stratoserver.net systemd[1]: Reload failed for The Apache HTTP Server.
Jun 28 03:26:45 XXX.stratoserver.net systemd[1]: Stopping The Apache HTTP Server...
Jun 28 03:26:45 XXX.stratoserver.net apachectl[32077]: /usr/sbin/apachectl: 98: /usr/sbin/apachectl: Cannot fork
Jun 28 03:26:45 XXX.stratoserver.net systemd[1]: apache2.service: Control process exited, code=exited status=2
Jun 28 03:26:56 XXX.stratoserver.net systemd[1]: apache2.service: Failed with result 'exit-code'.
Jun 28 03:26:56 XXX.stratoserver.net systemd[1]: Stopped The Apache HTTP Server.
Jun 28 03:26:56 XXX.stratoserver.net systemd[1]: Starting The Apache HTTP Server...
Jun 28 03:26:57 XXX.stratoserver.net systemd[1]: Started The Apache HTTP Server.
Jun 28 03:27:00 XXX.stratoserver.net systemd[1]: Reloading The Apache HTTP Server.
Jun 28 03:27:01 XXX.stratoserver.net systemd[1]: Reloaded The Apache HTTP Server.
Jun 28 11:04:48 XXX.stratoserver.net systemd[1]: Stopping The Apache HTTP Server...
Jun 28 11:04:48 XXX.stratoserver.net systemd[1]: apache2.service: Control process exited, code=exited status=2
Jun 28 11:05:59 XXX.stratoserver.net systemd[1]: apache2.service: Failed with result 'exit-code'.
Jun 28 11:05:59 XXX.stratoserver.net systemd[1]: Stopped The Apache HTTP Server.
Jun 28 11:05:59 XXX.stratoserver.net systemd[1]: Starting The Apache HTTP Server...
Jun 28 11:06:00 XXX.stratoserver.net systemd[1]: Started The Apache HTTP Server.
Jun 28 11:26:13 XXX.stratoserver.net systemd[1]: Stopping The Apache HTTP Server...
Jun 28 11:26:13 XXX.stratoserver.net systemd[1]: apache2.service: Control process exited, code=exited status=2
Jun 28 11:26:23 XXX.stratoserver.net systemd[1]: apache2.service: Failed with result 'exit-code'.
Jun 28 11:26:23 XXX.stratoserver.net systemd[1]: Stopped The Apache HTTP Server.
Jun 28 11:26:24 XXX.stratoserver.net systemd[1]: Starting The Apache HTTP Server...
Jun 28 11:26:24 XXX.stratoserver.net systemd[1]: Started The Apache HTTP Server.
Jun 28 12:13:22 XXX.stratoserver.net systemd[1]: Stopping The Apache HTTP Server...
Jun 28 12:13:22 XXX.stratoserver.net apachectl[17974]: /usr/sbin/apachectl: 98: /usr/sbin/apachectl: Cannot fork
Jun 28 12:13:22 XXX.stratoserver.net systemd[1]: apache2.service: Control process exited, code=exited status=2
Jun 28 12:13:33 XXX.stratoserver.net systemd[1]: apache2.service: Failed with result 'exit-code'.
Jun 28 12:13:33 XXX.stratoserver.net systemd[1]: Stopped The Apache HTTP Server.
Jun 28 12:13:33 XXX.stratoserver.net systemd[1]: Starting The Apache HTTP Server...
Jun 28 12:13:33 XXX.stratoserver.net systemd[1]: Started The Apache HTTP Server.
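The "Cannot fork" lines suggest that some process or thread limit was hit when apachectl ran. One candidate is the systemd task limit of the unit, but that is only an assumption: the limit could just as well come from the hosting container or from a ulimit. A rough sketch that condenses the output of "systemctl show" into one line; tasks_usage is a made-up helper name:

```shell
# Summarise how close a unit is to its systemd task (process + thread) limit.
# Reads "Key=Value" lines, as printed by "systemctl show", on stdin.
# tasks_usage is a hypothetical helper name.
tasks_usage() {
    awk -F= '
        $1 == "TasksCurrent" { cur = $2 }
        $1 == "TasksMax"     { max = $2 }
        END {
            if (cur == "" || max == "") { print "no task data"; exit }
            if (cur ~ /^[0-9]+$/ && max ~ /^[0-9]+$/ && max + 0 > 0) {
                # both values are plain numbers: also show the percentage used
                printf "tasks %d of %d (%d%%)\n", cur, max, cur * 100 / max
            } else {
                # e.g. TasksMax=infinity: print the raw values only
                printf "tasks %s of %s\n", cur, max
            }
        }'
}
```

Usage would be "systemctl show apache2.service -p TasksCurrent -p TasksMax | tasks_usage". A value close to 100% shortly before a crash would point at this limit; a low value would mean the limit sits somewhere else.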
So I tried "tail -n 500 /var/log/apache2/error.log | grep err", which returned:
(dozens of identical messages like the three below)
[Mon Jun 28 11:05:18.795193 2021] [mpm_event:error] [pid 32146:tid 140438082923456] (11)Resource temporarily unavailable: AH00481: fork: Unable to fork new process
[Mon Jun 28 11:05:28.810980 2021] [mpm_event:error] [pid 32146:tid 140438082923456] (11)Resource temporarily unavailable: AH00481: fork: Unable to fork new process
[Mon Jun 28 11:05:38.829117 2021] [mpm_event:error] [pid 32146:tid 140438082923456] (11)Resource temporarily unavailable: AH00481: fork: Unable to fork new process
[Mon Jun 28 11:05:58.192783 2021] [core:error] [pid 32146:tid 140438082923456] AH00046: child process 900 still did not exit, sending a SIGKILL
[Mon Jun 28 11:05:58.214747 2021] [core:error] [pid 32146:tid 140438082923456] AH00046: child process 6388 still did not exit, sending a SIGKILL
[Mon Jun 28 11:10:48.391960 2021] [pagespeed:error] [pid 15530:tid 140681175623424] [mod_pagespeed 1.13.35.2-0 @15530] [0628/111048:ERROR:worker.cc(127)] Unable to start worker thread
[Mon Jun 28 11:24:14.819923 2021] [pagespeed:error] [pid 15531:tid 140682769450752] [mod_pagespeed 1.13.35.2-0 @15531] [0628/112414:ERROR:worker.cc(127)] Unable to start worker thread
[Mon Jun 28 11:26:22.525757 2021] [core:error] [pid 15526:tid 140683605941184] AH00046: child process 15530 still did not exit, sending a SIGKILL
[Mon Jun 28 12:08:57.393869 2021] [pagespeed:error] [pid 16067:tid 140594638726912] [mod_pagespeed 1.13.35.2-0 @16067] [0628/120857:ERROR:worker.cc(127)] Unable to start worker thread
[Mon Jun 28 12:08:57.400368 2021] [pagespeed:error] [pid 16067:tid 140594638726912] [mod_pagespeed 1.13.35.2-0 @16067] [0628/120857:ERROR:worker.cc(127)] Unable to start worker thread
[Mon Jun 28 12:13:31.988520 2021] [core:error] [pid 16062:tid 140595272010688] AH00046: child process 16067 still did not exit, sending a SIGKILL
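Since "grep err" matches any line that merely contains "err", a slightly stricter summary may be easier to read. A small sketch that counts error-level lines per module tag, assuming the error-log layout seen above; count_by_tag is a made-up name:

```shell
# Count Apache error-log lines per "module:error" tag, most frequent first.
# Reads log lines on stdin; count_by_tag is a hypothetical helper name.
count_by_tag() {
    awk '
        # the tag looks like [mpm_event:error]; keep the part inside the brackets
        match($0, /\[[a-z_0-9]+:error\]/) {
            tag = substr($0, RSTART + 1, RLENGTH - 2)
            n[tag]++
        }
        END { for (t in n) print n[t], t }
    ' | sort -rn
}
```

Usage would be "tail -n 500 /var/log/apache2/error.log | count_by_tag", which prints one line per tag with its count in front.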
Why do I see nothing like this in the Apache Logs in Plesk?
So I've disabled the Apache pagespeed module and restarted Apache2. We'll see what happens.