
Resolved Apache hangs after multiple AH00046 errors

fruf

Basic Pleskian
Server: VPS, 8core, 16GB RAM
OS: Debian 8.1
Plesk version 17.8.11 update 11
PHP version: 5.6.36
Apache MPM mode: Prefork
Current config:
StartServers 5
MinSpareServers 10
MaxSpareServers 25
MaxRequestWorkers 300
ServerLimit 300
MaxConnectionsPerChild 10000


Fail2Ban and ModSecurity are enabled. The server's main traffic comes from two WordPress websites; their combined traffic is about 1000 UV/day.

I have had this problem for months now: the sites become unresponsive (Plesk itself remains accessible), and requests time out after a while with a 502. Most of the time I don't even receive a notice from Downnotifier because of the long timeout.

The Apache error log is full of these entries before it hangs:

AH00046: child process 550 still did not exit, sending a SIGKILL
AH00046: child process 861 still did not exit, sending a SIGKILL
AH00046: child process 869 still did not exit, sending a SIGKILL

When the problem started, Apache was set to MPM Event. I switched to Prefork and tried modifying the values in the config, but no luck. I set Apache to restart every day, but that doesn't help; it still hangs randomly about two or three times a week. What could be the problem here? Thanks in advance.
 
The problem can be very simple, like a missing entry for localhost and the server's public IPv4 address in the Fail2Ban whitelist.

It can be more complex, too: for example, a script on your website that is requested frequently but never delivers a result. This often happens when an AJAX-based JavaScript application runs in a browser and constantly sends requests to the server while the server-side script delivers no output. The result is a very large number of processes waiting on a web server response, and eventually too many.

You will need to examine your access_log and error_log files thoroughly, and you will surely find the reason for the behavior there.
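As a starting point for that examination, here is a minimal Python sketch that tallies which client and path combinations appear most often in an access log. It assumes the standard Apache combined log format; the sample lines, IP addresses and paths are made up for illustration and are not taken from the server in this thread.

```python
import re
from collections import Counter

# Apache combined log format (assumed):
# IP ident user [time] "METHOD path PROTO" status size "referer" "agent"
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3})')

def top_requests(lines, n=5):
    """Return the n most common (client_ip, path) pairs."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ip, _method, path, _status = m.groups()
            counts[(ip, path)] += 1
    return counts.most_common(n)

# Hypothetical sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [17/Jun/2018:01:15:58 +0200] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 58 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [17/Jun/2018:01:15:59 +0200] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 58 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [17/Jun/2018:01:16:00 +0200] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

for (ip, path), hits in top_requests(sample):
    print(hits, ip, path)
```

To run it against a real log, pass an open file object instead of `sample`. A path or client that dominates the list shortly before a hang would be the first thing to look at.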
 
Thanks for the responses. The problem is not Fail2Ban related: the server is on the whitelist, and a simple Apache restart solves it immediately. In today's error log there were 441 AH00045 warnings followed by 111 AH00046 errors, and according to the timestamps it all happened within 4 seconds. The only thing close to this is a wp-cron job (40 seconds before the warnings). I'll check that and report back.
 
No luck with the wp-cron job. Tonight Apache crashed again while the cron was not running. This was the first error:

[Sun Jun 17 01:16:04.962363 2018] [mpm_prefork:error] [pid 5537] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting

And then two hours later these errors came:

[Sun Jun 17 03:00:01.996082 2018] [core:notice] [pid 5537] AH00052: child pid 10148 exit signal Segmentation fault (11)
[Sun Jun 17 03:00:05.080786 2018] [core:warn] [pid 5537] AH00045: child process 18973 still did not exit, sending a SIGTERM
[Sun Jun 17 03:00:05.080885 2018] [core:warn] [pid 5537] AH00045: child process 31005 still did not exit, sending a SIGTERM
[Sun Jun 17 03:00:05.080937 2018] [core:warn] [pid 5537] AH00045: child process 7703 still did not exit, sending a SIGTERM

This goes on for 1198 entries in 7 seconds, and then this:

[Sun Jun 17 03:00:12.685084 2018] [mpm_prefork:notice] [pid 5537] AH00169: caught SIGTERM, shutting down
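To put numbers on bursts like this, a small Python sketch can count error codes per second. It assumes the error log line layout shown in the excerpts above; the function name `codes_per_second` is only illustrative.

```python
import re
from collections import Counter

# Matches lines such as:
# [Sun Jun 17 03:00:05.080786 2018] [core:warn] [pid 5537] AH00045: ...
LOG_RE = re.compile(
    r'^\[\w{3} (\w{3} +\d+ \d{2}:\d{2}:\d{2})\.\d+ \d{4}\] '
    r'\[[^\]]+\] \[pid \d+\] (AH\d{5}):'
)

def codes_per_second(lines):
    """Count occurrences of each AH code, grouped by timestamp second."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts

# Lines taken from the excerpts in this post.
sample = [
    "[Sun Jun 17 03:00:01.996082 2018] [core:notice] [pid 5537] AH00052: child pid 10148 exit signal Segmentation fault (11)",
    "[Sun Jun 17 03:00:05.080786 2018] [core:warn] [pid 5537] AH00045: child process 18973 still did not exit, sending a SIGTERM",
    "[Sun Jun 17 03:00:05.080885 2018] [core:warn] [pid 5537] AH00045: child process 31005 still did not exit, sending a SIGTERM",
    "[Sun Jun 17 03:00:05.080937 2018] [core:warn] [pid 5537] AH00045: child process 7703 still did not exit, sending a SIGTERM",
    "[Sun Jun 17 03:00:12.685084 2018] [mpm_prefork:notice] [pid 5537] AH00169: caught SIGTERM, shutting down",
]

for (second, code), n in sorted(codes_per_second(sample).items()):
    print(second, code, n)
```

Run over the full error log, this shows when each burst starts and which code appears first, which can then be lined up with the access log and the memory graph.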

Nothing related in the logs. I checked the CPU and network usage and both are normal; only the memory graph shows that the problem might have started earlier:

mem.PNG

What I forgot: these are the PHP settings:

php.PNG
 
Have you checked your access_log files for bad bots? Have you blocked bad bots from visiting your site?
 
Yes, nothing special in the log. I have a rather large, 32 KB apache-badbot.conf; I block basically everything that moves. :)
 
A few days ago I changed the prefork config from this:

StartServers 5
MinSpareServers 10
MaxSpareServers 25
MaxRequestWorkers 300
ServerLimit 300
MaxConnectionsPerChild 10000

to this:

StartServers 3
MinSpareServers 5
MaxSpareServers 8
MaxRequestWorkers 200
MaxConnectionsPerChild 0

This is from an almost identical VPS with less RAM. There have been no crashes since then, but they happen randomly, so I don't think this cured the problem.
One more thing: I was using MPM Event when the problem started a few months ago. In the KB I found this: "Apache crashes: scoreboard is full, not at MaxRequestWorkers". That's why I switched to Prefork, but as you can see, that didn't solve it.
 
Something is eating up all your apache connection slots. What value did you put for MaxRequestWorkers? If it's already high (say 400 or 500) then simply increasing the "MaxRequestWorkers" setting might not be the ideal solution as it will simply hide the problem until your server is out of memory. So the first thing you need to do is to find out what kind of connections you have. For example, it could be a bad PHP script that is slow or stuck and which will result in many open httpd connections until your Apache scoreboard is full.
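To illustrate why raising MaxRequestWorkers only postpones the problem, here is a rough Python estimate of how many prefork children fit into RAM. The per-child memory (60 MB) and the amount reserved for everything else (4 GB) are assumptions chosen for illustration, not measurements from this server; measure the real resident size of your Apache children before relying on the result.

```python
def max_workers_for_ram(total_mb, reserved_mb, per_child_mb):
    """Estimate how many Apache prefork children fit into available RAM."""
    available = total_mb - reserved_mb
    if available <= 0 or per_child_mb <= 0:
        return 0
    return available // per_child_mb

# 16 GB server as described in the first post.
# reserved_mb and per_child_mb are hypothetical values.
total_mb = 16 * 1024
estimate = max_workers_for_ram(total_mb, reserved_mb=4096, per_child_mb=60)
print("Estimated safe MaxRequestWorkers:", estimate)
```

With these assumed numbers the estimate is lower than a setting of 300 or 400, so the server could run out of memory before it ever reaches the configured limit.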

I suggest you enable and configure the mod_status module as described here:
How to enable the Apache mod_status module on a Plesk server

After that you should regularly monitor the status page (/server-status) and see what kind of connections you see there. Are they all from the same IP? Or are they all going to the same vhost or PHP script?
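For regular monitoring, the machine-readable variant of the status page (/server-status?auto) includes a "Scoreboard:" line with one character per worker slot. The Python sketch below summarizes that line. The sample text is made up, and fetching the page (for example from a cron job on the server itself) is left out. The scoreboard only shows worker states; the client IP, vhost and request of each connection appear on the full status page.

```python
from collections import Counter

# Scoreboard key as printed on the mod_status page.
STATE_NAMES = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup",
    ".": "open slot",
}

def summarize_scoreboard(status_text):
    """Count worker states from the text of /server-status?auto."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            counts = Counter(board)
            return {STATE_NAMES.get(ch, ch): n for ch, n in counts.items()}
    return {}

# Hypothetical sample output, for illustration only.
sample = "BusyWorkers: 7\nIdleWorkers: 2\nScoreboard: WWWW_KR_W...\n"

for state, n in summarize_scoreboard(sample).items():
    print(state, n)
```

Logging this summary every minute gives a history to look back on: a steadily growing "sending reply" count before a hang would point to slow or stuck scripts rather than a sudden flood of new clients.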
 
Right now MaxRequestWorkers is set to 200, but I tried 300 and 400 too. Status is enabled, though I have never used it. Will I be able to see the stuck script there?
 
It crashed again: AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting. Unfortunately mod_status doesn't help, because its page times out when I try to load it in this state.
 
@fruf,

Please change a number of things to do a proper investigation that allows you to find out the root cause of the problem.

First of all, activate Nginx as reverse proxy in front of the problematic domain/website and set Nginx to deliver static files.

This would reduce the load on Apache, which is something you really want in this type of analysis:

- you/your Apache is not bothered by requests for static files,
- you are then able to see/determine which requests are actually trying to reach your Apache server,

and it can be even the case that your problem goes away, when using Nginx as a proxy to serve static files.

If your problem actually goes away, please be aware that your issue is very likely (but not solely) related to static files or PHP scripts serving (often huge) static files.

Second, change your Performance and Security settings.

It is never a good idea to have a memory_limit of 256M with a post_max_size of 256M: you should either increase the memory limit or decrease the post_max_size.

I would recommend decreasing post_max_size and upload_max_filesize to 64M and 32M respectively.
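The relationship between these PHP settings can be checked with a short Python sketch. It follows the guidance in the PHP documentation that memory_limit should be larger than post_max_size, which in turn should be larger than upload_max_filesize. The function names are illustrative, and the 32M upload value in the first call is an assumption, since the original upload setting is only visible in the screenshot.

```python
def parse_php_size(value):
    """Convert PHP shorthand such as '256M' into bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    value = value.strip().upper()
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)

def check_limits(memory_limit, post_max_size, upload_max_filesize):
    """Return warnings for limits that are not in the recommended order."""
    mem = parse_php_size(memory_limit)
    post = parse_php_size(post_max_size)
    upload = parse_php_size(upload_max_filesize)
    warnings = []
    if mem <= post:
        warnings.append("memory_limit should be larger than post_max_size")
    if post <= upload:
        warnings.append("post_max_size should be larger than upload_max_filesize")
    return warnings

# memory_limit and post_max_size as discussed above; upload value assumed.
print(check_limits("256M", "256M", "32M"))
# The recommended values.
print(check_limits("256M", "64M", "32M"))
```

The first call reports the memory_limit problem described above, while the recommended values produce no warnings.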

After all, you are investigating the root cause of your problem, and downsizing these values will provoke some errors that you can then read from the log files.

Third, examine the log files and put them online.

Plesk Experts and forum members can help you better if we have the output from the relevant log files.


Personally, I am pretty sure that activating Nginx and tweaking your settings will provide better performance and potentially solve your issue.

Hope the above helps a bit....... and keep me (or us) posted!

Kind regards.......
 
Thanks, trialotto, for your help. Nginx reverse proxy and static file serving are enabled on all domains.

There are multiple WP sites running on the server, and because of their content large file uploads are needed. I have now lowered the upload limit to 160M (I hope the users won't complain) and increased the memory limit to 384M.

I'll collect the logs at the next crash and upload them here.



Thanks again!
 