
Apache 2.0.46 deadlock problem?


linuxxphybrid

Guest
Q1. The latest version of Plesk comes with Apache 2.0.46. Is this correct?

Q2. If so, is it possible that the version of Apache that comes with Plesk has deadlock problems?

*) mod_cgi: Handle output on stderr during script execution on Unix platforms; preventing deadlock when stderr output fills pipe buffer. Also fixes case where stderr from nph- scripts could be lost. PR 22030, 18348. [Joe Orton, Jeff Trawick]

*) Changed cgi and piped log behavior to accept 65536 characters on Win32 (matching Linux) before deadlocking between outputting client stdin, slurping the output from stdout and then the stderr stream. PR 8179 [William Rowe]

*) Fix the worker MPM deadlock problem [Brian Pane]

*) Patch prefork to put enough of the signal processing back in so that signals are all handled properly now. The previous patch fixed the deadlock race condition, but broke the user directed signal handling. This fixes it to work the way it did before my previous prefork patch (primarily, SIGTERM is now working).

*) Changes required to make prefork clean up idle children properly. There was a window during which a starting worker deadlocks when an idle cleanup arrives before it completes init. Apache then keeps trying to cleanup the same deadlocked worker forever (until higher pids come along, but it still will never reduce below the deadlocked pid). Thus the number of children would not reduce to the correct idle level. [Paul J. Reder]

Q3. Is it critical that I upgrade Apache?

Q4. Is there any way to find out what leads to the deadlock? Can I configure Apache so that it logs the deadlock error and its cause?
 
Are you talking about the Apache used by your websites? What OS do you use on the server?
 
Originally posted by hardweb
Are you talking about the Apache used by your websites? What OS do you use on the server?
Yes, I am talking about the Apache that serves all the hosted websites. My OS is RHEL (Red Hat Enterprise Linux).
 
OK, here's an update.

I was away from my desk for an hour or so today, and the error occurred again during that time. My hosting company was monitoring the server and posted the following message:

This ticket is being created because our monitoring system presently indicates this host is down. For host purposes, HTTP is being monitored, and it is in fact responding, albeit slowly. You appear to be doing some configuration work on the server, as the escalation procedures you've defined have different content than what was loaded. The primary IP (which is taking more than 10 seconds to load, hence the alert) loads with a SiteBuilder Admin page rather than a default Plesk one.

(Following my escalation procedure, he restarted Apache; all sites were down for about half an hour ***cursing***)
...

The last update in your Apache error_log is [Tue Nov 15 13:43:56 2005]. Therefore I am unable to determine the cause of the outage. Usually, this is the result of high load, not an error with Apache.

Is there more information I can provide?

1. I kept saying that Apache locks up, but I need to correct that: it appears that Apache is not completely locking up.

2. If this is happening because of high load, how can I check to see if this is really happening because of high load?

3. If this is happening because of high load, how can I resolve this?
 
Originally posted by linuxxphybrid
2. If this is happening because of high load, how can I check to see if this is really happening because of high load?

3. If this is happening because of high load, how can I resolve this?
To see if it's high load, you can SSH into the server and use the following commands to help diagnose:

ps -ax | grep httpd

top

The ps command shows how many httpd processes are running. If there are lots of them (or more than you anticipated), you may either be under an attack or be hosting busier websites than you thought. If it's busy websites, you will have to fine-tune your Apache settings (or possibly upgrade your server; you didn't post any hardware specs).

If it's some sort of attack, you can wait it out or enlist the help of your datacenter, and you should certainly install tools such as mod_security, BFD, and mod_dosevasive.
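The ps check can be scripted so the process count can be compared directly with MaxClients. This is a minimal sketch, assuming the procps `ps ax` column layout (PID TTY STAT TIME COMMAND) and a prefork server where one of the httpd processes is the parent; `count_httpd` and `busy_fraction` are hypothetical helper names, not part of any tool.

```python
import os


def count_httpd(ps_output: str, name: str = "httpd") -> int:
    """Count processes named `name` in the output of `ps ax`.

    Expects the usual header line (PID TTY STAT TIME COMMAND). The
    `grep httpd` line that `ps ax | grep httpd` also prints is skipped
    because its command is `grep`, not `httpd`.
    """
    count = 0
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 5:
            continue
        command = fields[4]  # fifth column of `ps ax` is the command
        if os.path.basename(command) == name:
            count += 1
    return count


def busy_fraction(httpd_processes: int, max_clients: int) -> float:
    """Share of MaxClients in use, not counting the prefork parent process."""
    children = max(httpd_processes - 1, 0)  # the parent does not serve requests
    return children / max_clients
```

The input could come from `subprocess.run(["ps", "ax"], capture_output=True, text=True).stdout`. A fraction close to 1.0 means nearly every allowed child exists, and `os.getloadavg()` returns the same load averages that top displays.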
 
To make a long story short, the problem happened again, and since I was monitoring things closely, I asked my hosting company to look into it while it was happening. They say the problem was caused by high load, though they were unable to determine exactly what was causing it. They made the following changes to address the problem:

1. Removed parameters specific to worker.c
2. Raised MaxRequestsPerChild to 4K
3. Raised MaxClients to 256

Q1. Does the solution sound reasonable?

Q2. Why did they remove parameters specific to worker.c?

Q3. Why did they raise MaxRequestsPerChild to 4K?

Q4. Why did they raise MaxClients to 256?
 
This is Apache tuning. Whether it's reasonable depends on whether they are sure it's not an attack of some sort. If it's just due to high-volume sites, and the server specs are OK, then raising MaxClients is fine.
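Whether the server specs are OK for MaxClients 256 can be checked with a common rule of thumb: all the children must fit in RAM, otherwise the box starts swapping and every request slows down. This is a minimal sketch under that assumption; the function name and the reserved-memory figure are hypothetical. The per-child size would come from the RSS column of `ps aux` or the RES column of top. Because RSS counts shared pages in every child, the estimate errs on the conservative side. Prefork also caps MaxClients at ServerLimit, which is 256 by default.

```python
def max_clients_estimate(ram_mb: float, reserved_mb: float,
                         httpd_rss_mb: float, server_limit: int = 256) -> int:
    """Rule-of-thumb MaxClients for prefork: how many children fit in RAM.

    ram_mb        total physical memory
    reserved_mb   memory kept for the OS, database, Tomcat, etc. (hypothetical)
    httpd_rss_mb  average resident size of one httpd child
    server_limit  prefork's ServerLimit; MaxClients cannot exceed it
    """
    available = ram_mb - reserved_mb
    if available <= 0 or httpd_rss_mb <= 0:
        return 0
    return min(int(available // httpd_rss_mb), server_limit)
```

For example, 2 GB of RAM with 512 MB reserved and 12 MB per child gives 128, so 256 children would not fit without swapping.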

Setting MaxRequestsPerChild to 4K tells Apache to kill a child process after it has served at most 4K requests. A value of zero means child processes never expire, so a child can live forever (along with any memory it has leaked), which can cause problems.
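The trade-off between values such as 4000 and 500 can be made concrete with a little arithmetic. Each child is recycled after MaxRequestsPerChild connections (with KeepAlive, only the first request on a connection counts), so a lower value limits how much memory a leaky child can accumulate, at the cost of more frequent forks. This sketch assumes traffic is spread evenly across children; the function names and the 50 children / 20 connections per second are hypothetical.

```python
import math


def child_lifetime_seconds(max_requests_per_child: int, children: int,
                           connections_per_sec: float) -> float:
    """Approximate seconds before a child reaches MaxRequestsPerChild.

    Assumes connections are spread evenly over `children`. A limit of 0
    (never expire) or no traffic is returned as infinity.
    """
    if children <= 0:
        raise ValueError("children must be positive")
    if max_requests_per_child == 0 or connections_per_sec <= 0:
        return math.inf
    per_child_rate = connections_per_sec / children
    return max_requests_per_child / per_child_rate


def forks_per_second(max_requests_per_child: int,
                     connections_per_sec: float) -> float:
    """Rate at which the parent must fork replacement children."""
    if max_requests_per_child == 0:
        return 0.0
    return connections_per_sec / max_requests_per_child
```

With 50 children and 20 connections/s, a limit of 4000 recycles each child roughly every 10000 s, while 500 recycles it roughly every 1250 s (one fork every 25 s), which is still a light fork rate.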

More info on Apache tuning:
perl.apache.org/docs/1.0/guide/performance.pdf

There are other sites out there as well for Apache tuning, this is just one of them.
 
Lower the MaxRequestsPerChild value to 500. This will improve reliability.
 
Originally posted by hardweb
Lower the MaxRequestsPerChild value to 500. This will improve reliability.
What's wrong with 4000? Also, can lowering MaxRequestsPerChild to 500 cause any problems?

I have a couple of other questions:

Code:
<IfModule prefork.c>
StartServers       8
MinSpareServers    5
MaxSpareServers   20
MaxClients       256
MaxRequestsPerChild  4000
</IfModule>
Q1. Why is MaxRequestsPerChild set inside <IfModule prefork.c> rather than outside it? (My apologies if this is a trivial question.)

Q2. It is possible that Apache being unable to connect to Tomcat is causing the problem, but I don't understand how these relate to each other. Why does a failed connection to Tomcat create a problem for Apache? Could it also cause high load?

Q2.2. Is there something I need to do inside <IfModule mod_webapp.c> to ease the problem?
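Q2 can be made concrete with Little's law: the average number of busy children equals the request rate multiplied by the time each request holds a child. Under prefork, a child that is waiting on Tomcat cannot serve anyone else. So if Tomcat stops answering and each proxied request waits for a timeout, the tied-up children can exceed MaxClients while the CPU sits mostly idle. This is a back-of-the-envelope sketch; the function names, the request rate, and the 30-second wait are hypothetical, not values taken from mod_webapp.

```python
def children_tied_up(backend_requests_per_sec: float,
                     seconds_held: float) -> float:
    """Little's law: busy children = arrival rate x time each request holds one."""
    return backend_requests_per_sec * seconds_held


def pool_exhausted(backend_requests_per_sec: float, seconds_held: float,
                   max_clients: int) -> bool:
    """True if requests waiting on the backend alone would occupy every child."""
    return children_tied_up(backend_requests_per_sec, seconds_held) >= max_clients


# Hypothetical: 10 JSP requests/s that Tomcat normally answers in 0.2 s.
normal = children_tied_up(10, 0.2)   # about 2 children busy
# Same traffic while Tomcat is unreachable and each request waits 30 s.
stalled = children_tied_up(10, 30)   # about 300 children needed
```

In the stalled case even MaxClients 256 is not enough, and new visitors wait in the listen queue, which from outside looks like a hang rather than high CPU load.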
 
Originally posted by hardweb
Lower the MaxRequestsPerChild value to 500. This will improve reliability.
OK, I did this, but Apache still hangs. httpd had 8 connections and was not responding. Once Apache was restarted, it had 11 connections and was responsive. Is there anything we can extract from this information?
 
The problem happened again, but I ran the following commands this time and took screenshots.

top (screenshot)
ps -aux (screenshot)
ps -ef (screenshot)
ps -ax | grep httpd, while Apache was hanging (screenshot)
ps -ax | grep httpd, after Apache restarted (screenshot)

Is there any information that you can extract from this? If so, what can you figure from these? If not, how should I diagnose the problem next time? What extra information do I need to obtain in order to find the cause?
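One way to capture more useful information next time is mod_status. If the module is loaded and a `/server-status` handler is enabled in httpd.conf, requesting `/server-status?auto` returns machine-readable text that includes a `Scoreboard:` line with one character per slot. This is a minimal sketch, assuming that output format, that tallies the slots; the function name and sample text are hypothetical. A scoreboard with no `_` (idle) slots left and mostly `W` or `R` would suggest every child is busy, which points to saturation rather than a deadlock.

```python
from collections import Counter

# Scoreboard keys as documented for mod_status.
SCOREBOARD_KEYS = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def scoreboard_summary(status_auto: str) -> Counter:
    """Count slots per state from the text of /server-status?auto."""
    for line in status_auto.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            return Counter(SCOREBOARD_KEYS.get(ch, "unknown") for ch in board)
    raise ValueError("no Scoreboard line; is mod_status enabled?")
```

The text could be fetched with `urllib.request.urlopen("http://localhost/server-status?auto").read().decode()`, assuming access from localhost is allowed, and logged every minute or so to see what the children were doing when the hang began.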
 
Same problem here. Have you been able to solve it?

I ran tail -f /var/log/httpd/error_log while it was happening, then downloaded the log and went through it closely after Apache crashed. I didn't find anything that directly explains the crash, but there were lots of "File does not exist" entries in the error log. There was also no high iowait or CPU usage, just lots of httpd processes and Apache not responding.
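The "File does not exist" entries are worth tallying: thousands from a single client could indicate a scanner or an attack, while the same missing file requested by many clients (a favicon.ico, for example) is usually harmless. This is a minimal sketch, assuming the Apache 2.0 error_log format `[date] [error] [client IP] File does not exist: /path`; the function name and the sample lines are hypothetical, and paths containing spaces are cut at the first space.

```python
import re
from collections import Counter

# Matches e.g. "[client 192.0.2.7] File does not exist: /var/www/html/x.ico"
MISSING = re.compile(r"\[client ([^\]]+)\] File does not exist: (\S+)")


def missing_file_counts(log_text: str) -> tuple[Counter, Counter]:
    """Tally "File does not exist" errors by client address and by path."""
    by_client, by_path = Counter(), Counter()
    for line in log_text.splitlines():
        match = MISSING.search(line)
        if match:
            client, path = match.groups()
            by_client[client] += 1
            by_path[path] += 1
    return by_client, by_path
```

Reading `/var/log/httpd/error_log` into a string and printing `by_client.most_common(10)` shows at a glance whether a handful of addresses account for most of the errors.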
 
Originally posted by amram
Same problem here. Have you been able to solve it?

I ran tail -f /var/log/httpd/error_log while it was happening, then downloaded the log and went through it closely after Apache crashed. I didn't find anything that directly explains the crash, but there were lots of "File does not exist" entries in the error log. There was also no high iowait or CPU usage, just lots of httpd processes and Apache not responding.
It would be great if you could review the following thread and tell us what you think about it:

http://forum.plesk.com/showthread.php?threadid=29745

The problem itself hasn't been solved yet, but people who participated in the discussion found a work around, and my server currently has 99%+ uptime.
 