Apache 2.0.46 deadlock problem?

linuxxphybrid · Nov 13, 2005

Q1. The latest version of Plesk comes with Apache 2.0.46. Is this correct?

Q2. If so, is it possible that the version of Apache that comes with Plesk has deadlock problems?

*) mod_cgi: Handle output on stderr during script execution on Unix platforms; preventing deadlock when stderr output fills pipe buffer. Also fixes case where stderr from nph- scripts could be lost. PR 22030, 18348. [Joe Orton, Jeff Trawick]

*) Changed cgi and piped log behavior to accept 65536 characters on Win32 (matching Linux) before deadlocking between outputting client stdin, slurping the output from stdout and then the stderr stream. PR 8179 [William Rowe]

*) Fix the worker MPM deadlock problem [Brian Pane]

*) Patch prefork to put enough of the signal processing back in so that signals are all handled properly now. The previous patch fixed the deadlock race condition, but broke the user directed signal handling. This fixes it to work the way it did before my previous prefork patch (primarily, SIGTERM is now working).

*) Changes required to make prefork clean up idle children properly. There was a window during which a starting worker deadlocks when an idle cleanup arrives before it completes init. Apache then keeps trying to cleanup the same deadlocked worker forever (until higher pids come along, but it still will never reduce below the deadlocked pid). Thus the number of children would not reduce to the correct idle level. [Paul J. Reder]

Q3. Is it critical that I upgrade Apache?

Q4. Is there any way to know what leads to deadlock? Can I configure Apache so that it logs deadlock error and its cause?

hardweb · Nov 14, 2005

Are you talking about the Apache used by your websites? What OS do you use on the server?

linuxxphybrid · Nov 14, 2005

Originally posted by hardweb
Are you talking about the Apache used by your websites? What OS do you use on the server?

Yes, I am talking about the Apache that serves all the hosted websites. My OS is RHE.

linuxxphybrid · Nov 15, 2005

OK, here's an update.

I was away from my desk for an hour or so today, and this error occurred again then. My host company was monitoring this, and posted the following message:

This ticket is being created because our monitoring system presently indicates this host is down. For host purposes, HTTP is being monitored, and it is in fact responding, albeit slowly. You appear to be doing some configuration work on the server, as the escalation procedures you've defined have differing content than what was loaded. The primary IP (which is taking more than 10 seconds to load, hence the alert) loads with a SiteBuilder Admin page rather than a default Plesk one.

(Following my escalation procedure, he restarted Apache; all sites were down for about half an hour ***cursing***)
...

The last update in your Apache error_log is [Tue Nov 15 13:43:56 2005]. Therefore I am unable to determine the cause of outage. Usually, this is the result of high load, not an error with Apache.

Is there more information I can provide?

1. I kept saying that Apache locks up, but I think I need to correct this, because it appears that Apache is not completely locking up.

2. If this is happening because of high load, how can I check to see if this is really happening because of high load?

3. If this is happening because of high load, how can I resolve this?

ShadowMan@ · Nov 15, 2005

Originally posted by linuxxphybrid
2. If this is happening because of high load, how can I check to see if this is really happening because of high load?

3. If this is happening because of high load, how can I resolve this?

To see if it's high load, you can SSH into the server and use the following commands to help diagnose:

ps -ax |grep httpd

top

The ps command will show you how many httpd processes are running. If there are lots (or more than you anticipated), you may either be under attack or be hosting busier websites than you thought. If it's busy websites, you will have to fine-tune your Apache settings (or possibly upgrade your server; you didn't post any hardware specs).

If it's some sort of attack, you can wait it out or enlist the help of your datacenter, and you should certainly install tools such as mod_security, BFD, and mod_dosevasive.
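
As a minimal sketch of the ps check above: `ps -ax | grep httpd` also lists the grep process itself, so counting by command name can give a cleaner number. The helper names `count_httpd` and `check_limit` are illustrative only, and the limit you compare against is whatever MaxClients value your own configuration uses.

```shell
#!/bin/sh
# Count httpd processes in a list of command names read from stdin.
# Only the last path component is compared, so "/usr/sbin/httpd" also counts.
count_httpd() {
    awk '{ n = split($1, parts, "/"); if (parts[n] == "httpd") c++ } END { print c + 0 }'
}

# Report whether a process count has reached a limit (hypothetical helper).
check_limit() {
    count=$1
    limit=$2
    if [ "$count" -ge "$limit" ]; then
        echo "httpd processes: $count (at or above limit $limit)"
    else
        echo "httpd processes: $count (below limit $limit)"
    fi
}
```

On the server this could be run as `check_limit "$(ps -e -o comm= | count_httpd)" 256`, where 256 stands in for your MaxClients setting. A count that sits at the limit while pages load slowly suggests requests are queuing for a free child process.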

linuxxphybrid · Nov 15, 2005

Making a long story short, the problem happened again, and since I was monitoring this pretty closely, I asked my host company to look into this when this happened. They say that the problem was caused by high load, though they were unable to determine exactly what was causing the problem. They made the following changes to address the problem:

1. Removed parameters specific to worker.c
2. Raised MaxRequestsPerChild to 4K
3. Raised MaxClients to 256

Q1. Does the solution sound reasonable?

Q2. Why did they remove parameters specific to worker.c?

Q3. Why did they raise MaxRequestsPerChild to 4K?

Q4. Why did they raise MaxClients to 256?

ShadowMan@ · Nov 15, 2005

This is Apache 'tuning'. Whether it is 'reasonable' depends on whether they are sure it's not an attack of some sort. If it's just due to high-volume sites, and the server specs are OK, then raising MaxClients is fine.
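
Whether the server specs are OK for a given MaxClients can be roughly estimated from the memory each httpd process uses. The sketch below is a common rule of thumb, not something the host company described: it averages the resident set size (RSS, in KB) of the running httpd processes and multiplies by MaxClients. The function name `estimate_mem_mb` is made up for illustration, and because RSS includes memory shared between processes, the result is an upper bound.

```shell
#!/bin/sh
# Read one RSS value (in KB) per line from stdin and print the estimated
# memory, in MB, needed if max_clients processes of average size were running.
estimate_mem_mb() {
    max_clients=$1
    awk -v max="$max_clients" '
        NF { sum += $1; n++ }
        END {
            if (n == 0) { print 0; exit }
            printf "%d\n", (sum / n) * max / 1024
        }'
}
```

On Linux this could be fed with `ps -C httpd -o rss= | estimate_mem_mb 256`. If the estimate is well above the RAM installed, a MaxClients of that size risks pushing the server into swap under load.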

Setting MaxRequestsPerChild to 4K tells Apache to kill a child process after it has served at most 4,000 requests. A value of zero means a child process never expires, so it can live forever and cause problems such as slowly growing memory use. (In Apache 2.0 the default is 10000, not zero.)

More info on Apache tuning:
perl.apache.org/docs/1.0/guide/performance.pdf

There are other sites out there for Apache tuning as well; this is just one of them.

hardweb · Nov 16, 2005

Lower the MaxRequestsPerChild value to 500. This will improve reliability.

linuxxphybrid · Dec 25, 2005

Originally posted by hardweb
Lower the MaxRequestsPerChild value to 500. This will improve reliability.

What's wrong with 4000? Also, can lowering MaxRequestsPerChild to 500 cause any problems?

I have a couple of other questions:

Code:

<IfModule prefork.c>
StartServers       8
MinSpareServers    5
MaxSpareServers   20
MaxClients       256
MaxRequestsPerChild  4000
</IfModule>
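
Directives inside an `<IfModule prefork.c>` section take effect only when Apache was built with the prefork MPM; a `worker.c` section is ignored in that case. A minimal sketch for listing what one section contains follows. The function name `ifmodule_section` is illustrative, and it assumes the simple one-directive-per-line layout shown above with no nested sections.

```shell
#!/bin/sh
# Print "Directive value" pairs found inside <IfModule NAME> in an Apache
# config read from stdin. Assumes sections are not nested.
ifmodule_section() {
    awk -v name="$1" '
        $1 == "<IfModule" && $2 == (name ">") { inside = 1; next }
        inside && $1 == "</IfModule>" { inside = 0 }
        inside && NF { print $1, $2 }
    '
}
```

For example, `ifmodule_section prefork.c < /etc/httpd/conf/httpd.conf` (the path is an assumption; adjust it to your layout). Running `httpd -l` lists the compiled-in modules and shows whether prefork.c or worker.c is the MPM in use.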

Q1. Why is MaxRequestsPerChild set inside <IfModule prefork.c> and not outside? (My apologies if this is a trivial question.)

Q2. It is possible that Apache not being able to connect to Tomcat is causing the problem, but I don't know how the two relate to each other. Why would Apache not being able to connect to Tomcat create a problem for Apache? Could it also cause high load?

Q2.2. Isn't there something that I need to do inside <IfModule mod_webapp.c> in order to ease the problem?

linuxxphybrid · Dec 26, 2005

Originally posted by hardweb
Lower the MaxRequestsPerChild value to 500. This will improve reliability.

OK, I did this, but Apache still hangs. Httpd had 8 connections and was not responding. Once Apache restarted, it had 11 connections and was responsive. Is there anything we can extract from this information?

linuxxphybrid · Dec 26, 2005

The problem happened again, but I ran the following commands this time and took screenshots.

top (screenshot here)
ps -aux (screenshot here)
ps -ef (screenshot here)
ps -ax | grep httpd (while Apache was hanging) (screenshot here)
ps -ax | grep httpd (after Apache restarted) (screenshot here)

Is there any information that you can extract from this? If so, what can you figure from these? If not, how should I diagnose the problem next time? What extra information do I need to obtain in order to find the cause?

amram · Jan 16, 2006

Same problem here
Have you been able to solve it?

I ran tail -f /var/log/httpd/error_log while it was happening, then downloaded the log and went through it closely after Apache crashed. I didn't find anything that directly explains the crash, but there were lots of "File does not exist" entries in the error log. There was also no high iowait or CPU usage, just lots of httpd processes and Apache not responding.
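
To see whether those "File does not exist" entries point at a few specific paths, the log can be summarized rather than read line by line. This is only a sketch: the function names are made up, and it assumes the usual Apache 2.0 error_log line shape, where the message is followed by the path and sometimes by a `, referer: ...` suffix.

```shell
#!/bin/sh
# Count error_log lines on stdin that report a missing file.
count_missing() {
    awk '/File does not exist/ { c++ } END { print c + 0 }'
}

# List the most frequently requested missing paths, most frequent first.
# The optional argument is how many paths to show (default 10).
top_missing() {
    awk -F'File does not exist: ' \
        'NF > 1 { p = $2; sub(/, referer: .*/, "", p); print p }' \
        | sort | uniq -c | sort -rn | head -n "${1:-10}"
}
```

For example, `top_missing 5 < /var/log/httpd/error_log`. A single path requested thousands of times may indicate a broken link or a bot, though by itself it does not explain why Apache stops responding.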

linuxxphybrid · Jan 16, 2006

Originally posted by amram
Same problem here
Have you been able to solve it?

I ran tail -f /var/log/httpd/error_log while it was happening, then downloaded the log and went through it closely after Apache crashed. I didn't find anything that directly explains the crash, but there were lots of "File does not exist" entries in the error log. There was also no high iowait or CPU usage, just lots of httpd processes and Apache not responding.

It would be great if you can review the following thread and tell us what you think about it:

http://forum.plesk.com/showthread.php?threadid=29745

The problem itself hasn't been solved yet, but people who participated in the discussion found a workaround, and my server currently has 99%+ uptime.

amram · Jan 16, 2006

Thank You
If I find a solution, I will share it.
