Server hanging randomly

JessB
I have a server running Plesk 9.5.2 on Debian and every so often it will become completely unresponsive on all protocols except ping. It can happen anywhere from hours to weeks after the server boots and I haven't been able to track down the cause.

I devised a way to capture the output of top just before the server stopped completely; you can view it here: http://pastebin.com/1Fd0v7nH
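The capture method used above isn't described, but one possible approach is to run top in batch mode on a fixed interval and append each snapshot, with a timestamp, to a log file that is synced to disk, so the last snapshot taken before a hang survives the reboot. A minimal Python sketch, assuming procps `top` (which supports `-b -n 1`); the log path `/var/log/top-snapshots.log` and the function names are hypothetical:

```python
import os
import subprocess
import time
from datetime import datetime

# Hypothetical log location; any path on a local disk will do.
LOG_PATH = "/var/log/top-snapshots.log"


def capture_snapshot(cmd, path):
    """Run cmd once and append its output, under a timestamp header, to path."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(path, "a") as log:
        log.write(f"=== {datetime.now().isoformat()} ===\n")
        log.write(result.stdout)
        log.flush()
        os.fsync(log.fileno())  # push the snapshot to disk before sleeping


def monitor(cmd, path, interval=60, iterations=None):
    """Capture a snapshot every `interval` seconds; loop forever if iterations is None."""
    count = 0
    while iterations is None or count < iterations:
        capture_snapshot(cmd, path)
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)
```

To run it continuously, call `monitor(["top", "-b", "-n", "1"], LOG_PATH, interval=60)` from a script started at boot. Once the box is swapping heavily, even this script may be delayed, so the final snapshot can be somewhat older than the hang itself.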

The load averages are through the roof, but CPU usage is low. There are 125 instances of apache2 and 104 instances of relaylock, compared with 14 and 0 currently.
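A small sketch for tracking those counts over time: it tallies running processes by command name from `/proc/<pid>/stat` (Linux-specific). The function names are hypothetical, and each call is only a point-in-time sample, so short-lived relaylock processes will mostly show up when they are piling up:

```python
import os
from collections import Counter


def process_counts(proc_root="/proc"):
    """Count running processes by command name, read from /proc/<pid>/stat."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are process directories
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        # The command name sits between the first "(" and the last ")".
        name = stat[stat.find("(") + 1 : stat.rfind(")")]
        counts[name] += 1
    return counts


def report(names=("apache2", "relaylock"), proc_root="/proc"):
    """Return how many processes currently run under each of the given names."""
    counts = process_counts(proc_root)
    return {name: counts.get(name, 0) for name in names}
```

Logging `report()` alongside the top snapshots makes it easier to see whether apache2 or relaylock starts climbing first.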

Any help or suggestions would be appreciated.
 
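top shows that there is almost no free memory left and your swap is full as well. This leads to out-of-memory errors, and the swapping causes a very high load because of the heavy disk I/O: on Linux, processes waiting on disk I/O are counted in the load average, which is why the load is so high while CPU usage stays low.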
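The memory and swap figures shown by top are also available in `/proc/meminfo`, which makes it easy to check for this condition from a script. A minimal sketch; treating MemFree + Buffers + Cached as "free" RAM is an approximation (page cache can usually be reclaimed), and the 5% threshold and function names are arbitrary assumptions:

```python
def read_meminfo(path="/proc/meminfo"):
    """Parse /proc/meminfo into a dict of field name -> numeric value (mostly kB)."""
    info = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            fields = rest.split()
            if fields:
                info[key.strip()] = int(fields[0])
    return info


def memory_pressure(info, threshold=0.05):
    """Return (ram_free_ratio, swap_free_ratio, under_pressure)."""
    # Approximate reclaimable RAM as MemFree + Buffers + Cached.
    ram_free = info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
    ram_ratio = ram_free / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    # No swap configured: report swap as not exhausted.
    swap_ratio = info.get("SwapFree", 0) / swap_total if swap_total else 1.0
    return ram_ratio, swap_ratio, ram_ratio < threshold and swap_ratio < threshold
```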
You should check your logs to find out why there are so many apache and relaylock processes.
You could also reduce the MaxClients setting in your Apache configuration so that your server does not start swapping when many clients connect. To serve more connections with fewer Apache processes, you can also reduce KeepAliveTimeout.
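A common rule of thumb for sizing MaxClients (a heuristic, not an official Apache formula, and assuming the prefork MPM) is to divide the RAM you can spare for Apache by the average resident size of one apache2 process. A sketch with hypothetical numbers; the function name and all inputs are assumptions you would replace with measurements from your own server:

```python
def estimate_max_clients(total_ram_mb, reserved_mb, avg_process_mb):
    """Rule-of-thumb MaxClients: RAM left for Apache / average apache2 RSS.

    reserved_mb covers everything else on the box (MySQL, qmail, spam
    filtering, the OS itself).
    """
    available = total_ram_mb - reserved_mb
    if available <= 0 or avg_process_mb <= 0:
        raise ValueError("need positive available RAM and process size")
    return int(available // avg_process_mb)


# Hypothetical server: 2048 MB RAM, 512 MB kept for other services,
# apache2 processes averaging 20 MB resident.
print(estimate_max_clients(2048, 512, 20))  # prints 76
```

Because RSS counts shared pages once per process, this tends to underestimate how many processes actually fit, which errs on the side of not swapping.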
 
Thank you for the suggestions, Bevan.

I have reduced MaxClients to 75 (from 150) and KeepAliveTimeout to 3 (from 15).

The Apache error logs don't say much; the only interesting line around the time the server hung was this one:

[Sat Nov 06 09:21:47 2010] [error] server reached MaxClients setting, consider raising the MaxClients setting

Although the message suggests raising MaxClients, I think you're on the right track in suggesting that I lower it, because the shorter KeepAliveTimeout should reduce the number of processes that stay alive unnecessarily.
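To see how often the limit is hit, and whether the new settings change that, the error log can be scanned for that message. A minimal sketch that counts occurrences per hour, assuming the timestamp format in the line quoted above (parsed in the C/English locale) and a hypothetical default path of `/var/log/apache2/error.log`; the function names are illustrative:

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines like:
# [Sat Nov 06 09:21:47 2010] [error] server reached MaxClients setting, ...
MAXCLIENTS_RE = re.compile(r"^\[([^\]]+)\] \[error\] server reached MaxClients setting")


def maxclients_hits_per_hour(lines):
    """Count 'server reached MaxClients' errors, bucketed by hour."""
    hits = Counter()
    for line in lines:
        match = MAXCLIENTS_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
            hits[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return hits


def scan_error_log(path="/var/log/apache2/error.log"):
    # Hypothetical default path; point it at whichever log holds the message.
    with open(path, errors="replace") as log:
        return maxclients_hits_per_hour(log)
```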

The mail logs show a large amount of spam coming in (every few seconds), which seems to be the cause of all the relaylock processes. For example:

Nov 6 09:57:26 server01 /var/qmail/bin/relaylock[14328]: /var/qmail/bin/relaylock: mail from 109.60.193.7:4254 (ip7.net193.n37.ru)
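To check whether a handful of hosts account for most of this traffic, the relaylock lines can be tallied by source IP. A sketch assuming the exact line format shown above; the default path `/var/log/mail.log` and the function names are hypothetical, since Plesk installs may log mail elsewhere:

```python
import re
from collections import Counter

# Matches the sender part of lines like:
# ... /var/qmail/bin/relaylock: mail from 109.60.193.7:4254 (ip7.net193.n37.ru)
RELAYLOCK_RE = re.compile(r"relaylock: mail from (\d{1,3}(?:\.\d{1,3}){3}):\d+")


def top_senders(lines, n=10):
    """Return the n source IPs with the most relaylock connections."""
    counts = Counter()
    for line in lines:
        match = RELAYLOCK_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)


def scan_mail_log(path="/var/log/mail.log", n=10):
    # Hypothetical default path; adjust to where your server logs qmail.
    with open(path, errors="replace") as log:
        return top_senders(log, n)
```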

Under normal conditions these processes aren't alive long enough to appear in top or ps, so my guess is that once apache has taken all of the RAM, every other process the server tries to run is delayed.
 