
Issue Abnormal CPU usage and Apache crash

expomeeting

New Pleskian
Server operating system version
Ubuntu 18.04.6 LTS
Plesk version and microupdate number
Plesk Obsidian v18.0.58_build1800240123.22 os_Ubuntu 18.04
Hello, I have had this strange problem for several months now and I haven't been able to solve it.
We occasionally see strange CPU behavior on this server.
1706869326924.png
This is the current situation, for example. As far as I know, this pattern indicates high I/O usage.

Sometimes the problem also crashes Apache (not this morning). When that happens, all the hosted sites show a 504 gateway timeout. Most of the time, stopping Apache brings the CPU back to normal and restarting Apache fixes the problem, so I'd say Apache is the suspect. But sometimes, even after restarting Apache, CPU consumption quickly climbs again and Apache crashes again.

I can't figure out where the problem comes from. We have other servers configured exactly the same way that never behave like this. I don't see anything abnormal in the site logs, but they are hard to check because the server hosts hundreds of domains.
One issue that may or may not be related: a few months ago a WordPress site on this server was hacked, and the infection spread to all the WordPress sites in the same subscription. The hack created and executed PHP files for spam, but it didn't work well because the attackers also created .htaccess files that don't work with my server configuration. They just took the sites down.
We cleaned up all these sites, and it never happened again. Maybe cleaning them up wasn't enough?
Another thing that seems related, though I don't know how: this strange behavior always starts around midnight (my local time). That is also when the backups are scheduled. But maybe it's just a coincidence.

Can anyone give me an idea of what to look for?
Thank you
 
You could try to run
MYSQL_PWD=`cat /etc/psa/.psa.shadow` watch "ps aux | sort -nrk 3,3 | head -n 20 && echo "\ " && mysqladmin proc status -u admin"
and watch what the top processes are. If they belong to websites, you can dig into the website logs to find the root cause (normally bad bots hitting a site), but it could equally well be a compression tool working on backups. The command above is the place to start.
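Since the problem reportedly starts around midnight, watching interactively may not be practical. Here is a minimal sketch of a non-interactive variant: it applies the same sort on column 3 (%CPU in `ps aux`) to text read from standard input, so snapshots can be appended to a log by a scheduled job and reviewed later. The function name `top_cpu` and the log path are assumptions made up for this example, not part of Plesk.

```shell
#!/bin/sh
# Print the N lines with the highest value in column 3 (%CPU in `ps aux`).
# Reads ps output on stdin, so it works on live or saved output.
top_cpu() {
    n="${1:-20}"
    sort -nrk 3,3 | head -n "$n"
}

# Hypothetical usage from a scheduled script (log path is an assumption):
#   { date; ps aux | top_cpu 20; } >> /var/log/cpu-snapshots.log
```

Running this every minute or so between roughly 23:55 and 00:30 should be enough to see which processes are active when the load starts.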
 
I had a similar situation a couple of years ago. One site was infected (via the client's Windows machine, of course) with a PHP script that eventually consumed all CPU. I couldn't see which process was the culprit.
I had to monitor the issue manually for a couple of months, turning off a bunch of sites at a time, to finally pin it down to the one…
 
Did you activate pagespeedboost for all sites? I ran into this issue after doing so. I rolled it back, and everything was fine.
 
Your screenshots show that it starts 5 minutes after midnight. Does the scheduled backup start then as well? AWStats also needs a lot of CPU for its daily cleanup and collection when there is a large number of domains.
Also, a PHP loop or a database loop for connections to other servers can block and, if it is not automatically closed or cleared, cause a cascading loop.
Look at the Apache web server settings as well: are the allowances for any of your subscriptions set too high somewhere?
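One Apache setting worth checking is the worker limit. If Apache workers stall (for example on slow disk I/O during a backup; this is a hypothesis, not a diagnosis), all worker slots can fill up. That would fit the 504 gateway timeouts described in the first post, since nginx would then get no answer from Apache. Apache writes a "server reached MaxRequestWorkers setting" message to its error log when the limit is hit. A small sketch that counts those messages follows; the helper name `count_worker_limit_hits` is hypothetical, and the log path in the usage comment is the Ubuntu default, which may differ on your server.

```shell
#!/bin/sh
# Count Apache error-log lines reporting that every worker slot was in use.
# Reads log text on stdin and prints the number of matching lines.
count_worker_limit_hits() {
    grep -c 'server reached MaxRequestWorkers setting'
}

# Hypothetical usage (Ubuntu default log path, adjust as needed):
#   count_worker_limit_hits < /var/log/apache2/error.log
```

If the count is above zero and the timestamps of those lines cluster shortly after midnight, the worker limit is at least part of the picture.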
 
This morning it happened again.
I may have found something in the backup configuration, but I can't figure out how it could be related to the Apache freeze.
One of the subscriptions, the biggest one, is used for sites that include a lot of media. The media are often video, so we have large files. We are not interested in backing up those videos, so in that subscription's scheduled backup I added a rule to exclude files *.mp4, **/*.mp4,*.m4v, **/*.m4v.
What we didn't notice is that there was also a backup schedule under "all websites" without this rule. So we have duplicate backups, and very large ones (650GB). I'm now fixing the backup settings.
So this could be the reason for the CPU behavior, but why does Apache freeze? Only Apache: nginx works, the Plesk panel works...
 
In addition: we have another server with the same backup configuration error, and very large backup files too. It never does anything similar, and it has more or less the same hardware.
 
Did you ever follow the advice to run the command above and watch the top processes? What did you find out?
 
Neither this command nor "top" shows any process with high CPU consumption.
Furthermore, many sites are under the same subscription and the same system user, so the output didn't tell me much...
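If neither tool shows a CPU-hungry process, one possibility consistent with the I/O suspicion in the first post is that processes are blocked waiting on disk rather than computing. Such processes use little CPU themselves, so they do not rise to the top of a list sorted by %CPU. A rough sketch, assuming the procps `ps` shipped with Ubuntu: list processes in uninterruptible sleep (state `D`). The helper name `blocked_on_io` is made up for this example.

```shell
#!/bin/sh
# Keep only rows whose first column (process state) starts with "D",
# i.e. uninterruptible sleep, which usually means waiting on I/O.
# Expects input in the format of: ps -eo stat,pid,user,comm
blocked_on_io() {
    awk '$1 ~ /^D/'
}

# Hypothetical usage, repeated while the problem is occurring:
#   ps -eo stat,pid,user,comm | blocked_on_io
```

If the same commands (for example a compression tool or Apache workers) keep appearing in this list around midnight, that points toward disk contention rather than a runaway script.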
 
Only Apache? I read somewhere that sw-engine for Plesk uses dedicated resources of your server, and that nginx likewise uses a separate share of the resources, defined in nginx.conf. So my guess is that a non-static process grabs too many resources at once during the backup, causing stuck processes or a "traffic jam", and that the server then breaks down under incoming outside traffic.
Maybe some of the media files are corrupt and cause loops in the backup process.
 
If the approach Peter described does not reveal the issue, you have to search step by step.
Try suspending the subscription you suspect a couple of minutes before backup time. If the bad behavior does not occur, you have found the subscription. Then continue step by step from there.
As you said, Plesk is fine and nginx is fine, but Apache is going down. That hints at a dynamic process: a crypto script, a bad media file, a corrupted subscription, a hacked subscription, too many redirects...
 