Issue Abnormal CPU usage and Apache crash

expomeeting · Feb 2, 2024

Hello, I have had this strange problem for several months now and I haven't been able to solve it.
We occasionally see strange CPU behavior on this server.

This is the current situation, for example. As far as I know, this indicates high I/O usage.

Sometimes the problem also crashes Apache (not this morning); when that happens, all the hosted sites show a 504 Gateway Timeout. Most of the time, stopping Apache brings the CPU back to normal and restarting Apache fixes the problem, so I'd say Apache is the suspect. But sometimes, even after restarting Apache, CPU consumption quickly increases and Apache crashes again.

I can't figure out where the problem comes from. We have other servers configured in exactly the same way that never do anything similar. I don't see anything abnormal in the site logs, but they are hard to check because the server hosts hundreds of domains.
Something that may or may not be related: a few months ago a WordPress site on this server was hacked, and the infection spread to all the WordPress sites in the same subscription. The hack created and executed PHP files for spam, but it didn't work well because it also created .htaccess files that don't work with my server configuration. They just took the sites down.
We cleaned up all these sites, and it never happened again. Maybe cleaning them up wasn't enough?
Another thing that seems related, though I don't know how: this strange behavior always starts around midnight (local time). That's about when the backups run; they are scheduled for midnight. But maybe it's just a coincidence.

Can anyone give me an idea of what to look for?
Thank you

Peter Debik · Feb 2, 2024

You could try to run

MYSQL_PWD=`cat /etc/psa/.psa.shadow` watch "ps aux | sort -nrk 3,3 | head -n 20 && echo "\ " && mysqladmin proc status -u admin"

and watch what the top processes are. If they are websites, you could dive into the website logs to find the root cause (normally these are bad bots hitting a site), but it could equally well be a compression tool working on backups. The Linux command shown above is a good starting point for finding out.

carlsson · Feb 2, 2024

I had a similar situation a couple of years ago. One site was infected (via the client's Windows machine, of course) with a PHP script that eventually consumed all the CPU. I couldn't see which process was the culprit.
I had to monitor the issue manually for a couple of months, turning off a bunch of sites at a time, to finally narrow it down to the one…

expomeeting · Feb 8, 2024

Now, of course, it hasn't done it for days... until the server suddenly stops working again, dammit

MartinT · Feb 8, 2024

Did you activate the pagespeedboost for all sites? I had this issue after doing so. I rolled it back, and everything was fine.

expomeeting · Feb 8, 2024

No, I didn't

MartinT · Feb 8, 2024

Your screenshots show that it starts 5 minutes after midnight. Does the scheduled backup start then as well? Also, AWStats needs a lot of CPU for its daily cleanup and collection when there is a large number of domains.
Also, a PHP loop or a DB loop on connections to other servers can block and, if it is not closed/cleared automatically, cause a cascading loop.
Also look at the Apache web server settings: are the limits set too high anywhere for any of your subscriptions?
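
Since the spike starts shortly after midnight and does not happen every day, it may help to record a snapshot once a minute during that window and review it the next morning. A minimal sketch, assuming a Linux host with the procps version of `ps`; the function name `snapshot`, the script path, and the log path are placeholders and not Plesk features:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one timestamped snapshot of the load
# and of the processes currently using the most CPU.
snapshot() {
    echo "=== $(date '+%F %T') ==="
    # 1-, 5- and 15-minute load averages
    echo "loadavg: $(cut -d' ' -f1-3 /proc/loadavg)"
    # Header line plus the five processes with the highest %CPU
    ps -eo pcpu,pmem,stat,user,pid,args --sort=-pcpu | head -n 6
}

# Example /etc/cron.d entry (assumed paths), every minute from 00:00 to 00:59:
#   * 0 * * * root /usr/local/bin/snapshot.sh >> /var/log/midnight-snapshot.log 2>&1
snapshot
```

Comparing the snapshots from a night with a freeze against a quiet night should show whether the backup, AWStats, or a website process is active when the load climbs.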

expomeeting · Feb 9, 2024

This morning it happened again.
Maybe I found something in the backup configuration, but I can't figure out how it could be related to the Apache freeze.
One of the subscriptions, the biggest one, is used for sites that include a lot of media. These media are often videos, so we have large files. We are not interested in backing up those videos, so in that subscription's scheduled backup I added a rule to exclude the files *.mp4, **/*.mp4, *.m4v, **/*.m4v.
What we didn't notice is that there was also a backup schedule under "all websites" without this rule. So we have duplicate backups, and very large ones (650 GB). I'm now fixing the backup settings.
So this could be the reason for the CPU behavior, but why does Apache freeze??? Only Apache; nginx works, the Plesk panel works...
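
To estimate how much of the 650 GB comes from the videos that the exclusion rule was meant to skip, the matching files can be summed per directory. A rough sketch, assuming GNU `find` (for `-printf`); the function name `media_bytes` is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: total size in bytes of all *.mp4 and *.m4v files
# below the directory given as the first argument.
media_bytes() {
    find "$1" -type f \( -name '*.mp4' -o -name '*.m4v' \) -printf '%s\n' |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}
```

For example, `media_bytes /var/www/vhosts/example.com` would print the byte total for one site, where `example.com` is a placeholder domain and `/var/www/vhosts` is the default Plesk vhosts directory on Linux.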

expomeeting · Feb 9, 2024

In addition: we have another server with the same backup configuration error, and very large backup files too. It never does anything similar... and it has more or less the same hardware.

Peter Debik · Feb 9, 2024

Peter Debik said:
You could try to run
MYSQL_PWD=`cat /etc/psa/.psa.shadow` watch "ps aux | sort -nrk 3,3 | head -n 20 && echo "\ " && mysqladmin proc status -u admin"
and watch what the top processes are. If they are websites, you could dive into the website logs to find the root cause (normally these are bad bots hitting a site), ...

Did you ever follow this advice? What did you find out?

expomeeting · Feb 9, 2024

Peter Debik said:
Did you ever follow this advice? What did you find out?

This command, as well as "top", does not show processes with high CPU consumption.
Furthermore, many sites are under the same subscription and under the same system user. So the output didn't tell me much...
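
One possible explanation for a high load with no CPU-hungry process in the list is I/O wait: on Linux, processes blocked on disk or network I/O (state `D`) count toward the load average but use almost no CPU themselves, and `top` reports that time as `wa` in its CPU summary line. A small sketch for listing such processes, assuming procps `ps`; the function name `blocked_procs` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep the header line and every process whose state
# starts with D (uninterruptible sleep, usually waiting on I/O).
# Expects the output of `ps -eo stat,pid,user,comm` on stdin.
blocked_procs() {
    awk 'NR == 1 || $1 ~ /^D/'
}

ps -eo stat,pid,user,comm | blocked_procs
```

If many `httpd`/`apache2` workers show up here while a backup is running, that would fit the picture of Apache stalling on a saturated disk rather than burning CPU.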

MartinT · Feb 9, 2024

Only Apache? I read somewhere that sw-engine for Plesk uses dedicated resources of your server, and that nginx also has its own share of resources, defined in nginx.conf. So my guess is that a non-static process grabs too much at once during the backup, which causes stuck processes or a "traffic jam", and when outside traffic comes in, the server breaks down.
Maybe some of the media files are corrupt and cause loops in the backup process.

Peter Debik · Feb 9, 2024

expomeeting said:
This command, as well as "top", does not show processes with high CPU consumption.

How could that be possible if high CPU usage is logged by monitoring at the same time?

expomeeting said:
Furthermore, many sites are under the same subscription and under the same system user.

And what load does that user create?
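
The per-user load can be read from `ps` by summing the %CPU column for each system user. A minimal sketch, assuming procps `ps`; the function name `cpu_by_user` is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum %CPU per system user, highest total first.
# Expects "user pcpu" pairs on stdin, one process per line, no header.
# Note: pcpu in ps is averaged over each process's lifetime, so short
# bursts are better caught by sampling repeatedly.
cpu_by_user() {
    awk '{ cpu[$1] += $2 } END { for (u in cpu) printf "%.1f %s\n", cpu[u], u }' |
        sort -rn
}

# user:32 widens the column so long system user names are not truncated
ps -eo user:32,pcpu --no-headers | cpu_by_user | head -n 10
```

This only narrows things down to a system user; when many sites share that user, the process arguments or the per-site logs are still needed to identify the individual site.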

MartinT · Feb 10, 2024

If the approach Peter described does not reveal the issue, you have to search step by step.
Try suspending the subscription you suspect a couple of minutes before backup time. If the bad behavior does not occur, you have found the subscription. Then carry on from there, step by step.
As you said, Plesk is fine and nginx is fine; Apache is going down. That hints at a dynamic process: a crypto script, a bad media file, a corrupted subscription, a hacked subscription, too many redirects...
