Issue High latency, unreliable performance & 522 errors on a Plesk server that ran fine for 6+ months

BjornTheBassist · Oct 21, 2024

Hi everyone,

I've been trying to troubleshoot an issue for 5 days straight and I'm starting to lose hope. We have a VPS at Hostinger with 8 CPU cores and 32GB RAM. Performance-wise, our CPU rarely goes over 20%, and RAM stays well below 30%. We have talked to Hostinger support for over 2 hours, and from their end nothing seems to be wrong with the node. A speedtest on the server also shows it hits its max upload & download speeds of 300 Mbps.

The server runs AlmaLinux 8.10, with Plesk version Obsidian v18.0.64_build1800241008.13 os_RedHat el8

We have about 30 WordPress sites on this Plesk install, and it has been running fine for 6+ months. 5 days ago we started noticing slowdowns on all of our websites, and our Uptime Kuma started reporting that sites experienced timeouts after 48000ms. This results in 522 errors, both for domains proxied through Cloudflare and for unproxied domains.
Most of these sites get little traffic, with the most visited ones probably topping out at 10,000 visitors per month. Most of them are cached through Cloudflare, on top of WordPress caching, NGINX, Redis...

I have gone through many troubleshooting steps, but still haven't found a fix. What we already did so far:
- Verified CPU usage
- Verified RAM usage
- Verified our firewall rules, and turned off the firewall temporarily to rule it out as the cause of the issue
- Used the built-in diagnose & repair tool. No issues found to be fixed.
- Turned off Cloudflare proxying to rule out Cloudflare: does not change website performance.
- Stress tested our MariaDB server, which can handle 10,000 queries coming from 200 concurrent connections before reaching 80% CPU usage. In general we have a maximum of 20 concurrent connections to the DB server.
- Scanned the entire server for malware. One client's website was recently infected, and we resolved that infection.
- Disabled domains that were meant for development purposes, or no longer needed
- Increased cache size for MySQL as well as max open files.
- Restarted both our server and Plesk itself multiple times

On our websites we use:
- WP Fastest Cache in WordPress itself
- Redis caching via docker containers
- NGINX caching
- Most are proxied through Cloudflare with caching rules, some aren't. Issues happen on both types of websites.

Websites that are completely cached still load fairly quickly in general, but also experience 522 errors and slowdowns on an irregular basis. Websites that need to make database calls (webshops for example) go from working reasonably well to taking up to 2 minutes to load a page or not loading at all. And all of this is inconsistent.
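For anyone wanting to narrow down where the time goes on the slow requests, here is a minimal sketch of a probe that separates TCP connect time from time to first byte. It is only an illustration: the function name `probe` and its parameters are made up for this example, it speaks plain HTTP only (no TLS), and it is not something Plesk or Uptime Kuma provides. A slow connect would point more towards the network or the web server accepting connections, while a fast connect with a slow first byte would point more towards PHP or the database.

```python
import socket
import time


def probe(host, port=80, path="/", timeout=60.0):
    """Return (connect_seconds, first_byte_seconds) for one plain-HTTP GET.

    Hypothetical helper for illustration; plain HTTP only, no TLS.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n"
        )
        sock.sendall(request.encode("ascii"))
        # Blocks until the first byte of the response arrives (or timeout).
        sock.recv(1)
        first_byte = time.monotonic()
    return connected - start, first_byte - connected
```

Running this in a loop from the server itself (against 127.0.0.1 with the right Host header) and from an outside machine, and logging both numbers with a timestamp, should show whether the irregular slowdowns also happen when the external network is taken out of the picture.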

I have not found any apparent errors in the Plesk logs, the WordPress logs or the general domain logs. So at the moment I'm banging my head against a wall and trying to keep our clients updated on the situation.

The one odd thing we keep seeing is 2 or 3 concurrent database processes with very high TIME_MS in the 'psa' database, run by user 'admin'. I believe these are Plesk commands running, but I am not sure if this is normal behaviour.
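A minimal sketch of how those processes could be filtered, assuming the rows of MariaDB's `information_schema.PROCESSLIST` have already been fetched as dictionaries by whatever client is in use. The helper name `slow_queries` and the 5000 ms threshold are assumptions for illustration. Rows with COMMAND 'Sleep' are skipped because, for those, the time value is how long the connection has been idle rather than how long a query has been running, so a high TIME_MS on a sleeping connection is not by itself a sign of a stuck query.

```python
# Query to run with any MariaDB client; TIME_MS is a MariaDB-specific column.
PROCESSLIST_QUERY = (
    "SELECT ID, USER, DB, COMMAND, TIME_MS, STATE, INFO "
    "FROM information_schema.PROCESSLIST"
)


def _ms(row):
    """TIME_MS as a float, treating a missing or NULL value as 0."""
    return float(row.get("TIME_MS") or 0)


def slow_queries(rows, db="psa", min_ms=5000.0):
    """Return non-idle rows for `db` running at least `min_ms`, slowest first.

    Hypothetical helper; `rows` is a list of dicts keyed by column name.
    """
    hits = [
        row for row in rows
        if row.get("DB") == db
        and row.get("COMMAND") != "Sleep"
        and _ms(row) >= min_ms
    ]
    return sorted(hits, key=_ms, reverse=True)


if __name__ == "__main__":
    # Invented sample rows, not output from the server in this thread.
    sample = [
        {"ID": 1, "USER": "admin", "DB": "psa", "COMMAND": "Query", "TIME_MS": 120000.0},
        {"ID": 2, "USER": "admin", "DB": "psa", "COMMAND": "Sleep", "TIME_MS": 900000.0},
    ]
    for row in slow_queries(sample):
        print(row["ID"], row["COMMAND"], row["TIME_MS"])
```

If the high TIME_MS entries turn out to be 'Sleep' rows, they are most likely idle connections held open by Plesk; if they are 'Query' rows, the STATE and INFO columns would show what they are actually doing.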

If anyone has any insights into what might be causing this high latency/522 error issue, I would love to hear them! Thanks in advance, and apologies for the long-winded post.

If any logs or extra information on our server are needed to help debugging, I will be happy to provide them.

Sebahat.hadzhi · Oct 22, 2024

Hello, Bjorn. It looks like you already opened a case with our support team. Hopefully, they will be able to determine what the culprit of the issue is.
