
Issue: High latency, unreliable performance & 522 errors on a Plesk server that ran fine for 6+ months

BjornTheBassist

New Pleskian
Server operating system version: AlmaLinux 8.10
Plesk version and microupdate number: Plesk Obsidian v18.0.64_build1800241008.13 os_RedHat el8
Hi everyone,

I've been trying to troubleshoot an issue for 5 days straight and I'm starting to lose hope. We have a VPS at Hostinger with 8 CPU cores and 32 GB RAM. Performance-wise, our CPU rarely goes over 20% and RAM stays well below 30%. We have talked to Hostinger support for over 2 hours, and from their end nothing seems to be wrong with the node. A speed test on the server also shows it hits its maximum upload and download speeds of 300 Mbps.

The server runs AlmaLinux 8.10, with Plesk Obsidian v18.0.64_build1800241008.13 os_RedHat el8.

We have about 30 WordPress sites on this Plesk install, and it has been running fine for 6+ months. 5 days ago we started noticing slowdowns on all of our websites, and Uptime Kuma began reporting that sites timed out after 48,000 ms, resulting in 522 errors for both domains proxied through Cloudflare and unproxied domains.
Most of these sites get little traffic, with the most visited ones probably topping out at 10,000 visitors per month. Most of them are cached through Cloudflare, on top of WordPress caching, NGINX, Redis...

I have gone through many troubleshooting steps, but still haven't found a fix. What we have done so far (one additional check is sketched right after this list):
- Verified CPU usage
- Verified RAM usage
- Verified our firewall rules and turned the firewall off temporarily to rule it out as the cause of the issue
- Used the built-in diagnose & repair tool; no issues were found to fix
- Turned off Cloudflare proxying to rule out Cloudflare: it does not change website performance
- Stress tested our MariaDB server, which can handle 10,000 queries from 200 concurrent connections before reaching 80% CPU usage; in general we have at most 20 concurrent connections to the DB server
- Scanned the entire server for malware (one client's website was recently infected, and we resolved that)
- Disabled domains that were meant for development purposes or are no longer needed
- Increased the MySQL cache size as well as max open files
- Restarted both our server and Plesk itself multiple times
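
One thing that isn't on the list above is disk I/O latency; host-side storage contention wouldn't show up in CPU or RAM usage. A minimal check along these lines, assuming Python 3 on Linux (the device name is a placeholder and will likely be vda or sda depending on the VPS):

```python
# Minimal sketch: estimate average disk I/O latency from /proc/diskstats,
# since storage contention on the host won't appear in CPU or RAM usage.
# DEVICE is a placeholder; adjust it for your disk (e.g. vda on KVM guests).
import time

DEVICE = "sda"

def read_diskstats(device):
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                reads, ms_reading = int(fields[3]), int(fields[6])
                writes, ms_writing = int(fields[7]), int(fields[10])
                return reads + writes, ms_reading + ms_writing
    raise SystemExit(f"device {device} not found in /proc/diskstats")

ios1, ms1 = read_diskstats(DEVICE)
time.sleep(10)
ios2, ms2 = read_diskstats(DEVICE)

delta_ios = ios2 - ios1
delta_ms = ms2 - ms1
avg_await = delta_ms / delta_ios if delta_ios else 0.0
print(f"{delta_ios} I/Os in 10 s, average latency {avg_await:.1f} ms")
```

If the average latency stays in the single-digit milliseconds even during a slowdown, storage is probably not the culprit.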

On our websites we use:
- WP Fastest Cache in WordPress itself
- Redis caching via Docker containers
- NGINX caching
- Most are proxied through Cloudflare with caching rules; some aren't. Issues happen on both types of websites.

Websites that are completely cached still load fairly quickly in general, but also experience 522 errors and slowdowns on an irregular basis. Websites that need to make database calls (webshops, for example) go from working reasonably well to taking up to 2 minutes to load a page, or not loading at all. And all of this is inconsistent.
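
Since the slowdowns are so irregular, one thing that could help pin them down is a small probe that requests a site directly from the origin every minute (bypassing Cloudflare) and logs the time to first byte, so spikes can be lined up against the Uptime Kuma 522 events. A minimal sketch, assuming Python 3; the origin IP and hostname below are placeholders:

```python
# Minimal probe sketch: request a site directly from the origin server,
# bypassing Cloudflare, and log time-to-first-byte so latency spikes can be
# correlated with 522 errors. ORIGIN_IP and HOSTNAME are placeholders.
import http.client
import ssl
import time
from datetime import datetime

ORIGIN_IP = "203.0.113.10"   # placeholder: the server's public IP
HOSTNAME = "example.com"     # placeholder: one of the affected sites

while True:
    start = time.monotonic()
    try:
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE   # we connect by IP, so the cert name won't match
        conn = http.client.HTTPSConnection(ORIGIN_IP, timeout=30, context=ctx)
        conn.request("GET", "/", headers={"Host": HOSTNAME})
        resp = conn.getresponse()
        resp.read(1)                      # first byte received
        status = resp.status
        conn.close()
    except Exception as exc:
        status = f"error: {exc}"
    elapsed = time.monotonic() - start
    print(f"{datetime.now().isoformat()} status={status} ttfb={elapsed:.2f}s", flush=True)
    time.sleep(60)
```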

I haven't found any apparent errors in the Plesk logs, the WordPress logs, or the general domain logs. So at the moment I'm banging my head against a wall and trying to keep our clients updated on the situation.

The one odd thing we keep seeing is 2-3 concurrent database processes with very high TIME_MS in the 'psa' database, run by the 'admin' user. I believe these are Plesk commands running, but I'm not sure whether this is normal behaviour.
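
For anyone who wants to look at the same thing on their own server, this is roughly how those long-running psa queries could be listed along with the actual SQL text. It's only a minimal sketch, assuming Python 3 with the PyMySQL package installed and a default Plesk layout where the admin MySQL password is stored in /etc/psa/.psa.shadow:

```python
# Sketch: list long-running queries against the psa database.
# Assumes PyMySQL (pip install pymysql) and a default Plesk layout where the
# admin MySQL password is stored in /etc/psa/.psa.shadow.
import pymysql

with open("/etc/psa/.psa.shadow") as f:
    admin_password = f.read().strip()

conn = pymysql.connect(host="localhost", user="admin", password=admin_password)
with conn.cursor() as cur:
    # PROCESSLIST.TIME is in seconds; flag anything running longer than 5 s.
    cur.execute(
        """
        SELECT ID, USER, DB, TIME, STATE, LEFT(INFO, 120) AS query_snippet
        FROM information_schema.PROCESSLIST
        WHERE DB = 'psa' AND TIME > 5
        ORDER BY TIME DESC
        """
    )
    for row in cur.fetchall():
        print(row)
conn.close()
```

Seeing the full query text should make it clearer whether these really are routine Plesk tasks or something stuck.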

If anyone has any insights into what might be causing this high latency/522 error issue, I would love to hear them! Thanks in advance, and apologies for the long-winded post.

If any logs or extra information about our server are needed to help with debugging, I will be happy to provide them.
 
Hey Bjorn, thanks for the post!

Have you been able to pinpoint what it was?

I seem to have an issue now with one of my Plesk servers with a sort of similar configuration (same version). I'm trying to find out whether it's bots, legitimate load (though there aren't that many domains yet), or something else...
 
Sadly, we never found the source of the problem. After about 2 weeks the server went back to normal by itself. And now, 4 days ago, we started running into the same issue again.

Together with both Hostinger's support and Plesk's support we made many smaller tweaks to Plesk's settings, like even stopping Plesk from grabbing screenshots of domains. But none of them helped, as CPU load isn't our problem.

We have the issue with all sorts of DNS: proxied through Cloudflare, non-proxied but nameservers at Cloudflare, nameservers at 2 different registrars. So that rules out a DNS issue too.

We moved our most crucial domains, with the biggest resource & traffic usage, to a different host. That didn't clear the issue, but at least those are fine now.
Compared to before, the server is almost empty and only hosts very lightweight websites, so it's strange to see that we're experiencing issues again.


On our end I do believe it's a problem with the actual server or data center, and that Hostinger is simply withholding information.
When the 522 timeout errors occur, there are no logs to be found in Plesk for those requests.
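
One check that might at least distinguish "the request never reached the server" from "the server accepted it but never answered" is scanning the per-domain nginx error log for upstream timeouts around the minutes when Uptime Kuma logged a 522. A minimal sketch, assuming Python 3; example.com is a placeholder and the path is a typical Plesk location that may differ per server:

```python
# Sketch: scan a Plesk domain's nginx proxy error log for the timeout and
# connection failures that often sit behind a Cloudflare 522.
# The path is a typical Plesk location; example.com is a placeholder.
ERROR_LOG = "/var/www/vhosts/system/example.com/logs/proxy_error_log"

KEYWORDS = ("upstream timed out", "connect() failed", "no live upstreams")

with open(ERROR_LOG, errors="replace") as f:
    for line in f:
        if any(keyword in line for keyword in KEYWORDS):
            print(line.rstrip())
```

If nothing at all shows up in the access or error logs at the time of a 522, the connection most likely never reached nginx, which would point back at the network or hypervisor rather than Plesk.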

Plesk support was great in trying to pinpoint the root cause of our problems, as well as running through many optimisations with us, but I understand there's only so much they can do.


I wish you all the luck in troubleshooting your issue. Our solution will sadly be to move away from our current VPS, and maybe from Plesk as well. It simply isn't worth this headache and loss of time.
 

@BjornTheBassist, I can completely understand your frustration! We’ve faced a remarkably similar situation with a Plesk server we manage for one of our customers. While this incident dates back a bit, the experience was strikingly the same—and unfortunately, equally exasperating.

Our ordeal began when we noticed severe performance issues with Plesk. The interface was slow to the point of being unusable at times. Naturally, we opened a support ticket. After waiting several hours for any response, we finally received follow-up… only to discover the resolution process would be no less painful than the problem itself.

A support technician briefly logged in (and by “briefly,” I mean roughly 20 minutes) and concluded that everything was perfectly fine. Their message: “There’s nothing wrong. The system is functioning as expected.” And that was that—case closed.

This, of course, was far from acceptable. Our customer was paying nearly €500 annually for an interface that, at best, performed like a sluggish open-source tool. If this had been a €200 solution, we could perhaps excuse occasional hiccups. But for this price tag? Unacceptable.

Frustrated, I escalated the issue, sending a firm follow-up to demand a second look. Eventually, after considerable back-and-forth, the ticket was reviewed by more “skilled” technicians. Another day passed. Another round of tests was conducted. The result? Nothing. Plesk maintained that everything was working flawlessly. And then came the pièce de résistance—they blamed our network!

I’ll admit, I had to laugh. Imagine, after hours of delays, their conclusion boiled down to, “It’s not us; it’s you!” Classic.

Obviously, this wasn’t something we could explain away to our server customer. We took matters into our own hands, offering to migrate the customer to a different panel—one known for reliability and better performance. The outcome? The customer was thrilled. The new interface worked flawlessly, with none of the lag, errors, or resource issues that plagued Plesk. Fast forward seven or eight months, and this same customer has since ordered two more servers with us, using the same alternative interface. Talk about a happy ending.

Sometimes, you just have to recognize when you’re flogging a dead horse. Plesk, for all its history, seems to be galloping downhill at an alarming pace. Add to this the recent announcement of extreme price hikes for January 2025, and it’s hard not to feel like the ship is sinking.

At this rate, Plesk’s license sales could drop by 30-50%, but by the time they notice, it may be too late. Loyal partners and customers are already jumping ship, and for good reason. The interface has become an unwieldy mess:
  • Actions that should be instantaneous take several seconds—or worse, nearly a full minute.
  • Random errors pop up like unwelcome guests.
  • The resource consumption is increasingly absurd.
  • And don’t even get me started on the relentless UI changes that nobody asked for.
Perhaps it’s time for Plesk to take a step back and return to its roots. Focus on core functionality. Streamline the interface. Drop the unnecessary “extras” that add no real value. After all, an admin panel is supposed to simplify server management, not frustrate it.

For now, we’ll continue to monitor the situation. But if 2025’s price increases are any indication, it might just be time to jump ship entirely.
 
@BjornTheBassist, I completely understand the frustration.

Have you considered the possibility of a Noisy Neighbor?

I had a similar experience due to a Noisy Neighbor. I couldn't find any issues with my VM—everything appeared normal in terms of CPU and memory usage—but sometimes the websites would take a long time to respond. When I informed the hosting provider, they were able to live migrate my VM to a different hardware node, which resolved the issue.
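
One quick way to get a hint of this from inside the VM is to watch the CPU "steal" time, the share of time the hypervisor didn't schedule the vCPU; it stays near zero on a healthy node even when your own load is low. A minimal sketch of that check, assuming Python 3 on a Linux VM:

```python
# Sketch: estimate CPU steal and iowait percentages from /proc/stat over a
# short interval. Consistently non-trivial steal can indicate a noisy
# neighbour or an oversubscribed hypervisor, even when user CPU looks low.
import time

def read_cpu_times():
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]   # aggregate "cpu" line
    return list(map(int, fields))

a = read_cpu_times()
time.sleep(5)
b = read_cpu_times()

deltas = [y - x for x, y in zip(a, b)]
total = sum(deltas)
# /proc/stat field order: user nice system idle iowait irq softirq steal ...
steal_pct = 100.0 * deltas[7] / total if total else 0.0
iowait_pct = 100.0 * deltas[4] / total if total else 0.0
print(f"steal: {steal_pct:.1f}%  iowait: {iowait_pct:.1f}%")
```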

There was another instance with a different provider where my VM would randomly hang (after weeks or months), requiring a reboot from the provider’s control panel. When I reported the issue, the hosting provider confirmed that they were experiencing a memory issue with their hypervisor, and their engineers were working on it.
 
Thanks for your writeup @HHawk, glad to know I'm not simply going crazy and imagining things.

Given the price hikes coming next year and the fact that Plesk hasn't been very reliable for us, we will probably be migrating to another panel.
 
We went down that route (checking for a noisy neighbour) as well. I've contacted Hostinger multiple times to request help troubleshooting the issue, but all they ever came up with was "Your server is well within its limits, there are no issues." Then again, perhaps that's my mistake for choosing Hostinger to host our VPS.
 

It seems the earlier question about whether this is bots or legitimate load was inadvertently skipped, certainly not intentionally. Over the past few months, we've observed a noticeable increase in bot activity—primarily brute-force attempts and DDoS/overload attacks—targeting all our Plesk servers. Interestingly, our DirectAdmin servers remain unaffected. While this issue has always existed to some extent, the frequency and intensity have been escalating, particularly since late February or early March 2024.

Approximately 90% of these incidents originate from Amazon (AWS) or Microsoft (Azure) servers. Initially, we addressed the problem by null-routing individual IP addresses. However, the attacks would quickly resume, often within two minutes, using a different IP from the same range. As a result, we've adopted a stricter approach: if we detect an overload or any unusual activity lasting longer than 10 minutes, we permanently null-route the entire range.
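
As an illustration of the kind of check that makes those ranges obvious before null-routing, something like this tallies requests per /24 from a domain's access log. It's only a sketch: example.com is a placeholder, and the log path is a typical Plesk location that may differ per server.

```python
# Sketch: tally requests per /24 in an nginx access log to spot ranges worth
# investigating before null-routing. The path is a typical Plesk per-domain
# location; example.com is a placeholder.
from collections import Counter

LOG = "/var/www/vhosts/system/example.com/logs/proxy_access_ssl_log"

counts = Counter()
with open(LOG, errors="replace") as f:
    for line in f:
        ip = line.split(" ", 1)[0]
        if ip.count(".") == 3:                     # keep it simple: IPv4 only
            prefix = ip.rsplit(".", 1)[0] + ".0/24"
            counts[prefix] += 1

for prefix, hits in counts.most_common(15):
    print(f"{hits:8d}  {prefix}")
```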

This strategy has significantly reduced the frequency of overloads, though they still occur sporadically. The improvement in performance has been noticeable: less strain on the servers translates to happier customers. Blocking Amazon and Microsoft ranges has also proven generally safe; we've encountered only about six complaints over several months. The trade-off has been worth it for the improved stability and customer satisfaction.

That said, much of this effort might now be moot, as we’ve decided to scale down our use of Plesk licenses significantly. Maintaining Plesk at its current level has become prohibitively expensive for both us and our clients. We anticipate retaining only 15–20% of our existing licenses. It’s bittersweet, as we’ve been using Plesk since its early days under SWsoft in 1999—a time when the hosting industry was exciting and rewarding. Unfortunately, the landscape has changed dramatically, and those golden days feel like a distant memory.
 