Issue: Network drops every 12 hours [IONOS]

Hi, after having solved this problem some time ago, I now have a similar one. Not every 12 hours, but after some days my new dedicated server from IONOS with Ubuntu 22.04 becomes completely unreachable: no SSH access, no Plesk. The only solution is to reboot via the IONOS cloud panel. The problem is I have already tried three servers of this kind (Ryzen 9 Pro 3900, 128 GB RAM, NVMe, Ubuntu 22.04 with Plesk) and all had the same problem. They work for 7-10 days, then even with no load at all PuTTY closes (lost connection), as if the server had been disconnected from the Ethernet. After one reboot everything works again for some days. I can't find anything in the logs. Is anyone else maybe seeing a similar problem?
 
Martin in IONOS second-line support had suggested changing the server from DHCP to a static IP address setup to prevent the DHCP connection problem from occurring.

We have not made this change yet because of the solution below!

However I believe I found the cause of the problem and a solution!
Our IONOS server with the same spec stopped crashing after we identified a runaway PHP process on the client's WordPress website itself. The wishlist plugin they were using was causing WooCommerce cart fragments to refresh thousands of times a minute, per visitor. The server was coping with this weird spike, but only just. After fixing the problem and then disabling WooCommerce cart fragments, the server has been running stably for over two weeks without any crashes. CPU usage has been stable and below 25%.

Edit: the wishlist plugin in use was clashing with the WooCommerce cart refresh. No idea why. We had to change a line of code in the wishlist plugin to fix it.

Check your website log files for runaway processes, and consider setting pm.max_requests (in the PHP settings) to something like 100 to try to stop processes from running away.
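
For reference, pm.max_requests is a standard PHP-FPM pool directive; in Plesk it can usually be added per domain in the PHP settings under the additional configuration directives box. A minimal sketch with illustrative values (the slow-log path is just an example):

; Recycle each PHP-FPM worker after 100 requests to contain leaks and runaways
pm.max_requests = 100
; Optionally log requests that run longer than 10 seconds to spot runaway scripts
slowlog = /var/log/php-fpm-slow.log
request_slowlog_timeout = 10s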

If the issue is indeed resolved, then the cause for us was actually not a server problem but something on the website. Perhaps the cart refresh process was using up so much CPU time that Plesk's processes couldn't run on time, causing the IP addresses not to resolve.

It was almost impossible to diagnose this problem until we had ruled everything else out.

If it crashes again I'll post here.
 
@thinkjarvis thank you for sharing what you've found so far. Out of curiosity, what steps did you take to find that the WooCommerce plugin was causing high CPU load?
 
No problem. I checked the log browser for the domain and noticed thousands of calls per second/minute to the AJAX cart refresh.
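
If you prefer to check the same thing from the shell, something along these lines counts the WooCommerce cart-fragment AJAX hits in the domain's access log (the path uses the usual Plesk layout and example.com is a placeholder, so adjust for your setup):

# Count WooCommerce cart-fragment AJAX requests in the current access log
grep -c 'wc-ajax=get_refreshed_fragments' /var/www/vhosts/example.com/logs/access_ssl_log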

We had to comment out a line in the wishlist plugin. As I only manage the hosting of this site, their internal developer made that change.

We then used a typical code snippet to disable the AJAX cart refresh in WooCommerce entirely.
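
The snippet usually passed around for this simply dequeues the cart-fragments script, for example from a small must-use plugin or the child theme's functions.php. This is a sketch of that common approach, not necessarily the exact code we used, so test it on a staging copy first:

<?php
// Disable WooCommerce cart fragments (the AJAX cart refresh) site-wide.
add_action( 'wp_enqueue_scripts', function () {
    // 'wc-cart-fragments' is the script handle WooCommerce registers for the AJAX refresh.
    wp_dequeue_script( 'wc-cart-fragments' );
}, 11 );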
 
You can also check the Monitoring panel in Plesk to see if you have any spikes.
With Grafana installed, look at CPU and RAM usage, and also at the Services tab > PHP-FPM memory usage, for spikes.
 
To clarify: the problem presented itself as a server problem (the network dropping) but appears to be a website problem that was causing server actions to fail or run too late. I'm not sure how this is possible, but if the server's network settings stay stable for another couple of weeks, then my network dropouts have indeed been fixed by removing a resource-intensive PHP/MySQL database process from the website.
 
Thanks for sharing! This is really strange. The problem here is that I don't have any spikes, and I even created a new AR 12-128 server, the same as our production server that crashes completely from time to time. Even the new AR 12-128 (or rather, all four that I have tried so far) crashes, even with no website running at all, only the Plesk panel. What is strange is that I had a Plesk migration running every night (as a kind of failover to another spare server). It worked fine, then suddenly the server crashed every night around 3 o'clock for three nights, while those cron jobs were running. Then I changed it to 1 o'clock, but it did not crash any more, neither at 3 nor at 1. Now it has crashed at 6:50 in the morning and at 18:30 in the evening the last two times. It is so strange. Another server, an AR 8-64, has been running for months without a problem, same software and OS.
 
How did you get to Martin in IONOS second-line support? I made a ticket (only 300 characters allowed, which is laughable), asked for a callback, and in an email reply to another ticket asked them to "please go to one of the frozen servers, plug in a monitor and tell me what the server really does or shows". Not solved for days now. I've called them about 10 times and explained the problems, and so on. I had much better support there in the past.
 
The second-line support don't really do phone calls. I was a bit of an exception because they needed my troubleshooting to try to solve the problem.

The UK cloud support lines are 24-hour but the second-line team works 9-5. So ring the usual dedicated/cloud support line and ask for the problem to be escalated to second-line support.

We actually had 8 weeks of downtime on the initial provision of the AR12-128 due to a problem with the IONOS default server provision. This was patched and resolved. Luckily it was a big project, so we didn't need the server for live just yet, but it put me behind because it had our revised version of the client's site on it!

I'm not sure what else to suggest, but make sure you make them aware of the problem. I was told my issues were affecting a handful of other IONOS clients. All we can do is report it and complain so they can investigate further.

The server has now been up for almost a month without becoming inaccessible. Once it gets to Feb I am calling the issue resolved with the client.

When this server fell over around Black Friday week, we provisioned a second server in Germany (the 64 GB version), and it started randomly rebooting itself after 24 hours. It was literally the worst week of disaster recovery I've ever experienced. We then fell back on a VPS, because those don't have the same network resolution issues, but the VPS couldn't cope with the spikes (this was before we identified the cause). So we have also been through 2x AR12-128s and the 64 GB version before the original AR12-128 became stable again.
 
This is strange; they told me they still have a problem with the whole AR12-128 line. They did a BIOS update and told me that should resolve it, but it did not. Swapping to another server now :-( I guess you don't know what they really did, do you?
 
Initially it would lose its network settings when rebooted and would refuse to come back online full stop.

There was an issue with a specific YUM update. I cannot remember the specific module that failed; I'd need to look back through my emails. They reimaged the server with the fixed Ubuntu image and this solved the problem.

When it failed during Black Friday, the network settings suddenly dropped out, so they were reinstated by second-line support. Since then the server had been forgetting its network settings periodically, but this was caused by the AJAX cart fragments refresh in WooCommerce clashing with a wishlist plugin, which produced multiple async calls for the basket. This prevented a server function from running on time, causing it to think the ports were dead.

We've had no issues since the start of January, when we identified the site problem and disabled the AJAX cart refresh in WooCommerce and the wishlist plugin.

I have auto updates in Plesk disabled so we now do planned reboots out of hours to prevent updates knocking the server over.

However, I am about to re-enable auto updates, as I have an IONOS EPYC 16/32 with 128 GB RAM running without issues since the start of February with auto updates turned on. It is currently serving 100 websites.
 
Thanks for the reply. Well, I swapped to an EPYC 16/128 now, too. More expensive, but it had only a minor problem: it was not using /etc/network/interfaces and there was a timeout. I fixed that with the help of Google and it has worked fine so far. What drives me crazy is the really bad support here in Germany nowadays. I wrote them mails and opened tickets, mostly not answered or only after a long time; you always have to call them. Well, we will see, maybe I will swap to another provider with all my customers in the future.
 
Yeah the EPYC range are exceptional servers and actually extremely good value for what they are.

Can you share the fix for the network interfaces timeout? This occurs on all of the dedicated servers I have had from IONOS. It would be good to get a solution to this!
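
For anyone else hitting this: on Ubuntu 22.04 that kind of boot delay is often the systemd-networkd-wait-online job timing out on an interface with no link. Assuming that is what's happening (which I can't confirm for this poster's case), a commonly suggested workaround is to mark the affected interface as optional in netplan, for example:

# /etc/netplan/01-netcfg.yaml (file name and interface name are examples)
network:
  version: 2
  ethernets:
    enp2s0:
      dhcp4: true
      optional: true   # do not block boot waiting for this link to come up

Then apply it with: sudo netplan apply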

To prevent the server crashing, make sure you set limits for the minimum and maximum parent and child PHP processes. This should help stop processes from overwhelming the server. See below for an example.

[Screenshot: example PHP-FPM process limit settings in Plesk]
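
Since the screenshot isn't reproduced here, the settings it showed correspond to the standard PHP-FPM pool limits. A sketch with purely illustrative values (tune them to your RAM and traffic):

; PHP-FPM pool limits (illustrative values only)
pm = dynamic
pm.max_children = 20       ; hard cap on worker processes
pm.start_servers = 4
pm.min_spare_servers = 2
pm.max_spare_servers = 6
pm.max_requests = 100      ; recycle workers to contain runaways and memory leaks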
 