Named service stops working randomly

PeopleInside · May 17, 2023

TITLE

Named service stops working randomly

PRODUCT, VERSION, OPERATING SYSTEM, ARCHITECTURE

Ubuntu 22.04.2 LTS
Plesk Obsidian Version 18.0.51
6 vCore RAM: 8 GB SSD: 160 GB

PROBLEM DESCRIPTION

This issue does not affect only me; it also happens on a more powerful server with 32 GB of RAM, as reported here.

I have been facing this issue for a year. It doesn't happen often, but when it does it causes downtime on several consecutive days, and I have to restart the VPS manually from the panel of my VPS host, IONOS.

The issue:
The named service randomly stops working.
I host my DNS on CloudFlare. I don't know exactly what the named service is, but from what I can read it is BIND and is related to DNS management, if I'm not wrong.

I reported my downtime issue here: How to understand why server went down?
As you can see, I don't have any RAM or CPU overload issues.

However, checking /var/log/syslog I found the following log entries:

Code:

May 16 18:05:45 peo named[976]: no longer listening on (MY VPS IP)#53
May 16 18:06:05 peo plesk_saslauthd[97254]: select timeout, exiting

The downtime started as soon as the named service was no longer listening, so I got the downtime email alert at 18:07 after 3 minutes of downtime.

It seems this issue is not just mine; it has been discussed here. The topic is marked as solved, but I see the issue has come back for some users, and I'm experiencing it too.

I never found a solution that works. Could the team look into this and perhaps find the cause or a solution?

Could you suggest a script that checks the named service and restarts it automatically if it is not listening? This would spare me long downtimes that I cannot resolve when I'm not at home or in the office in front of a PC where I can log in to my hosting panel and restart my VPS. I'm not a developer and I don't know how to create the file or what code I need.

This issue has been causing downtime for a year or more, and it is always an issue with the named service, also on a different VPS with Plesk.

I hope in the future your software can avoid this issue.
Thanks

STEPS TO REPRODUCE

It's a random issue, so I don't have any steps to provide, sorry about that.
I know steps to reproduce are important!

ACTUAL RESULT

The named service randomly stops listening and causes all services (email and web) to go down.

EXPECTED RESULT

The named service should not stop listening.
I don't have any RAM or CPU overload issues, so why should this service take my server down?
This should not happen. I expect a way to restart this service automatically if it goes down again, which it surely will.

ANY ADDITIONAL INFORMATION

Could you please help me with a script that resolves the issue shown in this log entry:
May 16 18:05:45 peo named[976]: no longer listening on (MY VPS IP)#53

It seems the fix would be to check whether named has stopped listening and, if so, restart the service (#service named-chroot restart). I would need to do this with a cron job if you don't know why this service causes issues, and not only for me.

YOUR EXPECTATIONS FROM PLESK SERVICE TEAM

Help with sorting out

PeopleInside · May 17, 2023

Today there was a new downtime of two minutes at 07:13 AM.

helpdesk peopleinside.it (Recent History) - Powered by HetrixTools

helpdesk peopleinside.it - Uptime Report

stato.peopleinside.it

The log says:
PrivateBin

Any idea what this log means?
PassengerAgent invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

Peter Debik · May 17, 2023

Thank you for your report. However, as there are no steps to reproduce the issue and previous discussions suggested that it is an issue with Ionos VPS, could you please open a Plesk support ticket for this issue instead? It needs to be checked on your server, as that will be the only way to find the cause.

PeopleInside · May 17, 2023

I currently have no money to pay for a support subscription, so this issue will probably not be resolved.
Any help with creating a script that restarts the service if the service is broken?

PeopleInside · May 17, 2023

I also don't think the issue is related to IONOS, since it comes from a service that is perhaps managed by Plesk (maybe the BIND component), while IONOS just provides the VPS. There is no evidence that the issue lies with the service provided by IONOS. It seems to come from a server service that may be managed by Plesk, and it has also affected other users, not just me, as shown in the discussions.

Peter Debik · May 17, 2023

On my CentOS I recently needed a script that detects if Nginx failed and restarts it. Assuming this could be adapted, e.g.

Code:

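#!/bin/bash

# Query the unit state; is-failed prints "failed" only for a failed unit.
state=$(/usr/bin/systemctl is-failed named)

# Start named again only if systemd reports it as failed.
if [ "$state" = "failed" ]; then
   /usr/bin/systemctl start named
fi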

and then run as a cronjob every minute. I am not sure if the same algorithm also works for Ubuntu/Debian, but in case it does not, probably something similar can be done there.

PeopleInside · May 17, 2023

Thanks ... I will need to try this, seems my only solution right now.

AYamshanov · May 17, 2023

Hi everyone,

I have checked the logs and links. I do not understand why you think there is an issue with "named"; let me provide some details on why I think so.

"Randomly named service stop to listening and cause all service email and web to be down."
May 16 18:05:45 peo named[976]: no longer listening on (MY VPS IP)#53

I have seen a similar question before, but on my server I see the same message during a normal "named" process restart. So the message should not be treated as a marker of a critical error.

I host my DNS on CloudFlare

So it means that even if "named" is not working, visitors from the Internet will continue to receive valid DNS records and will still be able to open websites.
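One way to confirm this is to look at the NS records that are published for the domain. The sketch below is only an illustration: `uses_cloudflare_ns` is a made-up helper name, and it assumes Cloudflare nameservers follow the usual `<name>.ns.cloudflare.com` pattern.

```shell
#!/bin/sh
# Return success if any nameserver in the given list belongs to Cloudflare.
# Assumption: Cloudflare nameservers end in ".ns.cloudflare.com".
uses_cloudflare_ns() {
    printf '%s\n' "$1" | grep -Eqi '\.ns\.cloudflare\.com\.?$'
}

# Example (needs network access and the dig tool; example.com is a placeholder):
#   if uses_cloudflare_ns "$(dig +short NS example.com)"; then
#       echo "public DNS is answered by Cloudflare, not by the local named"
#   fi
```

If the check succeeds, public lookups for the domain never reach the BIND instance on the VPS.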

---

Big thank you for providing links to monitoring and logs, that is really helpful!

May 17 07:14:55 peopleinside kernel: [45696.837853] PassengerAgent invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[...]
May 17 07:14:55 peopleinside kernel: [45696.838614] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[...]
May 17 07:14:55 peopleinside kernel: [45696.839595] [ 88409] 998 88409 3160747 1728240 15851520 227908 0 sw-engine
[...]
May 17 07:14:55 peopleinside kernel: [45696.839757] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=nginx.service,mems_allowed=0,global_oom,task_memcg=/system.slice/cron.service,task=sw-engine,pid=88409,uid=998
May 17 07:14:55 peopleinside kernel: [45696.839813] Out of memory: Killed process 88409 (sw-engine) total-vm:12642988kB, anon-rss:6909852kB, file-rss:3108kB, shmem-rss:0kB, UID:998 pgtables:15480kB oom_score_adj:0

To me, it looks like sw-engine consumed a huge amount of memory, and when the memory ran out, the oom-killer came and killed sw-engine. Probably everything happened so quickly that it took only 2 minutes and caused a short outage, but it was not detected by the monitoring system (used for creating the CPU/memory graphs). Since it was not detected by monitoring, you do not see 100% usage on the graphs.
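As a rough cross-check of the figures in the log above: the total_vm and rss columns in the oom-killer process table are counted in memory pages, not kB. The sketch below converts them, assuming the usual 4 KiB page size (`getconf PAGESIZE` shows the real value on a given system). `pages_to_kb` is just an illustrative helper name.

```shell
#!/bin/sh
# Convert a page count from the oom-killer process table into kB.
# Assumption: 4 KiB pages unless a different size (in KiB) is passed as $2.
pages_to_kb() {
    pages=$1
    page_kb=${2:-4}
    echo $(( pages * page_kb ))
}

pages_to_kb 1728240   # rss column for sw-engine      -> 6912960
pages_to_kb 3160747   # total_vm column for sw-engine -> 12642988
```

The rss result equals anon-rss plus file-rss from the "Killed process" line (6909852 kB + 3108 kB), and the total_vm result matches total-vm in the same line. So a single sw-engine process held about 6.9 GB on a server with 8 GB of RAM.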

If the issue persists, I would recommend contacting the Plesk Support team to investigate the issue.

PeopleInside · May 18, 2023

@AYamshanov thanks for the answer.
The last downtime of two minutes could of course have been caused by sw-engine consuming a huge amount of memory, with the oom-killer killing sw-engine when the memory ran out. I believe the same. But the downtime I had two days earlier did not resolve in two minutes; it can go on without end if I don't restart the server manually.

If the issue persists, I would recommend to contact the Plesk Support team to investigate the issue.

I would need to pay for a support subscription for this, and that is a problem for me. I'm not running a business, the VPS already costs me a lot, and I'm also not sure that paying and opening a support ticket would definitely identify what is happening and produce a fix.

PeopleInside · May 18, 2023

I have checked logs and links, I do not understand why do you think there is an issue with "named", let me provide some details why I think so.

This is because, for the long downtime, the log shows that named was no longer listening at the time the downtime started, and other users are reporting the same:

Resolved - named service stop randomly

Hi, The named service stop to work randomly in my server. How to fix this? I was able to fix with: #service named-chroot restart My customers can not login in their wordpress dashboards when bind service is not working. I am under plesk 18.0.34 update nº2

talk.plesk.com

The named service stop to work randomly in my server. How to fix this?
I was able to fix with:
#service named-chroot restart

Thanks anyway for the reply

I appreciate that

PeopleInside · May 30, 2023

Hi,
today I had a new downtime that required restarting the server from the IONOS panel.
From the Plesk log I see:

Could these errors be the cause of the downtime that started at 12:14?

PeopleInside · May 30, 2023

Code:

May 30 12:05:47 peopleinside named[1006]: no longer listening on 82.XXX.77.XX#53

I think this is the log entry behind the downtime.
It seems I am unable to find out how to resolve this issue, which happens often.

PeopleInside · May 30, 2023

Code:

May 30 12:05:47 peopleinside named[1006]: no longer listening on 82.XXX.77.57#53

The downtime issue happened again.
It seems that a script that checks whether the named service is running never works.

Code:

#!/bin/bash

string1=$(/usr/bin/systemctl is-failed named)
string2="failed"

if [ "$string1" = "$string2" ]; then
   /usr/bin/systemctl start named
   mail -s "Named service restarted - CronJob" [email protected] <<< "Named service was restarted by sh custom check"
fi

Even when I run this script as a cron job, it never resolves the issue.
The named service stopped listening again, and it seems the cron job never found a named service to restart.

Could the team suggest what I can do to patch this named service, which often logs that it is no longer listening and takes my server down?

Maarten · May 30, 2023

Could you please contact Plesk Support? It's free for the first month:

https://support.plesk.com/hc/en-us/articles/12388090147095-How-to-get-support-directly-from-Plesk-

PeopleInside · May 30, 2023

Maarten. said:
Could you please contact Plesk Support? It's free for the first month:

https://support.plesk.com/hc/en-us/articles/12388090147095-How-to-get-support-directly-from-Plesk-

My free month has already ended, so I can't.

I'm trying to understand why this named service causes issues and how I could solve it. It looks to be something inside Plesk.

Peter Debik · May 30, 2023

PeopleInside said:
Looks to be something inside Plesk.

Not so likely, because in that case tens of thousands of other server operators would experience the same, but they do not. The case needs to be investigated on this specific installation.

The "is-failed" check in the script quoted above only applies if the service has the "failed" status; it does not apply if the status is "inactive", "activating" or something else. Instead of my example from above (it was only an example!) you should try "/sbin/service named restart" on Ubuntu or "/usr/bin/systemctl start bind9", whichever fits your situation best. You can run that on the command line to find out what it does and to make sure that the commands you use inside a script work. My example from above was a general illustration of how to tackle the situation.
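A variant along those lines could look like the sketch below. It asks `systemctl is-active` for the state and restarts the unit for every state except "active", "activating" and "reloading". The unit name (`named`), the `--run` switch and the overridable `SYSTEMCTL` variable are assumptions made for illustration, so check first on the command line which unit name exists on your server.

```shell
#!/bin/sh
# Watchdog sketch: restart a systemd unit whenever it is not running.
# SERVICE is an assumption; on Ubuntu the BIND unit may be "named" or "bind9".
SERVICE="${SERVICE:-named}"
# SYSTEMCTL can be overridden to try the logic without touching a real service.
SYSTEMCTL="${SYSTEMCTL:-/usr/bin/systemctl}"

check_and_restart() {
    # is-active prints one word: active, inactive, failed, activating, ...
    state=$("$SYSTEMCTL" is-active "$SERVICE" 2>/dev/null) || true
    case "$state" in
        active|activating|reloading)
            ;;  # running or on its way up: leave it alone
        *)
            "$SYSTEMCTL" restart "$SERVICE"
            echo "restarted $SERVICE (was: ${state:-unknown})"
            ;;
    esac
}

# Act only when called with --run, so the file can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    check_and_restart
fi
```

Saved under a path of your choice (for example /usr/local/sbin/named-watchdog.sh, a hypothetical name) and made executable, it could be run every minute from root's crontab with a line such as `* * * * * /usr/local/sbin/named-watchdog.sh --run`. Note that this only catches a unit that systemd no longer considers running; a named process that is still running but not answering on port 53 would still be reported as "active".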

Maarten · May 30, 2023

I just found this page:

(Plesk for Linux) Automatic Restart of Crashed Services with Systemd

docs.plesk.com

It gives you the option to automatically restart the named service after a crash.

The list of services and their status can be listed with this command:

Code:

# /usr/local/psa/admin/sbin/register_service --full-list

Peter Debik · May 30, 2023

Oh cool, I should have known this. Thank you @Maarten

PeopleInside · May 30, 2023

Maarten. said:
I just found this page:

(Plesk for Linux) Automatic Restart of Crashed Services with Systemd

docs.plesk.com

It gives you the option to automatically restart the named service after a crash.

The list of services and their status can be listed with this command:

Code:

# /usr/local/psa/admin/sbin/register_service --full-list

Thank you!
From the guide I understand that I need to add some settings to /usr/local/psa/admin/conf/panel.ini.

The lines I need to add could look like this:

Code:

[named]
Service.RestartSec = 7
Service.Restart = always

Maarten · May 30, 2023

I gave it a try, but it doesn't work. Could this be a legacy function from a previous Plesk version?

@Peter Debik, can you ask the developers if this should work?
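Independently of the panel.ini setting, systemd itself supports the same restart policy through a drop-in file for the unit. The sketch below writes such a file. It is not taken from the Plesk guide: the helper name `write_restart_dropin`, the file name `restart.conf` and the unit name `named.service` are assumptions, and the values simply mirror the ones quoted above.

```shell
#!/bin/sh
# Write a systemd drop-in that restarts a unit after its process exits.
# Unit name and base directory are assumptions; adjust for your system.
write_restart_dropin() {
    unit=$1                          # e.g. named.service
    base=${2:-/etc/systemd/system}   # standard location for admin overrides
    dir="$base/$unit.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/restart.conf" <<'EOF'
[Service]
Restart=always
RestartSec=7
EOF
    echo "$dir/restart.conf"
}

# Example (as root), followed by a reload so systemd picks up the file:
#   write_restart_dropin named.service && systemctl daemon-reload
```

`systemctl cat named.service` shows afterwards whether the drop-in is in effect. Like any Restart= policy, this only helps when the named process actually exits; it does not restart a process that is still running but no longer listening.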
