Nightly DNS Service Failures?

Eric Pretorious · Dec 16, 2013

Since we began monitoring the DNS service on our Plesk Panel server...

Code:

OS:	CentOS 6.5 (Final)
Panel version:	11.0.9 Update #60

we've noticed nightly DNS failures. Looking in /var/log/messages, we can see that BIND is reloading repeatedly at the same time that the outage occurs:

Code:

[root@www log]# grep 'received SIGHUP signal to reload zones' messages
Dec 15 07:21:27 www named[2275]: received SIGHUP signal to reload zones
Dec 15 07:22:39 www named[2275]: received SIGHUP signal to reload zones
Dec 15 08:06:09 www named[2275]: received SIGHUP signal to reload zones
Dec 15 08:07:37 www named[2275]: received SIGHUP signal to reload zones
Dec 15 08:07:45 www named[2275]: received SIGHUP signal to reload zones
Dec 15 08:07:51 www named[2275]: received SIGHUP signal to reload zones
Dec 15 08:07:56 www named[2275]: received SIGHUP signal to reload zones
Dec 15 08:08:02 www named[2275]: received SIGHUP signal to reload zones
Dec 15 08:08:08 www named[2275]: received SIGHUP signal to reload zones
Dec 15 08:08:13 www named[2275]: received SIGHUP signal to reload zones
Dec 15 08:08:18 www named[2275]: received SIGHUP signal to reload zones
Dec 15 08:08:23 www named[2275]: received SIGHUP signal to reload zones
Dec 15 08:36:16 www named[2275]: received SIGHUP signal to reload zones
Dec 15 08:36:17 www named[2275]: received SIGHUP signal to reload zones
Dec 15 08:37:31 www named[2275]: received SIGHUP signal to reload zones
Dec 15 08:37:32 www named[2275]: received SIGHUP signal to reload zones

At first, we thought that this was caused by the system-wide backup, which was scheduled for 07:50. But after we disabled the "Suspend domains until backup task is completed" option and moved the backup to 09:00, the reload messages continued to occur at 07:21, 08:06, & 08:36 every morning!
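
To make the bursts in the log excerpt easier to see, the reload messages can be counted per minute. This is only a rough sketch: the function name reloads_per_minute is made up for illustration, and it assumes the syslog timestamp layout shown in the excerpt (month, day, HH:MM:SS).

```shell
# Count "reload zones" messages per minute, in order of first appearance.
# reloads_per_minute is a hypothetical helper name; $1 is a syslog-style
# file such as /var/log/messages.
reloads_per_minute() {
    awk '/received SIGHUP signal to reload zones/ {
             key = $1 " " $2 " " substr($3, 1, 5)   # e.g. "Dec 15 08:07"
             if (!(key in count)) order[++n] = key
             count[key]++
         }
         END { for (i = 1; i <= n; i++) print count[order[i]], order[i] }' "$1"
}

# Usage: reloads_per_minute /var/log/messages
```

Run against the Dec 15 excerpt, this would report four reloads in the 08:07 minute and five in the 08:08 minute, which makes the burst stand out from the single reloads at 07:21 and 08:06.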

What's causing BIND to reload repeatedly every day at 07:21, 08:06, & 08:36?

IgorG · Dec 17, 2013

Eric Pretorious said:
What's causing BIND to reload repeatedly every day at 07:21, 08:06, & 08:36?

Who knows? Check your scheduled tasks. Check resources if it is a VPS. This is an administrative issue, not a Plesk-related one.

Eric Pretorious · Dec 17, 2013

IgorG said:
Who knows? Check your scheduled tasks. Check resources if it is a VPS. This is an administrative issue, not a Plesk-related one.

I'm not so sure about that, Igor: looking in /var/log/cron, there seems to be a very strong correlation between the backupmng cron job and BIND reloading at 07:21, 08:06, & 08:36 every morning:

Code:

[root@www log]# grep '07:21' cron
Dec 15 07:21:01 www CROND[11577]: (root) CMD ([ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1)
Dec 15 07:21:42 www crontab[12030]: (root) LIST (foo)
Dec 16 07:21:01 www CROND[9565]: (root) CMD ([ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1)
Dec 16 07:21:22 www crontab[9958]: (root) LIST (foo)
Dec 17 07:21:01 www CROND[32007]: (root) CMD ([ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1)
Dec 17 07:21:37 www crontab[32403]: (root) LIST (foo)

[root@www log]# grep '08:06' cron
Dec 15 08:06:01 www CROND[16756]:   (root) CMD ([ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1)
Dec 15 08:06:44 www crontab[17232]: (root) LIST (bar)
Dec 16 08:06:01 www CROND[11949]:   (root) CMD ([ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1)
Dec 16 08:06:44 www crontab[12366]: (root) LIST (bar)
Dec 17 08:06:01 www CROND[1494]:    (root) CMD ([ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1)

[root@www log]# grep '08:36' cron
Dec 15 08:36:01 www CROND[21542]:   (root) CMD ([ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1)
Dec 15 08:36:23 www crontab[21972]: (root) LIST (bletch)
Dec 16 08:36:01 www CROND[13920]:   (root) CMD ([ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1)
Dec 16 08:36:15 www crontab[14310]: (root) LIST (bletch)
Dec 17 08:36:01 www CROND[3593]:    (root) CMD ([ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1)
Dec 17 08:36:15 www crontab[3983]:  (root) LIST (bletch)

All three of the users' crontabs (i.e., foo, bar, & bletch) are empty (i.e., they contain only the declaration of the environment variable "SHELL=/usr/local/psa/bin/chrootsh").
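
The comparison of the two logs can also be done mechanically rather than one grep per timestamp. The sketch below lists every minute in which cron started backupmng and named logged a reload. The function names minute_keys and correlate_reloads are made up for illustration, and matching by minute is a crude assumption that only works because the reloads in the excerpts begin within seconds of the cron job.

```shell
# Reduce matching syslog lines to unique "Mon DD HH:MM" keys.
# $1 = fixed string to search for, $2 = log file
minute_keys() {
    grep -F -- "$1" "$2" | awk '{ print $1, $2, substr($3, 1, 5) }' | sort -u
}

# Print every minute in which backupmng was started by cron AND named
# logged a reload. $1 = cron log, $2 = messages log.
# (correlate_reloads and minute_keys are hypothetical helper names.)
correlate_reloads() {
    reloads=$(mktemp) || return 1
    minute_keys 'received SIGHUP signal to reload zones' "$2" > "$reloads"
    # -F fixed strings, -x whole-line match, -f read patterns from a file
    minute_keys 'sbin/backupmng' "$1" | grep -F -x -f "$reloads"
    rm -f "$reloads"
}

# Usage: correlate_reloads /var/log/cron /var/log/messages
```

A match only shows that both events fall in the same minute; it does not prove that backupmng is what sent the signal.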

Thoughts? Ideas?

srvadm · Apr 8, 2014

I guess you already solved the problem. If so, can you share the solution?

I have a similar problem: most of the time (March 15, 22, 29 and April 5), the BIND reloads coincide with a weekly backup (all configuration and content) that has the "Suspend domain until backup task is completed" option enabled. I have changed the backup settings and am waiting to check the result this weekend.

messages-20140316:Mar 15 08:10:07 s62591307 named[4731]: received SIGHUP signal to reload zones
messages-20140316:Mar 15 08:14:16 s62591307 named[4731]: received SIGHUP signal to reload zones
messages-20140323:Mar 22 08:10:08 s62591307 named[4731]: received SIGHUP signal to reload zones
messages-20140323:Mar 22 08:14:21 s62591307 named[4731]: received SIGHUP signal to reload zones
messages-20140330:Mar 29 08:10:06 s62591307 named[4731]: received SIGHUP signal to reload zones
messages-20140330:Mar 29 08:14:17 s62591307 named[4731]: received SIGHUP signal to reload zones
messages-20140406:Apr 1 21:44:57 s62591307 named[4731]: received SIGHUP signal to reload zones
messages-20140406:Apr 3 22:55:08 s62591307 named[4731]: received SIGHUP signal to reload zones
messages-20140406:Apr 3 22:56:44 s62591307 named[4731]: received SIGHUP signal to reload zones
messages-20140406:Apr 5 08:10:07 s62591307 named[4731]: received SIGHUP signal to reload zones
messages-20140406:Apr 5 08:14:08 s62591307 named[4731]: received SIGHUP signal to reload zones

Eric Pretorious · Apr 9, 2014

srvadm said:
I guess you already solved the problem. If so, can you share the solution?

No. We weren't able to determine the root cause and have just chalked it up to the Plesk backups.

Sorry!

Dmitriy S · Apr 28, 2014

Check whether you have the following errors in the backup logs:

Code:

grep 'may be inaccessible after backup' /usr/local/psa/PMM/sessions/2014-04-27-234302.475/migration.result
            <description>The domain 'domain1.com' may be inaccessible after backup. Please, resume it manually!</description>
            <description>The domain 'domain2.com' may be inaccessible after backup. Please, resume it manually!</description>
            <description>The domain 'domain3.com' may be inaccessible after backup. Please, resume it manually!</description>
            <description>The domain 'domain4.com' may be inaccessible after backup. Please, resume it manually!</description>

This is the result of confirmed bug PPPM-759: a domain with an alias cannot be successfully unsuspended if it has greylisting enabled.
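
Since the grep above checks a single backup session, it may help to scan every session at once and collect the affected domain names for the query that follows. This is a sketch only: the function name suspended_after_backup is made up for illustration, and it assumes that each session directory under /usr/local/psa/PMM/sessions contains a migration.result file laid out like the one shown above.

```shell
# List the domains that any backup session reported as possibly left
# suspended. $1 = sessions directory; defaults to the path used in the
# grep above. (suspended_after_backup is a hypothetical helper name.)
suspended_after_backup() {
    dir=${1:-/usr/local/psa/PMM/sessions}
    cat "$dir"/*/migration.result 2>/dev/null |
        sed -n "s/.*The domain '\([^']*\)' may be inaccessible after backup.*/\1/p" |
        sort -u
}

# Usage: suspended_after_backup
```

The output is one domain per line, with duplicates across sessions removed.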

Fix the issue with:

Code:

mysql> select * from dom_param where param='gl_filter' AND dom_id in (select id from domains where name in ('domain1.com','domain2.com','domain3.com','domain4.com','domain5.net'));
+--------+-----------+------+
| dom_id | param     | val  |
+--------+-----------+------+
|     17 | gl_filter | on   |
|     18 | gl_filter | off  |
|     19 | gl_filter | off  |
|     26 | gl_filter | off  |
|     46 | gl_filter | on   |
+--------+-----------+------+

mysql> update dom_param set val='on' where param='gl_filter' AND dom_id in (select id from domains where name in ('domain1.com','domain2.com','domain3.com','domain4.com','domain5.net'));
mysql> select * from dom_param where param='gl_filter' AND dom_id in (select id from domains where name in ('domain1.com','domain2.com','domain3.com','domain4.com','domain5.net'));
+--------+-----------+------+
| dom_id | param     | val  |
+--------+-----------+------+
|     17 | gl_filter | on   |
|     18 | gl_filter | on   |
|     19 | gl_filter | on   |
|     26 | gl_filter | on   |
|     46 | gl_filter | on   |
+--------+-----------+------+
