Issue Server Down Every Morning After Domain Transfer

LuigiMdg · Sep 27, 2023

As you can guess from the title, after transferring the domain and setting the DNS records to match those in Plesk, every morning at 5:00 (3:00 UTC) the server goes down and comes back online only after 1 hour.
The only DNS record that was missing was ipv4.domain.tld, which I have now tried to add, but I highly doubt that was the problem and I expect to get the same error tomorrow morning.
I also updated Plesk but I don't think this will solve the problem either, which is why I find myself here asking for help.
I looked at /var/log/messages and analyzed the entries around 3:00, 4:00 and 5:00, but I don't notice anything unusual apart from the usual messages that appear every hour.
How can I investigate and resolve the problem?

pleskpanel · Sep 27, 2023

Do you have any backups set to run during that time?

LuigiMdg · Sep 27, 2023

pleskpanel said:
Do you have any backups set to run during that time?

No

Peter Debik · Sep 28, 2023

Is it really DNS that is down or is it one or both of the web servers?

LuigiMdg · Sep 28, 2023

Peter Debik said:
Is it really DNS that is down or is it one or both of the web servers?

I don't understand the question. I didn't say it was a DNS problem, only that changing the DNS was the only operation I carried out that day.
There is one server and one domain.

Peter Debik · Sep 28, 2023

Could you please describe which service is down? Is it the web server(s)? The name resolution? The whole system? Is it really down, meaning for example that when you check "systemctl status httpd" or "systemctl status nginx" are they inactive? Or is it that the services are running but you cannot access the server or websites from your location?
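Since the outage happens while nobody is watching, one option is to let cron record the service state. A minimal sketch, assuming systemd and the unit names httpd and nginx mentioned above; the function name service_states and the log path used below are placeholders, not anything Plesk provides:

```shell
#!/bin/sh
# Print one timestamped line with the systemd state of each given unit.
# Hypothetical helper; unit names are passed as arguments.
service_states() {
    line=$(date -u '+%Y-%m-%dT%H:%M:%SZ')
    for unit in "$@"; do
        # `systemctl is-active` prints active, inactive, failed, ...
        # and exits non-zero for anything but active, hence `|| true`.
        state=$(systemctl is-active "$unit" 2>/dev/null || true)
        line="$line $unit=${state:-unknown}"
    done
    printf '%s\n' "$line"
}
```

Saved as a script with a final line such as `service_states httpd nginx >> /root/service-state.log` and run from cron every minute, this leaves a record showing whether the services were inactive during the alert window or were running while the server was unreachable from outside.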

LuigiMdg · Sep 28, 2023

I don't know. Between 3 and 5 am I'm usually asleep like all ordinary mortals; I only get the alert from Monitoring360.
Analyzing the httpd logs does not reveal any shutdown, interruption or restart.

Peter Debik · Sep 28, 2023

Without further analysis on the server it won't be possible to determine the cause of the interruption.

LuigiMdg · Sep 28, 2023

If you suggest what can be done, I'll do it, but I certainly can't stay awake from 3 to 5. At most I can set up a cron job that runs a command for me. I'm willing to be creative, but just writing "further analysis" doesn't help much.

Kaspar · Sep 28, 2023

My experience with monitoring360 is that it creates a lot of false alerts for some reason. I would not rely on it too much. Try using another monitoring service for a while too. If both monitoring services detect downtime, then you can be certain there is something wrong with your server (or DNS provider).

Maarten. · Sep 28, 2023

AlmaLinux 9 has no support for the Watchdog extension, so if you have services that crash, they won't get restarted.

There is a replacement for this:

(Plesk for Linux) Automatic Restart of Crashed Services with Systemd

On Linux distributions that use the systemd init system (Debian 8 and later, CentOS/RedHat 7 and later, and Ubuntu 18 and later), Plesk instructs systemd to restart certain services if they crash.

docs.plesk.com

I've explained in this post how it works:

Resolved - Alternative for Watchdog on Ubuntu 22.04

Hello, Ubuntu 22.04 does not support the Watchdog package from Plesk. Is there an alternative available that monitors services and can restart them? Thank you, Henk

talk.plesk.com

But before you start implementing this, list the current services on your server:
# /usr/local/psa/admin/sbin/register_service --full-list

Next, run the following commands to enable the automatic restart function, adding or changing entries to match the services on your server:

Code:

# /usr/local/psa/admin/sbin/register_service --enable plesk-task-manager
# /usr/local/psa/admin/sbin/register_service --enable httpd
# /usr/local/psa/admin/sbin/register_service --enable named-chroot
# /usr/local/psa/admin/sbin/register_service --enable fail2ban
# /usr/local/psa/admin/sbin/register_service --enable postfix
# /usr/local/psa/admin/sbin/register_service --enable dovecot
# /usr/local/psa/admin/sbin/register_service --enable sw-engine
# /usr/local/psa/admin/sbin/register_service --enable sw-cp-server
# /usr/local/psa/admin/sbin/register_service --enable psa
# /usr/local/psa/admin/sbin/register_service --enable nginx
# /usr/local/psa/admin/sbin/register_service --enable sw-collectd
# /usr/local/psa/admin/sbin/register_service --enable spamassassin

This will restart services that, for some reason, crash. I'm not saying this will solve your problems, but at least you have an alternative for the Watchdog extension.
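The list above can also be driven by a small loop, so the set of services can be reviewed first. This is only a sketch: the register_service path and the --enable option are taken from the commands above, while the function name enable_auto_restart and the DRY_RUN variable are made up for illustration.

```shell
#!/bin/sh
# Path as used in the commands above.
REGISTER=/usr/local/psa/admin/sbin/register_service

# Enable automatic restart for every service name given as an argument.
# With DRY_RUN set (hypothetical switch), only print what would be run.
enable_auto_restart() {
    for svc in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "$REGISTER --enable $svc"
        else
            "$REGISTER" --enable "$svc"
        fi
    done
}
```

Calling `DRY_RUN=1 enable_auto_restart httpd nginx` prints the two commands without running them; dropping DRY_RUN runs them. Only pass names that appear in the --full-list output on your server.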

LuigiMdg · Sep 28, 2023

Kaspar said:
My experience with monitoring360 is that it creates a lot of false alerts for some reason. I would not rely on it too much. Try using another monitoring service for a while too. If both monitoring services detect downtime, then you can be certain there is something wrong with your server (or DNS provider).

I also use Watchdog, which sends me the same alert at the same times.

Maarten. said:
AlmaLinux 9 has no support for the Watchdog extension, so if you have services that crash, they won't get restarted.

There is a replacement for this:

(Plesk for Linux) Automatic Restart of Crashed Services with Systemd

On Linux distributions that use the systemd init system (Debian 8 and later, CentOS/RedHat 7 and later, and Ubuntu 18 and later), Plesk instructs systemd to restart certain services if they crash.

docs.plesk.com

I've explained in this post how it works:

Resolved - Alternative for Watchdog on Ubuntu 22.04

Hello, Ubuntu 22.04 does not support the Watchdog package from Plesk. Is there an alternative available that monitors services and can restart them? Thank you, Henk

talk.plesk.com

But before you start implementing this, list the current services on your server:
# /usr/local/psa/admin/sbin/register_service --full-list

Next, add/change the following commands to enable the automatic restart function:

Code:

# /usr/local/psa/admin/sbin/register_service --enable plesk-task-manager
# /usr/local/psa/admin/sbin/register_service --enable httpd
# /usr/local/psa/admin/sbin/register_service --enable named-chroot
# /usr/local/psa/admin/sbin/register_service --enable fail2ban
# /usr/local/psa/admin/sbin/register_service --enable postfix
# /usr/local/psa/admin/sbin/register_service --enable dovecot
# /usr/local/psa/admin/sbin/register_service --enable sw-engine
# /usr/local/psa/admin/sbin/register_service --enable sw-cp-server
# /usr/local/psa/admin/sbin/register_service --enable psa
# /usr/local/psa/admin/sbin/register_service --enable nginx
# /usr/local/psa/admin/sbin/register_service --enable sw-collectd
# /usr/local/psa/admin/sbin/register_service --enable spamassassin

This will restart services that, for some reason, crash. I'm not saying this will solve your problems, but at least you have an alternative for the Watchdog extension.

OK, this may be useful later, but for now I'm interested in finding the problem and solving it.
I don't want to install a system that automatically restarts the server as a solution; I see that as prevention against potential future problems.

AYamshanov · Sep 29, 2023

Perhaps we can try to understand what happened based on the original data, messages, or screenshots?

OK, the first thing we know is that the whole server goes down every morning at 5 AM, right? We know that because of the alerts from 360 Monitoring. Could you please show the content of that alert message, just to be sure?

Can you obtain and show graphs for CPU, memory, disk, and network activity? That way we can confirm whether the server is really running at that time and whether anything extraordinary happens. For example, if we find no network activity from 5 to 6 AM, that may give us a clue.
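If no graphs are available, a rough substitute is to have cron append a resource snapshot to a file every few minutes. The following is a minimal sketch that covers only load average and available memory, assuming a Linux /proc filesystem; the function name resource_snapshot and the output field names are invented for this example.

```shell
#!/bin/sh
# Print one line: epoch time, 1-minute load average, MemAvailable in kB.
# $1 is the proc directory (defaults to /proc); it is a parameter only
# so the function can be tried against sample files.
resource_snapshot() {
    proc=${1:-/proc}
    load=$(cut -d ' ' -f 1 "$proc/loadavg")
    mem=$(awk '/^MemAvailable:/ { print $2 }' "$proc/meminfo")
    printf '%s load1=%s mem_available_kb=%s\n' "$(date +%s)" "$load" "$mem"
}
```

Lines that keep arriving through the alert window would show the server itself was up; a hole in the timestamps would point to the machine being frozen or powered off.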

What if the issue occurs somewhere else, not on the server? For example, the network provider may do maintenance work on their infrastructure during low-activity hours. Can you describe where the server is installed/deployed and how it is connected to the Internet?

As an example, you can configure a cron task to ping Google every 10 minutes with a command like

Code:

ping -D -c 4 google.com | grep '^\[' >> /tmp/ping.txt

and on the next day check whether the server really had Internet connectivity during the night, based on the timestamps at the start of each line, e.g.:

# cat /tmp/ping.txt
[1695979777.005473] 64 bytes from fra15s29-in-x0e.1e100.net (2a00:1450:4001:806::200e): icmp_seq=1 ttl=120 time=1.18 ms
[1695979778.007234] 64 bytes from fra16s65-in-x0e.1e100.net (2a00:1450:4001:806::200e): icmp_seq=2 ttl=120 time=1.22 ms
[1695979779.008961] 64 bytes from fra15s29-in-x0e.1e100.net (2a00:1450:4001:806::200e): icmp_seq=3 ttl=120 time=1.15 ms
[1695979780.010679] 64 bytes from fra16s65-in-x0e.1e100.net (2a00:1450:4001:806::200e): icmp_seq=4 ttl=120 time=1.18 ms
[1695980141.535919] From 154.14.43.65 icmp_seq=1 Destination Net Unreachable
[1695980142.561591] From 154.14.43.65 icmp_seq=2 Destination Net Unreachable
[1695980143.550257] From 154.14.43.65 icmp_seq=3 Destination Net Unreachable
[1695980144.540281] From 154.14.43.65 icmp_seq=4 Destination Net Unreachable
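Reading a whole night of such lines by eye is tedious, so the log can be scanned for holes instead. A minimal sketch, assuming the log format shown above and a cron interval of 10 minutes; the function name find_gaps and the 900-second default threshold are choices made for this example:

```shell
#!/bin/sh
# Assumed crontab entry producing the log (every 10 minutes):
#   */10 * * * * ping -D -c 4 google.com | grep '^\[' >> /tmp/ping.txt

# Report every gap longer than the threshold between consecutive
# successful replies ("... bytes from ...") in a `ping -D` log.
# $1 = log file, $2 = maximum allowed gap in seconds (default 900).
find_gaps() {
    awk -v max="${2:-900}" '
        /bytes from/ {
            ts = substr($1, 2, length($1) - 2) + 0   # strip [ and ]
            if (prev && ts - prev > max)
                printf "gap of %d s between %d and %d\n", ts - prev, prev, ts
            prev = ts
        }
    ' "$1"
}
```

Running `find_gaps /tmp/ping.txt` the next morning lists the periods without a successful reply. "Destination Net Unreachable" lines are deliberately not counted as replies, so an outage like the one in the sample shows up as a gap once connectivity returns. The epoch values can be converted with GNU `date -d @1695979777`.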
