Issue Server Down Every Morning After Domain Transfer

LuigiMdg

Basic Pleskian
Server operating system version
AlmaLinux 9
Plesk version and microupdate number
Plesk Obsidian v18.0.55.2
As you can guess from the title, after transferring the domain and setting the DNS records to match those in Plesk, every morning at 5:00 (3:00 UTC) the server goes down and comes back online only after 1 hour!
The only DNS record that was missing was ipv4.domain.tld, which I have now tried to add, but I highly doubt that was the problem and I expect to get the same error tomorrow morning.
I also updated Plesk, but I don't think this will solve the problem either, which is why I find myself here asking for help.
I looked at the /var/log/messages file. Analyzing the messages around 3:00, 4:00 and 5:00, I don't notice anything unusual other than the usual messages that appear every hour.
How can I investigate and resolve the problem?
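
One way to widen the search beyond /var/log/messages is to query the systemd journal for the outage window and to check whether the machine itself rebooted. The following is a minimal sketch, not a definitive procedure. The function name outage_window_logs and the 04:30 to 06:30 window are assumptions made for illustration; the window is in server local time, matching the 5:00 local outage described above.

```shell
# Print journal entries of priority "warning" or worse for a time window,
# then the list of recorded boots. Both arguments use systemd time syntax;
# a bare "HH:MM" means that time on the current day, in local time.
outage_window_logs() {
    from=$1   # start of the window, e.g. "04:30"
    to=$2     # end of the window, e.g. "06:30"

    journalctl --no-pager --priority=warning --since "$from" --until "$to"

    # If a new boot shows up inside the window, the whole machine restarted
    # rather than a single service failing.
    journalctl --no-pager --list-boots
}
```

Run as root the morning after an alert, for example with: outage_window_logs "04:30" "06:30". If the journal shows nothing at all for the window, that points away from a service crash and toward the network or the monitoring itself.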
 
Is it really DNS that is down or is it one or both of the web servers?
I don't understand the question. I didn't say it was a DNS problem, only that the DNS change was the only operation I carried out that day.
There is one server and one domain.
 
Could you please describe which service is down? Is it the web server(s)? The name resolution? The whole system? Is it really down, meaning for example that when you check "systemctl status httpd" or "systemctl status nginx" are they inactive? Or is it that the services are running but you cannot access the server or websites from your location?
 
I don't know. Between 3 and 5 am I am usually asleep like all ordinary mortals; I only get the message from 360 Monitoring.
Analyzing the httpd logs does not reveal any shutdown, interruption or restart.
 
Without further analysis on the server it won't be possible to determine the cause of the interruption.
 
If you suggest what can be done, I'll do it, but I certainly can't stay awake from 3 to 5. At most I can set up a cron job that runs a command for me. I have the imagination for it, but just writing "further analysis" doesn't help much.
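
A cron job of the kind mentioned here could record the state of each service during the outage window, so the log can be read the next morning. This is only a sketch under assumptions: the function name log_service_states and the log path are hypothetical, and the service names (httpd, nginx, named-chroot) are taken from elsewhere in this thread and may differ on the actual server.

```shell
# Append one timestamped line per service to a log file.
# Usage: log_service_states LOGFILE SERVICE...
log_service_states() {
    logfile=$1
    shift
    for svc in "$@"; do
        # "systemctl is-active" prints active, inactive, failed, etc.
        # It exits non-zero for anything but active, hence the "|| true".
        state=$(systemctl is-active "$svc" 2>/dev/null || true)
        printf '%s %s %s\n' \
            "$(date '+%Y-%m-%d %H:%M:%S')" "$svc" "${state:-unknown}" >> "$logfile"
    done
}
```

Saved in a script that ends with a call such as log_service_states /var/log/svc-state.log httpd nginx named-chroot, it could be scheduled with a crontab line like "*/5 4-6 * * * /root/svc-snapshot.sh" (the script path is made up). If every line in the window says active while the monitoring still reports downtime, the services themselves are probably not the cause.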
 
My experience with 360 Monitoring is that it creates a lot of false alerts for some reason. I would not rely on it too much. Try using another monitoring service alongside it for a while. If both monitoring services detect downtime, then you can be certain there is something wrong with your server (or DNS provider).
 
AlmaLinux 9 has no support for the Watchdog extension, so if you have services that crash, they won't get restarted.

There is a replacement for this:

I've explained in this post how it works:

But before you start implementing this, list the current services on your server:
# /usr/local/psa/admin/sbin/register_service --full-list

Next, run the following commands, adding or changing service names to match your list, to enable the automatic restart function:
Code:
# /usr/local/psa/admin/sbin/register_service --enable plesk-task-manager
# /usr/local/psa/admin/sbin/register_service --enable httpd
# /usr/local/psa/admin/sbin/register_service --enable named-chroot
# /usr/local/psa/admin/sbin/register_service --enable fail2ban
# /usr/local/psa/admin/sbin/register_service --enable postfix
# /usr/local/psa/admin/sbin/register_service --enable dovecot
# /usr/local/psa/admin/sbin/register_service --enable sw-engine
# /usr/local/psa/admin/sbin/register_service --enable sw-cp-server
# /usr/local/psa/admin/sbin/register_service --enable psa
# /usr/local/psa/admin/sbin/register_service --enable nginx
# /usr/local/psa/admin/sbin/register_service --enable sw-collectd
# /usr/local/psa/admin/sbin/register_service --enable spamassassin

This will restart services that, for some reason, crash. I'm not saying this will solve your problems, but at least you have an alternative for the Watchdog extension.
 
I also use WhatchDog which sends me the same alert at the same times.
OK, this may be useful later, but for now I'm interested in finding the problem and solving it.
I don't want to install a system that automatically restarts the server as the solution; I would rather use it as prevention against potential future problems.
 
Perhaps we can try to understand what happened based on the original data, messages, or screenshots?

OK, the first thing we know is that the whole server goes down every morning at 5 AM, right? We know that because of the alerts from 360 Monitoring. Could you please show the content of that alert message, just to be sure?

Can you obtain and show graphs for CPU, memory, disk, and network activity? This way we can confirm that the server is really working at that time and that nothing extraordinary is happening. On the other hand, if we find no network activity at all from 5 to 6 AM, that may give us a clue.
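
If no graphs are available for that hour, a rough substitute is to log a few numbers from cron and read them the next day. This is a sketch only, assuming a Linux server with /proc mounted; the function name resource_snapshot and the output field names are made up for illustration, and it covers load, memory, and root disk usage but not network traffic.

```shell
# Print one line with a timestamp, the 1-minute load average,
# available memory in kB, and usage of the root filesystem.
resource_snapshot() {
    load=$(cut -d ' ' -f 1 /proc/loadavg 2>/dev/null || true)
    mem=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null || true)
    # df -P prints a fixed POSIX layout; line 2, column 5 is the usage in %.
    disk=$(df -P / | awk 'NR == 2 {print $5}')
    printf '%s load1=%s mem_available_kb=%s root_used=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" \
        "${load:-n/a}" "${mem:-n/a}" "${disk:-n/a}"
}
```

Called every few minutes with its output appended to a file, this shows whether the server kept running through the alert window. A gap in the timestamps would suggest the machine was frozen or off, while continuous lines with normal values would suggest the problem is outside the server.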

What if the issue is somewhere else and not on the server? For example, the network provider might do maintenance work on their infrastructure during low-activity hours. Can you describe where the server is installed/deployed and how it is connected to the Internet?

As an example, you can configure a cron task to ping Google every 10 minutes with a command like
Code:
ping -D -c 4 google.com | grep '^\[' >> /tmp/ping.txt
and on the next day check whether the server really had Internet connectivity during the night, based on the timestamps at the start of each line, e.g.:
# cat /tmp/ping.txt
[1695979777.005473] 64 bytes from fra15s29-in-x0e.1e100.net (2a00:1450:4001:806::200e): icmp_seq=1 ttl=120 time=1.18 ms
[1695979778.007234] 64 bytes from fra16s65-in-x0e.1e100.net (2a00:1450:4001:806::200e): icmp_seq=2 ttl=120 time=1.22 ms
[1695979779.008961] 64 bytes from fra15s29-in-x0e.1e100.net (2a00:1450:4001:806::200e): icmp_seq=3 ttl=120 time=1.15 ms
[1695979780.010679] 64 bytes from fra16s65-in-x0e.1e100.net (2a00:1450:4001:806::200e): icmp_seq=4 ttl=120 time=1.18 ms
[1695980141.535919] From 154.14.43.65 icmp_seq=1 Destination Net Unreachable
[1695980142.561591] From 154.14.43.65 icmp_seq=2 Destination Net Unreachable
[1695980143.550257] From 154.14.43.65 icmp_seq=3 Destination Net Unreachable
[1695980144.540281] From 154.14.43.65 icmp_seq=4 Destination Net Unreachable
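
The bracketed values at the start of each line are Unix timestamps, which are hard to read by eye. A small helper can list only the failed probes with readable times. This is a sketch under assumptions: the function name summarize_ping_log is made up, a line counts as a success only if it contains "bytes from", and the "@seconds" form of date -d requires GNU date (as shipped with AlmaLinux).

```shell
# Summarise a log written by: ping -D ... | grep '^\['
# Prints each failed probe with a UTC timestamp, then totals.
# The body runs in a subshell "( ... )" so its variables do not leak.
summarize_ping_log() (
    ok=0
    fail=0
    while IFS= read -r line; do
        case $line in
            '['*'] '*) ;;          # expected shape: "[epoch.usec] text"
            *) continue ;;
        esac
        stamp=${line#'['}          # drop the leading "["
        epoch=${stamp%%.*}         # keep whole seconds only
        case $epoch in
            ''|*[!0-9]*) continue ;;   # skip lines without a numeric stamp
        esac
        rest=${line#*'] '}         # text after the timestamp
        case $rest in
            *'bytes from'*) ok=$((ok + 1)) ;;
            *)
                fail=$((fail + 1))
                printf '%s UTC  %s\n' \
                    "$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S')" "$rest"
                ;;
        esac
    done < "$1"
    printf 'ok=%d fail=%d\n' "$ok" "$fail"
)
```

Running summarize_ping_log /tmp/ping.txt the next morning shows at a glance whether the failures cluster around 3:00 UTC. For the sample above, the four "Destination Net Unreachable" lines fall at 2023-09-29 09:35 UTC.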
 