
Issue Plesk/Linux Server Goes Down & Restarts a Few Times Weekly on Sunday Morning

bradz

Regular Pleskian
I have one server that goes down every week, early on Sunday morning. This has been going on for months.
CentOS Linux 7.9.2009 (Core)
Plesk Obsidian: Version 18.0.39 Update #2
This is outside my expertise, but I would like to learn how to diagnose it.
I see this in the messages log file in /var/log.
It repeats a few times.
What does this indicate? Any help is greatly appreciated. The server seems fine on other days. Is it related to a hacking attempt?
The server backup is set to run on Sunday at 4; this seems to happen before that time.
Thanks, Brad

MESSAGES
Nov 28 01:54:20 serverdomainname grafana-server: t=2021-11-28T01:54:20-0500 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=10 name="Partition \"/home\" utilization" error="request handler response error {Post \"http://127.0.0.1:8880/modules/monit...YVtWZEy6Vbn45HoGoOm9o0s6Z3RLhqs0Af6eA3/query\": dial tcp 127.0.0.1:8880: connect: connection refused A <nil> [] [] 0xc001212450}" changing state to=keep_state
Nov 28 01:54:21 serverdomainname wdcollect: Connection to server has been established.
Nov 28 01:54:21 serverdomainname wdcollect[14784]: Connection to server has been established.
Nov 28 01:54:21 serverdomainname monit: 'plesk_apache' start: /bin/systemctl
Nov 28 01:54:21 serverdomainname systemd: Starting Startup script for Plesk control panel server...
Nov 28 01:54:21 serverdomainname systemd: Started Startup script for Plesk control panel server.
Nov 28 01:55:01 serverdomainname systemd: Started Session 16244 of user root.
Nov 28 01:55:30 serverdomainname wdcollect: Connection to SMTP server has been closed.
Nov 28 01:55:30 serverdomainname wdcollect[14784]: Connection to SMTP server has been closed.
Nov 28 01:58:54 serverdomainname xinetd[1353]: START: ftp pid=2222 from=::ffff:94.232.41.27
Nov 28 01:58:54 serverdomainname xinetd[2222]: warning: can't get client address: Connection reset by peer
Nov 28 01:58:54 serverdomainname xinetd[1353]: EXIT: ftp status=1 pid=2222 duration=0(sec)
Nov 28 01:58:55 serverdomainname xinetd[1353]: START: ftp pid=2223 from=::ffff:94.232.41.27
Nov 28 01:58:55 serverdomainname xinetd[1353]: EXIT: ftp status=0 pid=2223 duration=0(sec)
Nov 28 01:58:55 serverdomainname xinetd[1353]: START: ftp pid=2224 from=::ffff:94.232.41.27
Nov 28 01:58:55 serverdomainname xinetd[1353]: EXIT: ftp status=0 pid=2224 duration=0(sec)
Nov 28 01:59:01 serverdomainname systemd: Started Session 16246 of user root.
Nov 28 01:59:01 serverdomainname systemd: Started Session 16245 of user psaadm.
Nov 28 01:59:23 serverdomainname monit: 'plesk_apache' connection passed
Nov 28 01:59:23 serverdomainname wdcollect: Connection to server has been established.
Nov 28 01:59:23 serverdomainname wdcollect[14784]: Connection to server has been established.
Nov 28 02:00:01 serverdomainname systemd: Started Session 16248 of user psaadm.
Nov 28 02:00:01 serverdomainname systemd: Started Session 16247 of user psaadm.
Nov 28 02:00:01 serverdomainname systemd: Started Session 16249 of user psaadm.
Nov 28 02:00:32 serverdomainname wdcollect: Connection to SMTP server has been closed.
Nov 28 02:00:51 serverdomainname wdcollect[14784]: Connection to SMTP server has been closed.
Nov 28 02:01:01 serverdomainname systemd: Started Session 16250 of user root.
Nov 28 02:04:25 serverdomainname grafana-server: t=2021-11-28T02:04:25-0500 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=11 name="Real memory usage" error="request handler error: failed to query data: Failed to query data: rpc error: code = DeadlineExceeded desc = context deadline exceeded" changing state to=keep_state
Nov 28 02:04:31 serverdomainname grafana-server: t=2021-11-28T02:04:31-0500 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=1 name="Apache & PHP-FPM memory usage" error="request handler error: failed to query data: Failed to query data: rpc error: code = DeadlineExceeded desc = context deadline exceeded" changing state to=keep_state
Nov 28 02:04:32 serverdomainname grafana-server: t=2021-11-28T02:04:32-0500 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=2 name="nginx memory usage" error="request handler error: failed to query data: Failed to query data: rpc error: code = DeadlineExceeded desc = context deadline exceeded" changing state to=keep_state
Nov 28 02:04:35 serverdomainname grafana-server: t=2021-11-28T02:04:35-0500 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=4 name="MySQL memory usage" error="request handler error: failed to query data: Failed to query data: rpc error: code = DeadlineExceeded desc = context deadline exceeded" changing state to=keep_state
Nov 28 02:04:35 serverdomainname grafana-server: t=2021-11-28T02:04:35-0500 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=3 name="Mail server memory usage" error="request handler error: failed to query data: Failed to query data: rpc error: code = DeadlineExceeded desc = context deadline exceeded" changing state to=keep_state
Nov 28 02:04:40 serverdomainname grafana-server: t=2021-11-28T02:04:40-0500 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=5 name="Plesk memory usage" error="request handler error: failed to query data: Failed to query data: rpc error: code = DeadlineExceeded desc = context deadline exceeded" changing state to=keep_state
Nov 28 02:04:41 serverdomainname grafana-server: t=2021-11-28T02:04:41-0500 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=8 name="Partition \"/tmp\" utilization" error="request handler error: failed to query data: Failed to query data: rpc error: code = DeadlineExceeded desc = context deadline exceeded" changing state to=keep_state
Nov 28 02:04:42 serverdomainname grafana-server: t=2021-11-28T02:04:42-0500 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=6 name="Partition \"/\" utilization" error="request handler error: failed to query data: Failed to query data: rpc error: code = DeadlineExceeded desc = context deadline exceeded" changing state to=keep_state
Nov 28 02:04:45 serverdomainname grafana-server: t=2021-11-28T02:04:45-0500 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=7 name="Partition \"/usr\" utilization" error="request handler error: failed to query data: Failed to query data: rpc error: code = DeadlineExceeded desc = context deadline exceeded" changing state to=keep_state
Nov 28 02:04:46 serverdomainname monit: monit: embed_ssl_socket(): Openssl read timeout error!
Nov 28 02:04:46 serverdomainname monit: 'plesk_apache' failed, cannot open a connection to INET[localhost:8443]
Nov 28 02:04:46 serverdomainname systemd: Stopping Startup script for Plesk control panel server...
 
I see the following in the error log:

[Sat Nov 27 04:00:17.771627 2021] [mpm_event:notice] [pid 14567:tid 139642368698496] AH00493: SIGUSR1 received. Doing graceful restart
[Sat Nov 27 04:00:21.952391 2021] [lbmethod_heartbeat:notice] [pid 14567:tid 139642368698496] AH02282: No slotmem from mod_heartmonitor
[Sat Nov 27 04:00:22.022989 2021] [ssl:warn] [pid 14567:tid 139642368698496] AH01909: RSA certificate configured for webmail.______.com:443 does NOT include an ID which matches the server name
[Sat Nov 27 04:00:22.029078 2021] [ssl:warn] [pid 14567:tid 139642368698496] AH01909: RSA certificate configured for webmail.______.com:443 does NOT include an ID which matches the server name
[Sat Nov 27 04:00:22.035131 2021] [ssl:warn] [pid 14567:tid 139642368698496] AH01909: RSA certificate configured for default-2607_f1c0_822_3b00__84_2108:443 does NOT include an ID which matches the server name
[Sat Nov 27 04:00:22.035701 2021] [ssl:warn] [pid 14567:tid 139642368698496] AH01909: RSA certificate configured for default-______:443 does NOT include an ID which matches the server name
[Sat Nov 27 04:00:22.035922 2021] [ssl:warn] [pid 14567:tid 139642368698496] AH02292: Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)
[Sat Nov 27 04:00:22.038284 2021] [mpm_event:notice] [pid 14567:tid 139642368698496] AH00489: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips Apache mod_fcgid/2.3.9 configured -- resuming normal operations
[Sat Nov 27 04:00:22.038311 2021] [core:notice] [pid 14567:tid 139642368698496] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Sat Nov 27 13:33:45.310654 2021] [:error] [pid 12735:tid 139642028259072] [client _____:54010] [client ____] ModSecurity: [file "/etc/httpd/conf/modsecurity.d/rules/tortix/modsec/50_plesk_basic_asl_rules.conf"] [line "39"] [id "33340006"] [rev "68"] [msg "Protected by Atomicorp.com Basic Non-Realtime WAF Rules: Generic Path Recursion denied in URI/ARGS"] [data "../../,ARGS:lang"] [severity "CRITICAL"] Access denied with code 403 (phase 2). Pattern match "\\\\.\\\\./\\\\.\\\\./" at ARGS:lang. [hostname "_____"] [uri "/remote/fgt_lang"] [unique_id "YaJ6CXnRCZPJvBIw6AFylwAAAEQ"]
[Sun Nov 28 02:07:53.991735 2021] [fcgid:warn] [pid 27675:tid 139641910970112] [client ____:45042] mod_fcgid: read data timeout in 45 seconds, referer: https://webmail.____.com/imp/dynamic.php?page=mailbox
[Sun Nov 28 02:08:17.565106 2021] [core:error] [pid 27675:tid 139641910970112] [client ____:45042] End of script output before headers: ajax.php, referer: https://webmail._____.com/imp/dynamic.php?page=mailbox
[Sun Nov 28 02:08:25.728627 2021] [fcgid:warn] [pid 12731:tid 139642368698496] mod_fcgid: process 3775 graceful kill fail, sending SIGKILL
...
 
It is not a virtual server. I do not know what happens. Which log should I look at?
I do have 3 other servers set up the same way, and only this one does it. I keep wondering if it has something to do with a site on the server.
I do code all the sites.
I am also thinking I should change the backup day to see if that is a factor.
Your thoughts?
Thanks Peter, Brad
 
It can be a lot of things. It could be something that happens during the nightly Plesk maintenance; the timing would be roughly right. But you'd need to go through all the log files to check what is happening at the given time, meaning the log files in /var/log, e.g. messages, but also the ones in the subdirectories of /var/log. Maybe you'll find some hints there.
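To narrow a large log down to the minutes around an outage, something like the following Python sketch may help. It is only an illustration, not a Plesk tool: the function names `parse_syslog_time` and `lines_in_window` are made up for this example, and it assumes the classic syslog prefix seen in the messages excerpt above ("Nov 28 01:54:20"), which carries no year, so the year has to be supplied by hand.

```python
from datetime import datetime


def parse_syslog_time(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line.

    Returns None when the line does not start with such a stamp.
    The year is an assumption passed in by the caller, because
    classic syslog lines do not record it.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def lines_in_window(lines, start, end, year):
    """Return the lines whose timestamp lies between start and end (inclusive)."""
    selected = []
    for line in lines:
        stamp = parse_syslog_time(line, year)
        if stamp is not None and start <= stamp <= end:
            selected.append(line)
    return selected


# Two lines quoted from the messages excerpt in this thread.
sample = [
    "Nov 28 01:54:21 serverdomainname monit: 'plesk_apache' start: /bin/systemctl",
    "Nov 28 02:04:46 serverdomainname monit: 'plesk_apache' failed, "
    "cannot open a connection to INET[localhost:8443]",
]

window_start = datetime(2021, 11, 28, 2, 0, 0)
window_end = datetime(2021, 11, 28, 2, 10, 0)
for hit in lines_in_window(sample, window_start, window_end, 2021):
    print(hit)
```

For a real file, the list could be replaced with the lines read from /var/log/messages or from any other log that uses the same timestamp prefix.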

The timestamps in your Apache excerpt and your messages excerpt do not match. Your Apache excerpt is from the time when your backup runs. If your backup consumes a lot of CPU resources, it is possible that web server operations time out, at least for a short period. Try turning compression off completely and see if the issue still occurs. Backing up without compression saves a huge amount of CPU time, leaving more for other applications. If that is the cause, gradually increase compression from "none" to "fastest", then "fast", then "normal", etc. (version 18.0.40 or later is needed) until you find the highest setting at which the web server keeps running while the system does the backup.
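Comparing timestamps across the two logs by eye is error-prone because they use different formats. A rough Python sketch that puts both on one timeline could look like this. The helper names (`parse_apache_time`, `parse_syslog_time`, `merged_timeline`) are invented for the example, the formats are taken from the two excerpts posted above, and the year for the syslog lines is an assumption supplied by hand.

```python
import re
from datetime import datetime

# Apache error log lines start with e.g. "[Sun Nov 28 02:07:53.991735 2021]".
APACHE_STAMP = re.compile(r"^\[([^\]]+)\]")


def parse_apache_time(line):
    """Parse the bracketed timestamp at the start of an Apache error log line."""
    match = APACHE_STAMP.match(line)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%a %b %d %H:%M:%S.%f %Y")
    except ValueError:
        return None


def parse_syslog_time(line, year):
    """Parse the 'Mon DD HH:MM:SS' prefix of a syslog line (year supplied by caller)."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def merged_timeline(syslog_lines, apache_lines, year):
    """Return (timestamp, source, line) tuples from both logs, oldest first."""
    events = []
    for line in syslog_lines:
        stamp = parse_syslog_time(line, year)
        if stamp is not None:
            events.append((stamp, "messages", line))
    for line in apache_lines:
        stamp = parse_apache_time(line)
        if stamp is not None:
            events.append((stamp, "apache", line))
    events.sort(key=lambda event: event[0])
    return events


# Lines quoted from the excerpts in this thread.
syslog_sample = [
    "Nov 28 02:04:46 serverdomainname monit: 'plesk_apache' failed, "
    "cannot open a connection to INET[localhost:8443]",
]
apache_sample = [
    "[Sat Nov 27 04:00:17.771627 2021] [mpm_event:notice] AH00493: SIGUSR1 received. Doing graceful restart",
    "[Sun Nov 28 02:08:25.728627 2021] [fcgid:warn] mod_fcgid: process 3775 graceful kill fail, sending SIGKILL",
]

for stamp, source, line in merged_timeline(syslog_sample, apache_sample, 2021):
    print(stamp.isoformat(), source, line[:60])
```

Sorting both sources by timestamp makes it easier to see which entries fall near the outage and which belong to a different time of day.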
 
It does make sense that it is related to the backup, Peter. The other 3 servers, set up about the same, are fine, and this server gets the highest weekend traffic because of the area sports reports it hosts. I will also switch the backup to lower-traffic days like Monday or Tuesday night. Sincere thanks! Brad
 