Issue kernel.panic=10 after upgrade to Obsidian 18.0.57 Update 5

Majazs

New Pleskian
Server operating system version
Ubuntu 20.04.6 LTS
Plesk version and microupdate number
Obsidian 18.0.57 Update 5
The server has needed to be manually restarted twice in two days since the automatic upgrade to Obsidian 18.0.57 Update 5.
How can I fix it? Is it possible to revert the update?
 
A downgrade is technically not possible, but it is unlikely that the Plesk update is the cause, because the latest updates contained only internal bug fixes that do not affect the operating system. What other log entries do you see in your syslog before the kernel panic is logged?
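In case it helps, here is a minimal sketch for pulling those messages (assuming systemd-journald keeps a persistent journal; otherwise check the classic /var/log/syslog files):

# kernel messages from the boot before the last reboot (needs persistent journal storage)
journalctl -k -b -1 | tail -n 100

# or search the classic syslog files for context around the panic
grep -B 20 "Kernel panic" /var/log/syslog /var/log/syslog.1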
 
About a minute before the crash I get:
Dec 27 01:18:06 vmi... kernel: [20301.651012] INFO: task sync:108112 blocked for more than 120 seconds.
Dec 27 01:18:06 vmi... kernel: [20301.651121] Not tainted 5.4.0-105-generic #119-Ubuntu
Dec 27 01:18:06 vmi... kernel: [20301.651184] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 27 01:18:06 vmi... kernel: [20301.651257] sync D 0 108112 108109 0x00000000
Dec 27 01:18:06 vmi... kernel: [20301.651260] Call Trace:
Dec 27 01:18:06 vmi... kernel: [20301.654125] __schedule+0x2e3/0x740
Dec 27 01:18:06 vmi... kernel: [20301.654128] schedule+0x42/0xb0
Dec 27 01:18:06 vmi... kernel: [20301.654207] wb_wait_for_completion+0x56/0x90
Dec 27 01:18:06 vmi... kernel: [20301.654210] ? __wake_up_pollfree+0x40/0x40
Dec 27 01:18:06 vmi... kernel: [20301.654211] sync_inodes_sb+0xd8/0x2a0
Dec 27 01:18:06 vmi... kernel: [20301.654285] sync_inodes_one_sb+0x15/0x20
Dec 27 01:18:06 vmi... kernel: [20301.654361] iterate_supers+0xa3/0x100
Dec 27 01:18:06 vmi... kernel: [20301.654362] ? default_file_splice_write+0x30/0x30
Dec 27 01:18:06 vmi... kernel: [20301.654363] ksys_sync+0x42/0xb0
Dec 27 01:18:06 vmi... kernel: [20301.654365] __ia32_sys_sync+0xe/0x20
Dec 27 01:18:06 vmi... kernel: [20301.654367] do_syscall_64+0x57/0x190
Dec 27 01:18:06 vmi... kernel: [20301.654369] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 27 01:18:06 vmi... kernel: [20301.654371] RIP: 0033:0x7f11878c644b
Dec 27 01:18:06 vmi... kernel: [20301.654377] Code: Bad RIP value.
Dec 27 01:18:06 vmi... kernel: [20301.654378] RSP: 002b:00007ffcf8c06cb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
Dec 27 01:18:06 vmi... kernel: [20301.654453] RAX: ffffffffffffffda RBX: 00007ffcf8c06df8 RCX: 00007f11878c644b
Dec 27 01:18:06 vmi... kernel: [20301.654453] RDX: 00007f11879a0501 RSI: 0000000000000000 RDI: 00007f1187966bc0
Dec 27 01:18:06 vmi... kernel: [20301.654454] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
Dec 27 01:18:06 vmi... kernel: [20301.654454] R10: 0008000000004007 R11: 0000000000000246 R12: 0000000000000000
Dec 27 01:18:06 vmi... kernel: [20301.654455] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
After that come the usual log entries plus some Grafana alerts:
Dec 27 01:19:33 vmi... grafana[1379]: logger=alerting.evalContext t=2023-12-27T01:19:33.004010913+01:00 level=error msg="Alert Rule Result Error" ruleId=2 name="nginx memory usage" error="request handler error: [plugin.downstreamError] failed to query data: Failed to query data: rpc error: code = DeadlineExceeded desc = context deadline exceeded" changingstateto=keep_state
Dec 27 01:19:34 vmi... grafana[1379]: logger=alerting.evalContext t=2023-12-27T01:19:34.002768862+01:00 level=error msg="Alert Rule Result Error" ruleId=3 name="Mail server memory usage" error="request handler error: [plugin.downstreamError] failed to query data: Failed to query data: rpc error: code = DeadlineExceeded desc = context deadline exceeded" changingstateto=keep_state

Searching the logs for 108112 doesn't produce any results.
If I search for 108109, I find:
Dec 27 01:15:01 vmi... CRON[108109]: (root) CMD (sync; echo 3 > /proc/sys/vm/drop_caches)

Any further help appreciated ;)
 
I am personally not able to interpret the log excerpt above properly, but I remember a case here where a stuck CPU process caused a reboot; updating the Linux kernel to the latest version solved it. A good first step toward a solution could be to check whether your operating system and kernel are up to date.
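For example, a minimal sketch for Ubuntu (the new kernel is only used after a reboot, so reboot when it suits you):

# show the running kernel and the installed kernel packages
uname -r
dpkg --list 'linux-image-*' | grep '^ii'

# bring the system, including the kernel, up to date
sudo apt update && sudo apt full-upgrade
sudo reboot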
 
Could you give some more information about the system, such as RAM (free -m), CPU power, and drive type (HDD, SSD, NVMe)?
I would remove the "sync; echo 3 > /proc/sys/vm/drop_caches" entry from the cron jobs. This cron drops the kernel's cached file system information; as a result, every file accessed after the purge needs a full stat() call to get its information back into the cache. This "hack" is mostly used in memory-pressured environments to keep usage low, but it slows the system down. I do not recommend such things; let the kernel do its work and give the system the required resources.
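For reference, here is an annotated sketch of what that cron line does (the numeric values are documented kernel semantics; it must run as root):

# drop_caches accepts three values:
#   1 = free the page cache
#   2 = free reclaimable slab objects (dentries and inodes)
#   3 = free both
sync                                # flush dirty pages to disk first
echo 3 > /proc/sys/vm/drop_caches   # then drop all reclaimable caches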
 
Thank you hschramm!
It is a VPS server (10 vCPU cores, 60 GB RAM, 1.6 TB SSD).
free -m:
              total        used        free      shared  buff/cache   available
Mem:          60287        5042       48968         997        6277       53570

I don't see a cron job specifically for drop_caches. Could it be part of /opt/psa/admin/plib/modules/advisor/scripts/update-cache.php?

Additional info:
In Grafana I see a spike in disk "sda" time/op before the events, rising from 1-3 ms to 47 ms and eventually to 2.44 s.
The spikes started a few minutes before the logged CRON event in the case of the second freeze (for the first one I didn't notice any obvious log records). For almost the last 48 hours the problem has not repeated, but I am still nervous about it ;)
 
Thanks for the information. Your system has plenty of RAM; there is no need to clear the caches. I do not know whether the Advisor is clearing the caches.
The cron is run by root, so there must be a cron job file elsewhere with the instructions. Look closely at /etc/crontab and the /etc/cron.d/ and /etc/cron.daily/ directories, and check with crontab -l -u root whether the instructions are there.
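A quick way to search all the usual locations at once (a sketch; these are the standard Debian/Ubuntu cron paths):

# search the common cron locations for the drop_caches line
grep -r "drop_caches" /etc/crontab /etc/cron.d/ /etc/cron.daily/ \
    /etc/cron.hourly/ /etc/cron.weekly/ /etc/cron.monthly/ 2>/dev/null

# also check root's personal crontab
crontab -l -u root | grep "drop_caches"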

If it is possible, you can test whether clearing the caches is causing the problems: run sync; echo 3 > /proc/sys/vm/drop_caches on the shell and watch whether the server crashes or shows a spike.
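One way to watch the disk while testing (a sketch; iostat comes with the sysstat package):

# terminal 1: watch extended I/O statistics for sda once per second
iostat -x sda 1

# terminal 2: reproduce the cron job's action as root
sync; echo 3 > /proc/sys/vm/drop_caches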
 
I'd let the VPS provider check the SSD.
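If you want to gather some numbers for the provider first, a quick latency test is one option (a sketch; it assumes the fio package is installed, and on a VPS the results reflect the host's storage layer):

# 30-second random-read latency test with direct I/O (needs fio)
fio --name=latency-test --rw=randread --bs=4k --size=256M \
    --runtime=30 --time_based --direct=1 --filename=/var/tmp/fio.test

# clean up the test file afterwards
rm -f /var/tmp/fio.test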
 