Issue kernel.panic=10 after upgrade to Obsidian 18.0.57 Update 5

Majazs · Dec 27, 2023

The server needed to be manually restarted twice in 2 days since the automatic upgrade to Obsidian 18.0.57 Update 5.
How can I fix it? Is it possible to revert the update?

Peter Debik · Dec 27, 2023

A downgrade is technically not possible, but the Plesk update is unlikely to be the cause, because the latest updates contained only internal bug fixes that do not affect your operating system. What other log entries do you see in your syslog before the kernel panic is logged?

Majazs · Dec 27, 2023

About a minute before the crash I get:
Dec 27 01:18:06 vmi... kernel: [20301.651012] INFO: task sync:108112 blocked for more than 120 seconds.
Dec 27 01:18:06 vmi... kernel: [20301.651121] Not tainted 5.4.0-105-generic #119-Ubuntu
Dec 27 01:18:06 vmi... kernel: [20301.651184] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 27 01:18:06 vmi... kernel: [20301.651257] sync D 0 108112 108109 0x00000000
Dec 27 01:18:06 vmi... kernel: [20301.651260] Call Trace:
Dec 27 01:18:06 vmi... kernel: [20301.654125] __schedule+0x2e3/0x740
Dec 27 01:18:06 vmi... kernel: [20301.654128] schedule+0x42/0xb0
Dec 27 01:18:06 vmi... kernel: [20301.654207] wb_wait_for_completion+0x56/0x90
Dec 27 01:18:06 vmi... kernel: [20301.654210] ? __wake_up_pollfree+0x40/0x40
Dec 27 01:18:06 vmi... kernel: [20301.654211] sync_inodes_sb+0xd8/0x2a0
Dec 27 01:18:06 vmi... kernel: [20301.654285] sync_inodes_one_sb+0x15/0x20
Dec 27 01:18:06 vmi... kernel: [20301.654361] iterate_supers+0xa3/0x100
Dec 27 01:18:06 vmi... kernel: [20301.654362] ? default_file_splice_write+0x30/0x30
Dec 27 01:18:06 vmi... kernel: [20301.654363] ksys_sync+0x42/0xb0
Dec 27 01:18:06 vmi... kernel: [20301.654365] __ia32_sys_sync+0xe/0x20
Dec 27 01:18:06 vmi... kernel: [20301.654367] do_syscall_64+0x57/0x190
Dec 27 01:18:06 vmi... kernel: [20301.654369] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 27 01:18:06 vmi... kernel: [20301.654371] RIP: 0033:0x7f11878c644b
Dec 27 01:18:06 vmi... kernel: [20301.654377] Code: Bad RIP value.
Dec 27 01:18:06 vmi... kernel: [20301.654378] RSP: 002b:00007ffcf8c06cb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
Dec 27 01:18:06 vmi... kernel: [20301.654453] RAX: ffffffffffffffda RBX: 00007ffcf8c06df8 RCX: 00007f11878c644b
Dec 27 01:18:06 vmi... kernel: [20301.654453] RDX: 00007f11879a0501 RSI: 0000000000000000 RDI: 00007f1187966bc0
Dec 27 01:18:06 vmi... kernel: [20301.654454] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
Dec 27 01:18:06 vmi... kernel: [20301.654454] R10: 0008000000004007 R11: 0000000000000246 R12: 0000000000000000
Dec 27 01:18:06 vmi... kernel: [20301.654455] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
After that come the usual log entries plus some Grafana alerts:
Dec 27 01:19:33 vmi... grafana[1379]: logger=alerting.evalContext t=2023-12-27T01:19:33.004010913+01:00 level=error msg="Alert Rule Result Error" ruleId=2 name="nginx memory usage" error="request handler error: [plugin.downstreamError] failed to query data: Failed to query data: rpc error: code = DeadlineExceeded desc = context deadline exceeded" changingstateto=keep_state
Dec 27 01:19:34 vmi... grafana[1379]: logger=alerting.evalContext t=2023-12-27T01:19:34.002768862+01:00 level=error msg="Alert Rule Result Error" ruleId=3 name="Mail server memory usage" error="request handler error: [plugin.downstreamError] failed to query data: Failed to query data: rpc error: code = DeadlineExceeded desc = context deadline exceeded" changingstateto=keep_state

Searching for 108112 doesn't produce any results.
If I search for 108109 I find:
Dec 27 01:15:01 vmi... CRON[108109]: (root) CMD (sync; echo 3 > /proc/sys/vm/drop_caches)
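
For anyone repeating this lookup: in the hung-task report above, the task line ends with the free stack size, the PID, the parent PID and the task flags, so "sync D 0 108112 108109 0x00000000" reads as PID 108112 with parent 108109. A minimal sketch for pulling those two numbers out of a log follows. The function name hung_task_parents is made up for illustration, and the field positions assume the line layout shown in this thread (kernel 5.4).

```shell
# Hypothetical helper: print "PID PPID" for each task line in state D
# (uninterruptible sleep) found in the given log files, or on stdin.
# Assumes the task line ends with: <comm> <state> <stack> <pid> <ppid> <flags>
hung_task_parents() {
    awk 'NF >= 6 && $(NF-4) == "D" && $NF ~ /^0x[0-9a-f]+$/ {
        print $(NF-2), $(NF-1)
    }' "$@"
}
```

It could be run as hung_task_parents /var/log/syslog (assuming the default Ubuntu syslog path), after which the parent PID can be searched for in the same file, as was done here for CRON[108109].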

Any further help appreciated

Peter Debik · Dec 27, 2023

I am personally not able to interpret the above log excerpt properly, but I remember a case here where a stuck CPU process caused a reboot. Updating the Linux kernel to the latest version solved it. A good first step could be to check that your operating system is up to date and running the latest kernel.

hschramm · Dec 28, 2023

Could you give some more information about the system, such as RAM (free -m), CPU power, and drive type (HDD, SSD, NVMe)?
I would remove the "sync; echo 3 > /proc/sys/vm/drop_caches" entry from the cron jobs. This cron job discards the kernel's cached file information. As a result, every file that is accessed after the purge needs a complete stat() call to bring its information back into the cache. This "hack" is mostly used in memory-pressured environments to keep memory usage low, but it slows the system down. I do not recommend it; let the kernel do its work and give the system the resources it requires.

Majazs · Dec 28, 2023

Thank you hschramm!
It is a VPS (10 vCPU cores, 60 GB RAM, 1.6 TB SSD).
free -m:
              total        used        free      shared  buff/cache   available
Mem:          60287        5042       48968         997        6277       53570
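
As a rough way to read this output, the "available" column can be expressed as a share of "total". The sketch below assumes a procps version of free that prints an "available" column, as in the output above; the function name mem_available_pct is hypothetical.

```shell
# Hypothetical helper: read `free -m` output on stdin and print the
# "available" column as a percentage of "total" (one decimal place).
mem_available_pct() {
    awk '$1 == "Mem:" && $2 > 0 { printf "%.1f\n", $7 / $2 * 100 }'
}
```

It would be used as free -m | mem_available_pct. With the figures above (53570 of 60287 MB) this works out to roughly 89 percent available.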

I can't find a cron job specifically for drop_caches. Could it be part of /opt/psa/admin/plib/modules/advisor/scripts/update-cache.php?

Additional info:
In Grafana I see a spike in disk "sda" time/op before the events, rising from 1-3 ms to 47 ms and eventually to 2.44 s.
For the second freeze, the spikes started a few minutes before the logged CRON event (for the first one I didn't notice any obvious log records). The problem has not recurred in almost 48 hours, but I am still nervous about it.
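
A similar time-per-operation figure can be approximated from the shell without Grafana, using the cumulative counters in /proc/diskstats (completed reads and writes, and the milliseconds spent on each). This is only a sketch: the function name disk_ms_per_op and the snapshot file paths are made up for illustration, and the result may not match the Grafana panel exactly, since how that panel is calculated is not known here.

```shell
# Hypothetical helper: average milliseconds per completed I/O for one
# device between two saved snapshots of /proc/diskstats.
# Usage: disk_ms_per_op <device> <snapshot_before> <snapshot_after>
disk_ms_per_op() {
    awk -v dev="$1" '
        $3 == dev {
            ops = $4 + $8     # reads completed + writes completed
            ms  = $7 + $11    # ms spent reading + ms spent writing
            if (seen) {
                d = ops - ops0
                if (d > 0) printf "%.2f\n", (ms - ms0) / d
                else print "no I/O completed between snapshots"
            }
            ops0 = ops; ms0 = ms; seen = 1
        }' "$2" "$3"
}
```

For example: cat /proc/diskstats > /tmp/before; sleep 10; cat /proc/diskstats > /tmp/after; disk_ms_per_op sda /tmp/before /tmp/after. Values that stay in the hundreds of milliseconds or more would be worth passing on to the provider.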

hschramm · Dec 28, 2023

Thanks for the information. Your system has plenty of RAM, so there is no need to clear the caches. I do not know whether the Advisor is clearing the caches.
The cron job is run by root, so there must be a cron job file elsewhere containing the instruction. Look closely at /etc/crontab and the /etc/cron.d/ and /etc/cron.daily folders, and check crontab -e -u root to see whether the instruction is there.
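
The locations above can be searched in one pass with a recursive grep. This is a minimal sketch: the function name find_drop_caches_jobs is made up, and the extra paths in the usage note (/etc/cron.hourly and /var/spool/cron/crontabs, where Debian and Ubuntu keep per-user crontabs) are additions not mentioned above.

```shell
# Hypothetical helper: list every line mentioning drop_caches under the
# given files or directories, as file:line:content.
# Missing or unreadable paths are silently skipped.
find_drop_caches_jobs() {
    grep -rn -- 'drop_caches' "$@" 2>/dev/null
}
```

Run as root, for example: find_drop_caches_jobs /etc/crontab /etc/cron.d /etc/cron.daily /etc/cron.hourly /var/spool/cron/crontabs. To only list root's crontab rather than open it in an editor, crontab -l -u root can be used instead of -e.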

If possible, you can test whether clearing the caches causes the problem. Run sync; echo 3 > /proc/sys/vm/drop_caches in the shell and watch whether the server crashes or shows a spike.

mow · Jan 3, 2024

Majazs said:
It is a VPS (10 vCPU cores, 60 GB RAM, 1.6 TB SSD).
Additional info:
In Grafana I see a spike in disk "sda" time/op before the events, rising from 1-3 ms to 47 ms and eventually to 2.44 s.

I'd let the VPS provider check the SSD.
