- Server operating system version
- Ubuntu 18.04.6 LTS
- Plesk version and microupdate number
- Obsidian Version 18.0.46 Update #1
We have an issue with Acronis backup, which either has a memory leak that chews through all available RAM and then crashes the box, or triggers a kernel panic issue where all available RAM gets chewed up and the box crashes.
We have an open ticket with Plesk, who've bumped it to Acronis, who have been ridiculously bad at helping here and have provided zero tangible support, which is ridiculous for a backup product.
Posting here as we're really out of options for where to take this next.
The issue happens intermittently, every 1-2 weeks: during a backup something goes wrong, RAM usage spikes rapidly, and the box locks up and crashes within 90-120 seconds. We have 100+ sites on this server, so not ideal at all. All we can do when it crashes is hard reboot the box.
Looking at our atop logs, the box chews through 20 GB+ of RAM in 90 seconds or so.
It looks like the snap_api module is having some sort of issue. The box has 64 GB of RAM with approx. 30 GB free. We're running backups every 2 hours; we can go days with backups running normally, and then all of a sudden the issue hits at random.
The issue is happening at the kernel level, not at the user level. It's not a load issue or a shortage of memory under normal conditions, and the box has several hundred GB of disk space free. The backup is ~400 GB or so.
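For anyone wanting to check the kernel-versus-user question on their own box, one rough approach is to take two snapshots of /proc/meminfo and see which counters moved the most. The sketch below is only an illustration in Python: the function names are made up, and the sample numbers are invented rather than taken from our server. If MemFree collapses while AnonPages barely moves, the memory is probably not going to user processes.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field -> value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def biggest_changes(before, after, top=5):
    """Return the fields whose value changed most between two snapshots."""
    deltas = {k: after[k] - before[k] for k in after if k in before}
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]


# Invented sample snapshots for illustration only (values in kB).
SAMPLE_BEFORE = """MemTotal:       65861048 kB
MemFree:        31457280 kB
AnonPages:      12582912 kB
Slab:            2097152 kB
SUnreclaim:       524288 kB
"""
SAMPLE_AFTER = """MemTotal:       65861048 kB
MemFree:        10485760 kB
AnonPages:      12587008 kB
Slab:            2101248 kB
SUnreclaim:       528384 kB
"""

if __name__ == "__main__":
    before = parse_meminfo(SAMPLE_BEFORE)
    after = parse_meminfo(SAMPLE_AFTER)
    for name, delta in biggest_changes(before, after):
        print(f"{name:12s} {delta / 1024:+10.0f} MB")
```

On a real server you would feed it the contents of /proc/meminfo read a few seconds apart instead of the sample strings.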
The box is running WordPress sites only, on Ubuntu 18.04 with the latest Plesk and Acronis versions.
We've tried:
-changing swappiness down to 20 -> initially we'd thought the swappiness was too high at 50 and causing disk thrashing
--changing swappiness to 5 seems to make the problem worse
-adding a backup precommand to flush cached memory before the backup -> initially we'd thought the box wasn't letting go of cached memory and thus not leaving the Acronis agent enough RAM. This has reduced the frequency of the issue from every 1-5 days to every 1-2 weeks
-removing Acronis backup exclusions -> this seems to be their solution to half their problems, which is ridiculous in itself. How having exclusions in a backup can be a problem is beyond me. This made no difference
-doing a full kernel update -> no change
-logging everything -> there's nothing in the logs aside from the snap_api module showing page allocation errors, which is because the box is running out of RAM
-running atop with 10 second logging -> makes it clear the issue has nothing to do with what's running on the box; load is nominal and doesn't change significantly before all the RAM gets used
-uninstalling/reinstalling the Acronis agent
-fiddling with cron schedules to even the load out as much as possible so big operations aren't overlapping
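For reference, the cache-flush precommand mentioned in the list can be sketched roughly like this. This isn't our exact script, just a minimal Python illustration of the idea, assuming the standard Linux /proc/sys/vm/drop_caches interface; the function name and the path parameter are made up for the example (the parameter is only there so it can be tried against a scratch file).

```python
import os


def drop_caches(path="/proc/sys/vm/drop_caches", level=3):
    """Flush dirty pages to disk, then ask the kernel to drop clean caches.

    level 1 = page cache, 2 = dentries and inodes, 3 = both.
    Needs root when pointed at the real /proc path.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    os.sync()  # write dirty pages out first so more of the cache can be dropped
    with open(path, "w") as fh:
        fh.write(f"{level}\n")
```

Run as root with no arguments, drop_caches() would do the flush just before the backup starts. It only frees clean cache, so it can't help if the memory is being eaten by something inside the kernel, which would fit with it reducing the frequency of our crashes rather than stopping them.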
Anyone have any ideas?