
Resolved: Automated backups stopped on two servers since 18.0.41 (recurring)

Bitpalast

On two of our servers, which are configured identically to all the others, the automated Plesk backup has not run since the update to 18.0.41.

After an attempt to start it at the scheduled time, a PMM process stays in the process list, but it does not do anything, and no log is created that would give any details. However, when I start a manual backup, it runs and finishes without errors. Only the automated backups don't run, and only on these two machines; on all other machines with an identical setup and operating system, they work fine. Both servers have enough disk space, RAM and CPU power, and there is enough FTP storage space as well. I've tried a few things, such as saving the backup plan and backup settings again, but it didn't help: at night, the backups simply don't start.

Does anyone else have the same issue? Any ideas?
 
As always, I'm working on my own question.

I found that on the affected servers, several backup-related processes that were running while the upgrade to 18.0.41 was being performed remained in the process list. Example:

Code:
root      2749  0.0  0.0 113280  1204 ?        Ss   Jan22   0:00 /bin/sh -c [ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1
psaadm    2757  0.0  0.0  58860  3848 ?        S    Jan22   0:00 /usr/local/psa/admin/sbin/backupmng
psaadm    2770  0.0  0.0  58992  1112 ?        S    Jan22   0:07 /usr/local/psa/admin/sbin/backupmng
psaadm    2771  0.0  0.0 533072 159752 ?       SN   Jan22   0:18 /usr/bin/sw-engine -c /usr/local/psa/admin/conf/php.ini /usr/local/psa/admin/plib/backup/scheduled_backup.php --dump 31
root      2924  0.0  0.0 412780 66932 ?        SN   Jan22   0:01 /usr/bin/sw-engine -c /usr/local/psa/admin/conf/php.ini /usr/local/psa/admin/sbin/backup_agent --dump -domains-id -owner-guid cd87a31c-3316-4faf-be11-29f9287f60da -owner-type client -description-file /usr/local/psa/PMM/sessions/2022-01-22-123507.233/dump_description -compression-level normal -session-path /usr/local/psa/PMM/sessions/2022-01-22-123507.233 -output-file ftp://redacted//Postfaecher/ -ftp-passive-mode -from-file /usr/local/psa/PMM/sessions/2022-01-22-123507.233/from-file -ftp
root      3268  0.0  0.0 215736  5360 ?        SN   Jan22   0:00 /usr/local/psa/admin/sbin/backup-archiver --pack --source=/var/www/vhosts/system/redacted/conf --destination=clients/redacted/backup_conf_2201221235.tzst --session-path=/usr/local/psa/PMM/sessions/2022-01-22-123507.233 --warnings=/tmp/bwsZNSIZs --compression-method=zstd --compression-level=normal --exclude-files=/tmp/befpRvIYJ

Apparently the upgrade to 18.0.41 interfered with backups that were in progress and caused them to hang. I have now killed these processes. We'll see whether the scheduled backups run again.
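For anyone who wants to check their own servers for such leftovers, here is a minimal Python sketch that filters `ps aux` output for the backup-related commands seen in the listing above. The marker list is taken from this thread, not from Plesk documentation, and the "stale" test is an assumption based on how procps prints the START column (HH:MM for processes started within the last 24 hours, a date such as "Jan22" for older ones).

```python
import re
from typing import List, NamedTuple

# Command fragments of the backup chain as they appear in the listings in
# this thread (assumption: not an official or complete list).
BACKUP_MARKERS = (
    "backupmng",
    "scheduled_backup.php",
    "backup_agent",
    "backup-archiver",
)


class PsEntry(NamedTuple):
    user: str
    pid: int
    start: str
    command: str


def parse_ps_aux(text: str) -> List[PsEntry]:
    """Parse `ps aux` lines: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND."""
    entries = []
    for line in text.splitlines():
        fields = line.split(None, 10)  # field 11 keeps the full command line
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header and malformed lines
        entries.append(PsEntry(fields[0], int(fields[1]), fields[8], fields[10]))
    return entries


def backup_processes(text: str) -> List[PsEntry]:
    """Keep only processes whose command line contains a backup marker."""
    return [e for e in parse_ps_aux(text) if any(m in e.command for m in BACKUP_MARKERS)]


def looks_stale(entry: PsEntry) -> bool:
    # Assumption: START is HH:MM for processes younger than ~24 hours,
    # otherwise a date like "Jan22" (or a year).
    return re.fullmatch(r"\d{1,2}:\d{2}", entry.start) is None
```

Usage, e.g.: pass `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout` to `backup_processes()` and review the stale PIDs by hand before killing anything; the sketch deliberately does not kill processes itself.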
 
Did you find a solution?
Since my update to 18.0.41, my backup has not been working either: after I start it, it stops/crashes at 38%.
My backup is configured to go to Google Drive (enough space is available). After the backup stops at 38%, I can't log in to any pages on my server, so I guess the database crashes.
I then need to restart the server.
I already stopped the backup and tried to start it manually, with the same result.
 
@YourShopPartner Your issue is a different one. Mine was that the upgrade process interrupted a running backup and left some processes in the process list. I killed these processes and the automated backup started again. I'll set this thread to "resolved" now. Please post your new issue in a new thread on the forum.
 
Use the Backup Telemetry extension to get more info about a running backup. Also, 'strace -p <process number>' can help you understand why a process is hanging.
 
While the extension itself is surely useful, in this case the backup does not start at all, so there is no data to analyze. I think the cause is either the upgrade process or another backup, e.g. of a subscription, that finishes but then does not terminate its processes. We'll try to collect more details when it happens again.
 
@DenisG new input:

On one server we are seeing exactly the same issue again after we had removed the hanging processes. It seems that an account on which the automated backup has been disabled since May 2021 is starting automated backups anyway. To do so, it tries to connect to an FTP account on a server that is neither configured in the Plesk backup settings nor accessible. I'm not sure yet whether the problem is caused by Plesk running a disabled backup job or by the inaccessible FTP storage.

So apparently Plesk is trying to run backup jobs that have been disabled. This has been happening since the upgrade to 18.0.41.

This is how it looks in the session migration log; see the attached log sample file.

The process list shows this:
Code:
root     25294  0.0  0.0 113280  1208 ?        Ss   09:46   0:00 /bin/sh -c [ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1
psaadm   25298  0.0  0.0  58860  3852 ?        S    09:46   0:00 /usr/local/psa/admin/sbin/backupmng
psaadm   25299  0.0  0.0  58860  1120 ?        S    09:46   0:00 /usr/local/psa/admin/sbin/backupmng
psaadm   25300  0.0  0.0 401020 52980 ?        SN   09:46   0:00 /usr/bin/sw-engine -c /usr/local/psa/admin/conf/php.ini /usr/local/psa/admin/plib/backup/scheduled_backup.php --dump 2
root     25408  0.0  0.0 408576 62344 ?        SN   09:46   0:00 /usr/bin/sw-engine -c /usr/local/psa/admin/conf/php.ini /usr/local/psa/admin/sbin/backup_agent --dump -clients-id -owner-guid 55c2b0a2-f42e-4a06-a233-f8ffad75d75c -owner-type client -description-file /usr/local/psa/PMM/sessions/2022-02-02-094602.507/dump_description -compression-level normal -session-path /usr/local/psa/PMM/sessions/2022-02-02-094602.507 -output-file ftp://[email protected]//backups/ -ftp-passive-mode -from-file /usr/local/psa/PMM/sessions/2022-02-02-094602.507/from-file -ftp
root     25692  0.0  0.0 216024  5900 ?        SN   09:46   0:00 /usr/local/psa/admin/sbin/backup-archiver --pack-incrementally --source=/var/qmail/mailnames/test.tld --destination=clients/muster01/domains/test.tld/backup_domainmail_2202020946.tzst --listing-file=/tmp_dump/bmmVRmhab --created-index-file=/tmp_dump/bifPqb83A --incremental-dependencies=/tmp_dump/bdfv0Fi1i --user=popuser --no-recursion --session-path=/usr/local/psa/PMM/sessions/2022-02-02-094602.507 --warnings=/tmp_dump/bwsJEFtY0 --compression-method=zstd --compression-level=normal --id=1978

The processes are in wait states, for example process 25299:
Code:
strace: Process 25299 attached
select(8, [5 7], NULL, NULL, {tv_sec=0, tv_usec=448418}) = 0 (Timeout)
wait4(25300, 0x7fffec745a3c, WNOHANG, NULL) = 0
select(8, [5 7], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
wait4(25300, 0x7fffec745a3c, WNOHANG, NULL) = 0
select(8, [5 7], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
wait4(25300, 0x7fffec745a3c, WNOHANG, NULL) = 0
select(8, [5 7], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
wait4(25300, 0x7fffec745a3c, WNOHANG, NULL) = 0
select(8, [5 7], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
wait4(25300, 0x7fffec745a3c, WNOHANG, NULL) = 0
select(8, [5 7], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
wait4(25300, 0x7fffec745a3c, WNOHANG, NULL) = 0
...
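The strace output shows the parent polling rather than doing work: `select()` times out, then `wait4(25300, ..., WNOHANG, ...)` returns 0, meaning child 25300 has not exited yet. A small Python sketch that summarizes such a trace, assuming it was saved with something like `strace -p 25299 -o trace.txt`, could count these polls per child PID so you know which process to attach to next:

```python
import re
from collections import Counter
from typing import Dict

# wait4(<pid>, ..., WNOHANG, ...) = 0 means "child <pid> has not exited yet".
WAIT4_NOHANG_ZERO = re.compile(r"^wait4\((\d+),.*\bWNOHANG\b.*\)\s*=\s*0\b")


def polled_children(strace_text: str, min_polls: int = 3) -> Dict[int, int]:
    """Count non-blocking wait4() polls per child PID in strace output.

    Children polled at least `min_polls` times without exiting are returned;
    in a hang like the one above, such a child is the next process to inspect.
    """
    polls: Counter = Counter()
    for line in strace_text.splitlines():
        match = WAIT4_NOHANG_ZERO.match(line.strip())
        if match:
            polls[int(match.group(1))] += 1
    return {pid: n for pid, n in polls.items() if n >= min_polls}
```

For the trace above this points at PID 25300 (scheduled_backup.php); repeating the exercise down the chain should eventually lead to the backup_agent or backup-archiver process that is actually blocked.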

In the back office of the subscription, no indication is given that a backup is still running.
 

Attachments

  • log_sample.pdf
On a different server with the same hanging-backup issue, the account/domain for which the backup job runs is deactivated. So this is a very similar situation: a deactivated domain here, a disabled backup job in the other case, yet Plesk tried to run the job anyway. However, on this account the backup is displayed as "running", and it has not succeeded in any of the past few dozen attempts.

See migration_log_sample_2.pdf for the migration.log of that account. The other data is essentially the same, e.g. zero completed domains in dump-status.xml and the same hanging processes in the Linux process list.
 

Attachments

  • migration_log_sample_2.pdf
Check whether scheduled backups are active in the 'BackupsScheduled' table of the 'psa' database ('active' column).
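To illustrate that check, here is a self-contained Python sketch that runs the kind of query meant here against an in-memory SQLite stand-in. Only the table name and the `active` column come from this thread; the other columns and the 'true'/'false' values are assumptions for the demo, so inspect the real table first (e.g. `plesk db "DESCRIBE BackupsScheduled"`) and adapt the query.

```python
import sqlite3
from typing import List, Tuple

# Reduced, hypothetical stand-in for psa.BackupsScheduled. Only `active` is
# named in the thread; `id`, `obj_type`, `obj_id` and the 'true'/'false'
# values are assumptions made for this demo.
SCHEMA = """
CREATE TABLE BackupsScheduled (
    id       INTEGER PRIMARY KEY,
    obj_type TEXT,
    obj_id   INTEGER,
    active   TEXT
)
"""

ACTIVE_SCHEDULES = (
    "SELECT id, obj_type, obj_id FROM BackupsScheduled "
    "WHERE active = 'true' ORDER BY id"
)


def active_schedules(conn: sqlite3.Connection) -> List[Tuple]:
    """Return the schedules that the database still considers active."""
    return conn.execute(ACTIVE_SCHEDULES).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """In-memory demo data: one active and one disabled schedule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO BackupsScheduled VALUES (?, ?, ?, ?)",
        [(2, "client", 10, "true"), (31, "domain", 55, "false")],
    )
    return conn
```

A schedule the panel shows as disabled but that still turns up in this result would support the suspicion earlier in the thread that disabled jobs are being started.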

It looks like inaccessible FTP storage causes hanging backup processes.

Also note that in the first case it is the 'muster01' customer (not the 'test.tld' subscription) that is being backed up.
 
I have reproduced the issue with hanging backup processes. It is caused by inaccessible FTP storage. Also, hung processes can block other scheduled backups from running.
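Since an unreachable FTP target is what leaves the processes hanging, one possible workaround until a fix is available is a pre-flight check of each configured FTP target. The Python sketch below extracts host and port from an `-output-file` value like the ones in the process listings and tries a TCP connection with a timeout. It is an illustration, not a Plesk feature: a successful connect only shows that the FTP control port answers, not that login or passive-mode data connections will work.

```python
import socket
from typing import Tuple
from urllib.parse import urlparse


def ftp_endpoint(output_file: str) -> Tuple[str, int]:
    """Extract (host, port) from an FTP URL such as ftp://host//path/."""
    url = urlparse(output_file)
    if url.scheme != "ftp" or not url.hostname:
        raise ValueError(f"not an FTP URL: {output_file!r}")
    return url.hostname, url.port or 21  # 21 is the standard FTP control port


def ftp_reachable(host: str, port: int = 21, timeout: float = 5.0) -> bool:
    """True if a TCP connection to the FTP control port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False
```

Running this for every remote storage configured on a server (for example from cron, before the backup window) would flag targets like the inaccessible one described above, so those backup jobs can be disabled before they hang and block other scheduled backups.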
 
Great news. Thank you for testing! Backups are sooooo super important to us. We're like crazy for security and backups.
So will this be treated as a bug and fixed in an upcoming release?
 
Thanks a lot @Peter Debik for the solution. We have this case on 5 servers (15 without a recent backup, OMG!)... Plesk team & @DenisG, this type of bug is not acceptable, especially when you justify a 35% price increase every year!
 
It is indeed a very dangerous bug. I am pretty sure that thousands of Plesk users won't notice that their backups are not running, because this bug does not notify anyone. We currently verify the backups on each server manually every day, because otherwise we would not know about failed backups.
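Because the failure is silent, a simple freshness check can replace part of that daily manual verification. This Python sketch reports a backup location as stale when its newest file is older than a threshold. The directory to check is up to you (for example a local dump directory or the target directory on the backup storage host), and the 26-hour default is an assumption for a nightly schedule with some slack.

```python
import time
from pathlib import Path
from typing import Optional


def newest_mtime(directory: str) -> Optional[float]:
    """Modification time of the newest file below `directory`, or None if there is none."""
    mtimes = [p.stat().st_mtime for p in Path(directory).rglob("*") if p.is_file()]
    return max(mtimes, default=None)


def is_stale(newest: Optional[float], now: float, max_age_hours: float = 26.0) -> bool:
    """True if there is no file at all or the newest one is older than `max_age_hours`."""
    return newest is None or (now - newest) > max_age_hours * 3600


def check(directory: str, max_age_hours: float = 26.0) -> bool:
    """Print a warning and return True if `directory` has no recent backup."""
    stale = is_stale(newest_mtime(directory), time.time(), max_age_hours)
    if stale:
        print(f"WARNING: no backup newer than {max_age_hours} h in {directory}")
    return stale
```

Run from cron, the printed warning ends up in cron's mail, which gives at least one notification path that the hanging scheduled backups do not provide.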
 
Let's hope a patch will be available soon or at least before the release of 18.0.42. Right now this critical bug prevents us from updating our servers.
 
I can't understand why the Plesk team needs so much time to fix such a critical bug. We have had to review our affected servers every day, as we can't block scheduled backups for our clients.
 
Quote: We have had to review our affected servers every day, as we can't block scheduled backups for our clients.
Why not? As far as I know, this bug occurs when a subscription backup has been set up with remote storage and the connection to the remote storage fails. So why can't you temporarily disable/remove backups for these clients/subscriptions? Their backups aren't working anyway, are they? Leaving the 'bad' backups enabled impacts your own backups and presumably those of your other clients.

Or am I missing something?
 