Forwarded to devs Recurring, intermittent backup process stuck since update to 18.0.41 on three independent systems

Bitpalast · Feb 1, 2022

Username:

TITLE

Recurring, intermittent backup process stuck since update to 18.0.41 on three independent systems

PRODUCT, VERSION, OPERATING SYSTEM, ARCHITECTURE

Obsidian 18.0.41 latest MU
CentOS 7.9

PROBLEM DESCRIPTION

As described in Issue - Automated backup stopped on two servers since 18.0.41, recurring

The automated backup process does not start or run, for an unknown reason. There are no specific log entries. The Linux process list contains some processes with "backup" in their name, but they do not seem to be doing anything.

Initially the issue occurred on two machines when the upgrade to 18.0.41 was done. After we removed the hanging processes, it recurred on these two machines and now also on a third one, where no upgrade was performed while a backup was running.

STEPS TO REPRODUCE

Cannot be reproduced manually. It occurred on a small number of machines, not all, and we don't know how it could be reproduced "manually", because manual backups run while automated ones do not.

ACTUAL RESULT

As described above and in Issue - Automated backup stopped on two servers since 18.0.41, recurring

EXPECTED RESULT

Automated backup processes run as they are scheduled. No hanging "backup" processes in the process list.

ANY ADDITIONAL INFORMATION

(DID NOT ANSWER QUESTION)

YOUR EXPECTATIONS FROM PLESK SERVICE TEAM

Confirm bug

DenisG · Feb 3, 2022

The bug is confirmed and the request PPPM-13411 is created.

ArnauA · Feb 15, 2022

Good morning.
The same thing is happening to us. Backups fail without showing any error or log.
What workaround can we use until the update is pushed out by Plesk that resolves the PPPM-13411 error?

Bitpalast · Feb 15, 2022

1) Log in with SSH.
2) Run
# ps aux | grep pmm
and
# ps aux | grep backup
and copy the output to an editor for future reference.
3) Search the output for the domain/subscription name that the currently hanging backup process is working on.
Example:

Code:

root      2749  0.0  0.0 113280  1204 ?        Ss   Jan22   0:00 /bin/sh -c [ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1
psaadm    2757  0.0  0.0  58860  3848 ?        S    Jan22   0:00 /usr/local/psa/admin/sbin/backupmng
psaadm    2770  0.0  0.0  58992  1112 ?        S    Jan22   0:07 /usr/local/psa/admin/sbin/backupmng
psaadm    2771  0.0  0.0 533072 159752 ?       SN   Jan22   0:18 /usr/bin/sw-engine -c /usr/local/psa/admin/conf/php.ini /usr/local/psa/admin/plib/backup/scheduled_backup.php --dump 31
root      2924  0.0  0.0 412780 66932 ?        SN   Jan22   0:01 /usr/bin/sw-engine -c /usr/local/psa/admin/conf/php.ini /usr/local/psa/admin/sbin/backup_agent --dump -domains-id -owner-guid cd87a31c-3316-4faf-be11-29f9287f60da -owner-type client -description-file /usr/local/psa/PMM/sessions/2022-01-22-123507.233/dump_description -compression-level normal -session-path /usr/local/psa/PMM/sessions/2022-01-22-123507.233 -output-file ftp://redacted//Postfaecher/ -ftp-passive-mode -from-file /usr/local/psa/PMM/sessions/2022-01-22-123507.233/from-file -ftp
root      3268  0.0  0.0 215736  5360 ?        SN   Jan22   0:00 /usr/local/psa/admin/sbin/backup-archiver --pack --source=/var/www/vhosts/system/redacted/conf --destination=clients/redacted/backup_conf_2201221235.tzst --session-path=/usr/local/psa/PMM/sessions/2022-01-22-123507.233 --warnings=/tmp/bwsZNSIZs --compression-method=zstd --compression-level=normal --exclude-files=/tmp/befpRvIYJ

In this example you can see

Code:

--source=/var/www/vhosts/system/redacted/conf

This is your subscription (the "redacted" part) that has a faulty backup and is causing the following backups to wait.
4) Log in to the Plesk GUI and open the backup manager of that subscription. Very likely you'll see error reports there for failed FTP logins; whether you do or not does not matter. You want to disable this backup, so open the backup schedule settings and clear the "active" checkbox. The backup does not work anyway, because it cannot log in to its configured FTP repository. Save the settings.
5) Back on the Linux console, kill the pmm processes and the backupmng processes, e.g.
# kill <process id>
Example:
# kill 2749
# kill 2757
# kill 2770
# kill 2771
# kill 2924
# kill 3268
The last two will probably already be gone after you have killed the first few. The process IDs in this example are taken from the example above; on your own system, use the process IDs that your ps commands output.
6) After killing these, verify that either no pmm and backupmng processes remain in the process list or that new pmm and backupmng processes have started for backups in the queue.
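Steps 2 and 3 can be partly scripted. The following is only a sketch and assumes the process layout shown in the example output above; the function names are made up for illustration, and other Plesk versions may name their processes differently. Both functions read saved `ps aux` output on standard input and only print what they find, so nothing is killed automatically.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, the patterns come from the example output above.

# Print the subscription name found in --source=/var/www/vhosts/system/<name>/...
stuck_subscriptions() {
    grep -o -- '--source=/var/www/vhosts/system/[^/ ]*' | sed 's|.*/||' | sort -u
}

# Print the PIDs (column 2 of `ps aux`) of backup-related processes,
# skipping the grep command itself.
backup_pids() {
    awk '/backupmng|scheduled_backup\.php|backup_agent|backup-archiver|pmm-ras|pmmcli/ && !/grep/ {print $2}'
}
```

For example, save the list with `ps aux > /root/backup-ps.txt`, then run `stuck_subscriptions < /root/backup-ps.txt` and `backup_pids < /root/backup-ps.txt`. Review the printed PIDs before passing any of them to kill.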

sebgonzes · Feb 16, 2022

Thanks a lot, @Peter Debik, for the solution. We have this problem on 5 servers (15 without a recent backup, OMG!)... Plesk team, this type of bug is not acceptable, especially when you justify a 35% price increase each year!

ArnauA · Feb 16, 2022

Thanks a lot. I'm going to try this solution, hopefully it works

sebgonzes · Feb 23, 2022

Still waiting for a solution to this bug! @Peter Debik's solution is only temporary, and pmmcli_daemon appears stuck every day!
@plesk team, you really need to be more serious about update control and bug fixing!

ArnauA · Feb 23, 2022

sebgonzes said:
Still waiting for a solution to this bug! @Peter Debik's solution is only temporary, and pmmcli_daemon appears stuck every day!
@plesk team, you really need to be more serious about update control and bug fixing!

Hello @sebgonzes.
I used the guide provided by @Peter Debik. So far it has been working correctly for a week without errors. Check the processes you mention. I hope you can solve it.
@plesk I hope you can provide a fix that works for everyone.

sebgonzes · Feb 23, 2022

ArnauA said:
Hello @sebgonzes.
I used the guide provided by @Peter Debik. So far it has been working correctly for a week without errors. Check the processes you mention. I hope you can solve it.
@plesk I hope you can provide a fix that works for everyone.

For us, it works right after we kill the processes, but a few days later the processes get stuck again and block every backup. We have created a temporary script that kills the processes before our backups run. It works, but it's a really undesirable solution.
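The script itself was not posted. A stopgap of that kind could look like the sketch below, which selects backup-related processes by age instead of killing everything. It assumes a `ps` that supports the `etimes` column (elapsed seconds), the function name is hypothetical, and the threshold is an arbitrary choice that has to be longer than any legitimate backup run on the server.

```shell
#!/bin/bash
# Sketch only: hypothetical helper, not the script mentioned in the post.
# Print PIDs of backup-related processes running longer than $1 seconds.
# Expects `ps -eo pid=,etimes=,args=` output on stdin.
old_backup_pids() {
    awk -v max="$1" '
        $2 > max && /backupmng|scheduled_backup\.php|backup_agent|backup-archiver/ {print $1}'
}

# Possible use from cron, shortly before the first scheduled backup
# (43200 seconds = 12 hours, an assumed threshold):
#   ps -eo pid=,etimes=,args= | old_backup_pids 43200 | xargs -r kill
```

This only clears the symptom. The backup that hangs will hang again on its next run unless its storage settings are fixed or the backup is disabled.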

Bitpalast · Feb 23, 2022

You need to identify the hanging backup and disable it. If it hangs, it cannot back up the data correctly anyway, because it cannot connect to the configured FTP storage space. Once you disable the backup, it won't run, so no connection attempt with a malfunctioning FTP login occurs, and so it won't hang. See my post above for how to find the hanging backup.

ArnauA · Feb 23, 2022

sebgonzes said:
For us, it works right after we kill the processes, but a few days later the processes get stuck again and block every backup. We have created a temporary script that kills the processes before our backups run. It works, but it's a really undesirable solution.

Try reconfiguring the connection to the FTP server or whatever service you use to store backups.
I use OneDrive Business and configured it again; maybe that is what kept it from crashing again.

If there is anything you don't understand, I'm sorry, my English is not very good.

sebgonzes · Feb 23, 2022

Peter Debik said:
You need to identify the hanging backup and disable it. If it hangs, it cannot back up the data correctly anyway, because it cannot connect to the configured FTP storage space. Once you disable the backup, it won't run, so no connection attempt with a malfunctioning FTP login occurs, and so it won't hang. See my post above for how to find the hanging backup.

Indeed, I suppose it's caused by some client's FTP backup, but I can't and don't want to disable them. I can't deprive clients of this functionality. The Plesk team should fix it so that it works as before.

ArnauA · Feb 23, 2022

sebgonzes said:
Indeed, I suppose it's caused by some client's FTP backup, but I can't and don't want to disable them. I can't deprive clients of this functionality. The Plesk team should fix it so that it works as before.

Try, as I said, to configure it again.

Bitpalast · Feb 23, 2022

sebgonzes said:
Indeed, I suppose it's caused by some client's FTP backup, but I can't and don't want to disable them. I can't deprive clients of this functionality. The Plesk team should fix it so that it works as before.

Of course the bug needs to be fixed. But if a client backup causes the error, then that backup does not work anyway. It just makes no difference whether you disable it or whether it tries to run and crashes. Better let the client know that his FTP account does not work right.
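To confirm that an FTP account really is broken before contacting the client, the login can be tested from the server with curl. This is a minimal sketch: the function name is hypothetical, host, user and password are placeholders supplied by the caller, and passive mode is used only because the stuck backup in the example output above was configured that way. The timeouts make the check give up instead of hanging.

```shell
#!/bin/bash
# Sketch only: hypothetical helper; arguments are host, user, password.
# Returns 0 if an FTP login and directory listing succeed within the
# timeouts, non-zero otherwise.
ftp_login_ok() {
    curl --silent --show-error --ftp-pasv --list-only \
         --connect-timeout 10 --max-time 30 \
         --user "$2:$3" "ftp://$1/" >/dev/null
}
```

A call such as `ftp_login_ok ftp.example.com backupuser 'secret' || echo "FTP login failed"` would then flag a repository that is unreachable or rejects its credentials. Note that a password passed on the command line can be visible in the process list while curl runs.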

Bitpalast · Mar 1, 2022

Update 18.0.42:
Scheduled backup processes no longer get stuck indefinitely when backing up to FTP storage if the configured FTP server is unavailable or if the provided credentials do not match. (PPPM-13411)
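Whether a server already has this fix can be checked by comparing its version with 18.0.42. The comparison below is a small sketch that relies on GNU `sort -V`; the function name is hypothetical, and the installed version string has to be supplied by you, for example as reported by `plesk version`.

```shell
#!/bin/bash
# Sketch only: hypothetical helper.
# Succeeds if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example with a hard-coded version string:
#   version_ge 18.0.41 18.0.42 || echo "update needed for the PPPM-13411 fix"
```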

D4NY · Mar 3, 2022

Absolutely useful. Thank you Peter.
Anyway, I have now manually updated to 18.0.42.
I hope this never happens again.

D4NY · Dec 24, 2022

Here we are again... just before Christmas. I've been running a new dedicated server since September without problems. The server hosts 70 websites, none of them very heavy (max 4 GB, most of them 100-200 MB), and the daily backups are scheduled from 0:00 to about 9:00, two backups every 15 minutes. So the first 2 websites are scheduled at 0:00, the next 2 at 0:15 and so on. I had no problems until yesterday; the only oddity is that the actual backup time often differs from the scheduled one, but I think that is because of the low priority I set in the settings.

Now I've found that the backups are stuck and most of them are not done. I receive alerts like this:

pmm-ras failed (Error code = 1): Repository error: Failed to read backup backup_XXXXXXXXX.it_2212191415.tar: Curl error: Unable to resume an interrupted download: (56) Failure when receiving data from the peer: Last FTP request: RETR backup_XXXXXXXX.it_2212191415.tar: Last FTP response: 150 Opening BINARY mode data connection for backup_XXXXXXXXX.it_2212191415.tar (4814640 bytes): Connection to the FTP server has lost

FTP network error

Via SSH I ran # ps aux | grep backup and got something like this:

root 14253 0.0 0.0 307184 22724 ? SN 10:32 0:01 /usr/local/psa/admin/bin/pmm-ras --get-ftp-dump-list --dump-storage=ftp://ZZZZZZZZZZ/ --use-ftp-passive-mode --type=domain --name=YYYYYYYYY.it --guid=a2a0c893-0aad-4d3b-a4d8-ff73398983c0 --session-path=/var/log/plesk/PMM
psaadm 19234 0.0 0.0 58976 1136 ? S 11:12 0:00 /usr/local/psa/admin/sbin/backupmng
psaadm 19235 0.0 0.1 387200 40636 ? SN 11:12 0:00 /usr/bin/sw-engine -c /usr/local/psa/admin/conf/php.ini /usr/local/psa/admin/plib/backup/scheduled_backup.php --dump 3
root 19722 0.1 0.1 273948 34080 ? SN 11:15 0:01 /usr/bin/python3 -Estt /usr/local/psa/admin/sbin/pmmcli --pmmras-exec /usr/local/psa/tmp/pree2b9tQ --get-ftp-dump-list --dump-storage=ftp://ZZZZZZZZZZ/ --use-ftp-passive-mode --type=domain --name=FFFFFFFFFFF.it --guid=cc90d635-7412-4224-96bd-3cea8124d2dc --session-path=/var/log/plesk/PMM
root 19725 0.0 0.0 308032 23116 ? SN 11:15 0:00 /usr/local/psa/admin/bin/pmm-ras --get-ftp-dump-list --dump-storage=ftp://ZZZZZZZZZZ/ --use-ftp-passive-mode --type=domain --name=FFFFFFFFFFF.it --guid=cc90d635-7412-4224-96bd-3cea8124d2dc --session-path=/var/log/plesk/PMM
root 21020 0.0 0.0 113284 1212 ? Ss 00:30 0:00 /bin/sh -c [ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1
psaadm 21021 0.0 0.0 58976 3868 ? S 00:30 0:00 /usr/local/psa/admin/sbin/backupmng
root 21067 0.0 0.0 112816 976 pts/0 S+ 11:26 0:00 grep --color=auto backup

I've tried to kill the processes, but immediately afterwards the situation is the same, maybe with some other website's backup stuck.
I've also rebooted the server, but that didn't help.

The server is running well, no overload, the websites are running, everything is OK except the backups.
Nothing has changed on the FTP storage, which is accessible via an FTP client and seems to be fast.

Here are the Backup Manager settings:
Maximum number of concurrently scheduled backup jobs running: 1
Run low priority scheduled backup procedures ON - 10 and 7
Run all backup jobs with low priority ON - 10 and 7
Compression: none
Start the backup only if the server has enough free disk space ON

I don't know what suddenly happened. Is it possible that, the way I've scheduled the backups, there are too many in a short time and the queue gets too long? Would it be better to schedule a single website backup every 15 minutes (each usually takes less than a minute to complete)? Would normal priority help? Or is it a new bug?

Any suggestion or help is welcome. Happy Christmas.

Peter Debik · Dec 24, 2022

Please change

Maximum number of concurrently scheduled backup jobs running: 1

to at least 2. I've seen behavior similar to what you describe. The problem is that when more jobs are due than concurrent jobs are allowed, jobs get queued and are not executed at the scheduled time. This can give the impression that they are not executed at all, but they are simply waiting, maybe for one, two or three days. And if too many such jobs exist, the wait gets longer and longer.

I found that on servers with many users it is best to allow at least 2 but no more than 4 concurrent jobs.
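The effect of the concurrency limit can be estimated with simple arithmetic. The sketch below is a rough model and not how Plesk's scheduler actually works: it computes how long each job may take on average before the queue starts to grow, given the schedule interval, the number of jobs started per interval and the number of concurrent jobs allowed. The function name is hypothetical.

```shell
#!/bin/bash
# Sketch only: a simplified model, not Plesk's scheduling logic.
# Arguments: interval in minutes, jobs scheduled per interval, concurrent jobs allowed.
# Prints the average time budget per job in seconds before a backlog builds up.
job_budget_seconds() {
    echo $(( $1 * 60 * $3 / $2 ))
}

# Two backups every 15 minutes with 1 concurrent job:
#   job_budget_seconds 15 2 1   # prints 450
# The same schedule with 4 concurrent jobs:
#   job_budget_seconds 15 2 4   # prints 1800
```

With one concurrent job, each backup has 7.5 minutes on average. That is plenty when backups finish in under a minute, but a few jobs that wait a long time on the FTP storage can use up the budget and push every later job back.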

D4NY · Dec 24, 2022

I'm trying this immediately and will update this thread.

Thank you Peter

D4NY · Dec 24, 2022

I've set it up with 4 concurrent jobs and medium priority. Since then some backups have completed...

...but my feeling is that it's still stuck. The server does everything very fast, but the backup manager is slow when I open it.

Is there a way to clear the backup queue and restart from zero? I mean, something like "postsuper -d ALL" for the mail queue?
