Forwarded to devs Recurring, intermittent backup process stuck since update to 18.0.41 on three independent systems

Bitpalast


TITLE

Recurring, intermittent backup process stuck since update to 18.0.41 on three independent systems

PRODUCT, VERSION, OPERATING SYSTEM, ARCHITECTURE

Obsidian 18.0.41 latest MU
CentOS 7.9

PROBLEM DESCRIPTION

As described in Issue - Automated backup stopped on two servers since 18.0.41, recurring

The automated backup process is not starting or running for an unknown reason. There are no specific log entries. In the Linux process list, some processes with "backup" in their name are present, but they don't seem to be doing anything.

Initially the issue occurred on two machines when the upgrade to 18.0.41 was done. After we removed the hanging processes, it recurred on these two machines and has now also appeared on a third one, where no upgrade process was done while the backup was running.

STEPS TO REPRODUCE

Cannot be reproduced manually. It occurred on a small number of machines, not all, and we don't know how to trigger it by hand, because manual backups run while automated ones do not.

ACTUAL RESULT

As described above and in Issue - Automated backup stopped on two servers since 18.0.41, recurring

EXPECTED RESULT

Automated backup processes run as they are scheduled. No hanging "backup" processes in the process list.

ANY ADDITIONAL INFORMATION

(DID NOT ANSWER QUESTION)

YOUR EXPECTATIONS FROM PLESK SERVICE TEAM

Confirm bug
 
Good morning.
The same thing is happening to us. Backups fail without showing any error or log.
What workaround can we use until the update is pushed out by Plesk that resolves the PPPM-13411 error?
 
1) Log in via SSH.
2) Run
# ps aux | grep pmm
and
# ps aux | grep backup
and copy the output to an editor for future reference.
3) Search the output for the domain/subscription name that the current hanging backup process is processing.
Example:
Code:
root      2749  0.0  0.0 113280  1204 ?        Ss   Jan22   0:00 /bin/sh -c [ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1
psaadm    2757  0.0  0.0  58860  3848 ?        S    Jan22   0:00 /usr/local/psa/admin/sbin/backupmng
psaadm    2770  0.0  0.0  58992  1112 ?        S    Jan22   0:07 /usr/local/psa/admin/sbin/backupmng
psaadm    2771  0.0  0.0 533072 159752 ?       SN   Jan22   0:18 /usr/bin/sw-engine -c /usr/local/psa/admin/conf/php.ini /usr/local/psa/admin/plib/backup/scheduled_backup.php --dump 31
root      2924  0.0  0.0 412780 66932 ?        SN   Jan22   0:01 /usr/bin/sw-engine -c /usr/local/psa/admin/conf/php.ini /usr/local/psa/admin/sbin/backup_agent --dump -domains-id -owner-guid cd87a31c-3316-4faf-be11-29f9287f60da -owner-type client -description-file /usr/local/psa/PMM/sessions/2022-01-22-123507.233/dump_description -compression-level normal -session-path /usr/local/psa/PMM/sessions/2022-01-22-123507.233 -output-file ftp://redacted//Postfaecher/ -ftp-passive-mode -from-file /usr/local/psa/PMM/sessions/2022-01-22-123507.233/from-file -ftp
root      3268  0.0  0.0 215736  5360 ?        SN   Jan22   0:00 /usr/local/psa/admin/sbin/backup-archiver --pack --source=/var/www/vhosts/system/redacted/conf --destination=clients/redacted/backup_conf_2201221235.tzst --session-path=/usr/local/psa/PMM/sessions/2022-01-22-123507.233 --warnings=/tmp/bwsZNSIZs --compression-method=zstd --compression-level=normal --exclude-files=/tmp/befpRvIYJ
In this example you can see
Code:
--source=/var/www/vhosts/system/redacted/conf
This is your subscription (the "redacted" part) that has a faulty backup and is causing the following backups to wait.
4) Log in to the Plesk GUI and open the backup manager of that subscription. Very likely you'll see error reports there for failed FTP logins. Whether you do or not does not matter. You want to disable this backup, so open the backup schedule settings and clear the "active" checkbox. The backup does not work anyway, because it cannot log in to its configured FTP repository. Save the disabled backup settings.
5) Back on the Linux console, kill the pmm processes and the backupmng processes, e.g.
# kill <process id>
Example:
# kill 2749
# kill 2757
# kill 2770
# kill 2771
# kill 2924
# kill 3268
The last two will probably already be gone after you have killed the first few. The process IDs in this example are taken from the example above. On your own server, use the process IDs shown in the output of your ps commands.
6) After killing these, verify that either no further pmm and backupmng processes remain in the process list or that new pmm and backupmng processes have started for backups in the queue.
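
Step 3 can be partly automated. The following is only a minimal sketch: extract_subscription is a hypothetical helper name, not a Plesk tool, and it assumes the hanging process shows a --source=/var/www/vhosts/system/<name>/conf argument as in the example listing. Other --source paths will not match.

```shell
#!/bin/sh
# Sketch: pull the subscription name out of `ps aux` lines that contain a
# backup-archiver "--source=/var/www/vhosts/system/<name>/..." argument.
# extract_subscription is a hypothetical helper, not part of Plesk.
extract_subscription() {
    # Read ps output on stdin and print each distinct subscription name once.
    sed -n 's|.*--source=/var/www/vhosts/system/\([^/ ]*\)/.*|\1|p' | sort -u
}

# Usage on a live server (not run here):
#   ps aux | grep -E 'pmm|backup' | extract_subscription
```

Lines without a matching --source argument produce no output, so an empty result only means that this particular pattern was not found.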
 
Thanks a lot @Peter Debik for the solution. We have this issue on 5 servers (15 without a recent backup, OMG!)... Plesk team, this type of bug is not acceptable, especially when you justify a 35% price increase each year!
 
Thanks a lot. I'm going to try this solution, hopefully it works
 
Still waiting for a solution to this bug! The solution from @Peter Debik is only a temporary one, and pmmcli_daemon appears to be stuck every day!
@plesk team, you really need to be more serious about update control and bug fixing!
 
Hello @sebgonzes.
I used the guide provided by @Peter Debik. So far it has been working correctly for a week without errors. Check the processes you mention. I hope you can solve it.
@plesk I hope you can provide a fix that works for everyone.
 
For us, it works right after we kill the processes, but a few days later a process gets stuck again and blocks every backup. We have created a temporary script that kills the processes before our backups run. It works, but it's really an undesirable solution.
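
A stopgap like the one described above could start from a read-only check. The following is a minimal sketch, not the poster's actual script: stale_backup_pids and the 6-hour threshold are illustrative assumptions, the process names are the ones visible in the listings in this thread, and it assumes a ps that supports the etimes column (elapsed seconds). It only prints PIDs and kills nothing.

```shell
#!/bin/sh
# Sketch: list PIDs of backup-related processes that have been running longer
# than a threshold. It only prints PIDs; it does not kill anything.
# stale_backup_pids and the threshold are illustrative assumptions.
stale_backup_pids() {
    # $1 = minimum age in seconds
    # stdin = lines of "PID ELAPSED_SECONDS COMMAND..."
    awk -v min="$1" '
        $2 + 0 >= min + 0 && /backupmng|backup_agent|backup-archiver|scheduled_backup\.php|pmm-ras|pmmcli/ { print $1 }
    '
}

# Usage on a live server (not run here):
#   ps -eo pid=,etimes=,args= | stale_backup_pids 21600
```

Reviewing the printed PIDs against the full ps output before killing anything avoids terminating a backup that is merely slow.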
 
You need to identify the hanging backup and disable it. If it hangs, it cannot back up the data correctly anyway, because it cannot connect to the configured FTP storage space. Once you disable the backup, it won't run, hence no connection attempt to a malfunctioning FTP login occurs, hence it won't hang. See my post above on how to find the hanging backup.
 
Try reconfiguring the connection to the FTP server or whatever service you use to store backups.
I use OneDrive Business and set it up again; maybe that is what stopped the hangs for me.

If there is anything you don't understand, I'm sorry. I'm not very good at English.
 
Indeed, I suppose it's due to some client's FTP backup, but I can't and don't want to disable them. I can't deprive clients of this functionality. The Plesk team should fix it so that it works as it did before.
 
Of course the bug needs to be fixed. But if a client backup causes the error, then that backup does not work anyway. It just makes no difference whether you disable it or whether it tries to run and crashes. Better let the client know that his FTP account does not work right.
 
Update 18.0.42:
Scheduled backup processes no longer get stuck indefinitely when backing up to FTP storage if the configured FTP server is unavailable or if the provided credentials do not match. (PPPM-13411)
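
To check whether a server already runs a release that includes this fix, a version comparison can help. This is a minimal sketch: version_ge is a hypothetical helper, it relies on sort -V from GNU coreutils, and reading the version from the first field of /usr/local/psa/version is an assumption about the file layout.

```shell
#!/bin/sh
# Sketch: check whether a dotted version string is at least 18.0.42, the
# release whose notes list the PPPM-13411 fix.
# version_ge is a hypothetical helper, not part of Plesk.
version_ge() {
    # Succeeds when $1 >= $2, using version sort (GNU coreutils sort -V).
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (assumption: the first field of /usr/local/psa/version is the version):
#   installed=$(cut -d' ' -f1 /usr/local/psa/version)
#   version_ge "$installed" 18.0.42 && echo "fix included" || echo "update needed"
```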
 
Absolutely useful. Thank you Peter.
Anyway, I have now manually updated to 18.0.42.
I hope this never happens again.
 
Here we are again... just before Christmas. I've been running a new dedicated server since September without problems. On this server I have 70 websites, none of them very heavy (max 4 GB, most of them 100-200 MB), and the daily backups are scheduled from 0:00 to about 9:00, two backups every 15 minutes. So the first 2 websites are scheduled at 0:00, the next 2 at 0:15, and so on. I had no problems until yesterday. The only oddity is that the actual backup time often differs from the scheduled one, but I think that is because of the low priority I set in the settings.

Now I have found that the backups are stuck and most of them are not done. I receive alerts like this:
pmm-ras failed (Error code = 1): Repository error: Failed to read backup backup_XXXXXXXXX.it_2212191415.tar: Curl error: Unable to resume an interrupted download: (56) Failure when receiving data from the peer: Last FTP request: RETR backup_XXXXXXXX.it_2212191415.tar: Last FTP response: 150 Opening BINARY mode data connection for backup_XXXXXXXXX.it_2212191415.tar (4814640 bytes): Connection to the FTP server has lost
FTP network error
Via SSH I ran # ps aux | grep backup and got something like this:
root 14253 0.0 0.0 307184 22724 ? SN 10:32 0:01 /usr/local/psa/admin/bin/pmm-ras --get-ftp-dump-list --dump-storage=ftp://ZZZZZZZZZZ/ --use-ftp-passive-mode --type=domain --name=YYYYYYYYY.it --guid=a2a0c893-0aad-4d3b-a4d8-ff73398983c0 --session-path=/var/log/plesk/PMM
psaadm 19234 0.0 0.0 58976 1136 ? S 11:12 0:00 /usr/local/psa/admin/sbin/backupmng
psaadm 19235 0.0 0.1 387200 40636 ? SN 11:12 0:00 /usr/bin/sw-engine -c /usr/local/psa/admin/conf/php.ini /usr/local/psa/admin/plib/backup/scheduled_backup.php --dump 3
root 19722 0.1 0.1 273948 34080 ? SN 11:15 0:01 /usr/bin/python3 -Estt /usr/local/psa/admin/sbin/pmmcli --pmmras-exec /usr/local/psa/tmp/pree2b9tQ --get-ftp-dump-list --dump-storage=ftp://ZZZZZZZZZZ/ --use-ftp-passive-mode --type=domain --name=FFFFFFFFFFF.it --guid=cc90d635-7412-4224-96bd-3cea8124d2dc --session-path=/var/log/plesk/PMM
root 19725 0.0 0.0 308032 23116 ? SN 11:15 0:00 /usr/local/psa/admin/bin/pmm-ras --get-ftp-dump-list --dump-storage=ftp://ZZZZZZZZZZ/ --use-ftp-passive-mode --type=domain --name=FFFFFFFFFFF.it --guid=cc90d635-7412-4224-96bd-3cea8124d2dc --session-path=/var/log/plesk/PMM
root 21020 0.0 0.0 113284 1212 ? Ss 00:30 0:00 /bin/sh -c [ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1
psaadm 21021 0.0 0.0 58976 3868 ? S 00:30 0:00 /usr/local/psa/admin/sbin/backupmng
root 21067 0.0 0.0 112816 976 pts/0 S+ 11:26 0:00 grep --color=auto backup
I've tried to kill the processes, but immediately afterwards the situation is the same, maybe with some other website's backup stuck.
I've also rebooted the server, but it did not help.
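
To see which domains the pending FTP listing calls belong to, the --name= values can be pulled out of such a process list. This is a minimal sketch: ftp_listing_domains is a hypothetical helper, and it assumes the --get-ftp-dump-list and --name= arguments appear as in the listing above.

```shell
#!/bin/sh
# Sketch: list the domains named in "--get-ftp-dump-list" process lines.
# ftp_listing_domains is a hypothetical helper, not part of Plesk.
ftp_listing_domains() {
    # stdin = ps output; print each --name= value from FTP listing calls once.
    grep -e '--get-ftp-dump-list' | sed -n 's/.*--name=\([^ ]*\).*/\1/p' | sort -u
}

# Usage on a live server (not run here):
#   ps aux | ftp_listing_domains
```

If the same domain keeps showing up across runs, its FTP repository settings are a reasonable first place to look.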

The server is running well: no overload, the websites are up, everything is OK except the backups.
Nothing has changed on the FTP storage. It is accessible via an FTP client and seems to be fast.

Here are the Backup Manager settings:
Maximum number of concurrently scheduled backup jobs running: 1
Run low priority scheduled backup procedures ON - 10 and 7
Run all backup jobs with low priority ON - 10 and 7
Compression: none
Start the backup only if the server has enough free disk space ON

I don't know what suddenly happened. Is it possible that, the way I've scheduled the backups, there are too many in a short time and the queue is too long? Would it be better to schedule a single website backup every 15 minutes (a backup usually takes less than a minute to complete)? Could normal priority help? Or is it a new bug?

Any suggestion or help is welcome. Happy Christmas.
 
Please change

Maximum number of concurrently scheduled backup jobs running: 1

to at least 2. I've seen similar behavior as you describe. The problem is that when more jobs are running and not enough concurrent jobs are allowed, jobs get queued and are not executed at the required time. This can lead to the impression that they are not executed at all, but they are simply waiting a day, two, three days maybe. However, if too many such jobs exist, the wait is longer and longer.

I found that on servers with many users it is best to allow at least 2 but no more than 4 concurrent jobs.
 
I've set it up like this:
[Attachment: settings.jpg]

So 4 concurrent jobs and medium priority. Since then some backups have completed...

[Attachment: ftp.jpg]
...but the feeling is that it's still stuck. The server is very fast at everything else, but the backup manager is slow when I open it.

Is there a way to clean the backup queue and restart from zero? I mean, a sort of "postsuper -d ALL" for the mail queue?
 