Resolved Slow backup to FTP repository from disaster-recovered host

Bitpalast · Jul 23, 2016

CentOS 7.2 64-Bit, Plesk 12.5.30 #40
Fresh install in a disaster recovery scenario.
Existing backup archives in an FTP repository. Restore of customer accounts from the latest backup in that FTP repository possible (after some issues with the signature). Approx. 100 GB file size.

Now trying to create a new full backup to FTP. At the same file sizes this is no issue on other machines and has not been an issue before on the machine in question, but now it suddenly runs and runs and runs. According to the logs in /usr/local/psa/PMM/logs it does seem to be doing something, but after 10 hours the processes are still running. There are many backup processes in the ps aux list.

According to the GUI, the local backup has been completed. In /var/lib/psa/dumps we see many .tgz files that obviously belong to the backup that is still in the making.

What is causing the super slow backup in this recovered installation? On other servers with the same hardware and same FTP-configuration (other repositories/logins, though) backup is running smoothly, and it always has on this machine before, too. Why is it now so super, super slow doing the same backup to the same repository on the same hardware and same Plesk version? Could this be an issue with existing backup files in the repository? What do we need to do to fix this issue?

Bitpalast · Aug 10, 2016

Solution:
If "symbolic-links=0" is missing from /etc/my.cnf or is misplaced in a [...] section other than [mysqld], all database operations will work for subscriptions, but backup and Plesk software update transactions will fail.

Details:
While the backup was running slowly, a huge number of "plesk_agent_manager server" processes like this example were found in the process list:

"/usr/bin/perl /usr/local/psa/admin/bin/plesk_agent_manager server --owner-uid=e87d0e45-40a3-477e-ab59-f07a2aab9914 --owner-type=server --dump-rotation=14 --description=Scheduled Backup --keep-local-backup --session-path=/usr/local/psa/PMM/sessions/2016-07-23-035802.350 --output-file=ftp://foo@bar/"

These could easily be removed by

pkill -f "/usr/bin/perl /usr/local/psa/admin/bin/plesk_agent_manager server --owner-uid=e87d0e45-40a3-477e-ab59-f07a2aab9914 --owner-type=server --dump-rotation=14 --description=Scheduled Backup --keep-local-backup --session-path=/usr/local/psa/PMM/sessions/2016-07-23-035802.350 --output-file=ftp://foo@bar/"

but they came back on every further attempt to back up the server via the Plesk backup functions.

Error logs of PMM showed a huge number of failed MySQL backups (all MySQL backups failed) like this example:

"Hosting on domain foo.bar is skipped from backup due to error: Can't call method "dumpCmdline" without a package or object reference at /usr/local/psa/PMM/agents/shared/Db/Connection.pm line 115.Bad file descriptor: dup2( 17, 13 ) at /usr/local/psa/PMM/agents/shared/Storage/Storage.pm line 343. at /usr/local/psa/PMM/agents/shared/Storage/Storage.pm line 343."

These messages were not easy to find in the enormous number of lines in the PMM logs. They could also be reproduced by trying to back up single user subscriptions that were using at least one MySQL database.
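For anyone searching their own logs, here is a minimal sketch of how the two error strings quoted above could be pulled out of the PMM logs. The function name find_backup_errors is made up for illustration. The default directory is the log path mentioned in the first post, and the search strings are taken from the error message above. Other Plesk versions may word the message differently.

```shell
# Hypothetical helper: list log lines that mention either error string.
# $1 is the log directory; it defaults to the PMM log path from this thread.
find_backup_errors() {
    # -r: search recursively, -n: show line numbers, -F: fixed strings (no regex)
    grep -rn -F -e 'dumpCmdline' -e 'dup2(' "${1:-/usr/local/psa/PMM/logs}"
}
```

Calling find_backup_errors without an argument searches the default log directory and prints each hit as file:line:text.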

The "dup2()" error set off a high alert, because we read it as a failure in a low-level file operation (dup2() is the system call that duplicates a file descriptor). As we had seen issues on the RAID system before, we assumed that there might be problems writing to certain clusters/blocks on a disk (as the dumps are written to the same location in the file system) or a RAID controller failure. As a preliminary precaution, customers residing on the host were migrated to other hosts. However, we later found out that the assumption was wrong and the RAID was perfectly alright.
https://kb.plesk.com/en/119835 (where the "dumpCmdline" issue is discussed) did not resolve the issue either.

At the same time we were also seeing a Plesk upgrade to the latest patch version fail because of an unknown MySQL variable "symbolic-links".

Solution:
Checking /etc/my.cnf we saw that "symbolic-links=0" was present, but it was not in the [mysqld] section, as we had erroneously inserted another bracketed section header [...] before that line. As a result, MariaDB/MySQL no longer applied the setting to [mysqld], and the variable was reported as unknown. This "missing" symbolic-links variable did not impact regular database operations, but it caused the dup2() error and the Plesk update error.

After the symbolic-links entry in /etc/my.cnf was restored to the [mysqld] section, Plesk could be updated properly. A full backup could also be done successfully.
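A quick way to check for the same mistake is to print which section each symbolic-links line actually falls under. The following is only a sketch: the function name find_symlink_section is made up for illustration, and it assumes a plain my.cnf without !include directives, which it does not follow.

```shell
# Hypothetical helper: show the my.cnf section that each
# symbolic-links line belongs to. $1 is the path to the config file.
find_symlink_section() {
    awk '
        /^[ \t]*\[/ {                       # section header such as [mysqld]
            section = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", section)
            next
        }
        /^[ \t]*symbolic[-_]links/ {        # the setting itself
            label = (section == "") ? "(no section)" : section
            print label ": " $0
        }
    ' "$1"
}
```

Running find_symlink_section /etc/my.cnf prints each symbolic-links line prefixed with the name of its section. On a correctly configured host that prefix is "mysqld". Any other section name points to the kind of misplacement described above.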
