
Resolved: pigz on backups frequently overloading CPU usage

When you limit the number of concurrent backup processes to 1, only one process runs at any given time, forcing other processes that may be scheduled at similar times to wait until it completes. If you have a second, third, or fourth backup configuration, e.g. in individual domain settings, these will wait on one another and no longer run at their scheduled times. This traffic jam can grow until, on the next day, the backups scheduled the previous day have still not finished, so the next day starts with a delay and subsequent backups are delayed even further. The result is a continuously growing queue of backups, and you no longer have any control over when a backup actually runs.

Normally, on a system where you have a central (administrator) backup and customers on the machine who do their own backups, a value of "2" already resolves this issue, because most systems don't have so many backup jobs that a value of 2 leads to a similar situation. I am just saying that "1" is much more likely to lead to issues than a higher value like "2".
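The queueing effect can be sketched with some back-of-the-envelope arithmetic (the job count and duration below are made-up example numbers, not taken from any real system):

```shell
# Hypothetical numbers: 5 daily backup jobs, 6 hours each, concurrency limit 1.
# With the limit, the jobs run strictly in series.
jobs=5
hours_each=6
total=$((jobs * hours_each))
echo "Serial runtime per day: ${total}h"
# 30h of work per 24h day: the queue grows by 6h every day and never catches up.
```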
 
Thank you for your help and for the explanation. On my server I don't have enough backups running to run into a problem like the one you have described. Furthermore, after disabling compression, the process seems very fast even for FTP backups like mine.
 
I've never had problems with this type of backup (apart from a mail error even when the backup completes correctly)... why are they all running together now?

If it worked all the time until now, you might have the same problem as Peter. Have you checked whether your CPU runs hot? (Try 'sensors'; you might have to install it and run 'sensors-detect' first if you haven't yet.)
Our server had a problem with the CPU fan a while ago as well. Though the backups ran fine and took just a few minutes longer, when hit by crawlers the machine would slow down to a crawl (pun not intended). Sensors showed the cores at rather normal temperatures of ~40°C at idle, but reaching 80°C pretty fast under even light load (with subsequent thermal throttling), when the CPU shouldn't exceed 72°C with all cores used to the max.
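That check can be scripted by grepping the output of `sensors` for cores above a threshold. Since real readings depend on your hardware, a sample line is inlined below (the 72°C threshold is the figure from our server; adjust it for yours):

```shell
# Flag cores above a temperature threshold. On a live system, replace the
# inlined sample with the real output:  sensors | awk ...
threshold=72
sample='Core 0:       +81.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:       +43.0°C  (high = +84.0°C, crit = +100.0°C)'
echo "$sample" | awk -v t="$threshold" '/^Core/ {
    temp = $3; gsub(/[^0-9.]/, "", temp)   # "+81.0°C" -> "81.0"
    if (temp + 0 > t) print $1, $2, temp "C exceeds threshold"
}'
```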

By the way, does Plesk 18 have a server monitor that warns when temperatures or SMART go bad?
 
Same here with pigz, but it wasn't like that before. So either some update of the pigz package itself, or a Plesk update?
 
Since a few updates ago, I have the same issues with backups (to a remote server, either S3 or FTPS). While incremental backups do not seem to be the issue, the full backup always is troublesome. First, I have the impression that it takes much longer than in the past (more than 12 hours for ~140 GB), and second there is the associated load (which regularly crashes Dr. Web, makes e-mail delivery fragile, and often prevents WordPress pages from loading properly).
See attached an example of CPU usage, memory usage, and system load during backups. In this specific case, I actually had to interrupt 2 backups because they were killing all other server activity. What I find interesting is the high percentage of cpu_wait during the backup.

Some details about the system where I observed this:
  • Backup settings:
    • Priority for scheduled backups (and for all backups): 15
    • I/O priority (both): 7
    • I tried both compressed & uncompressed backups.
  • System:
    • 2 dedicated cores (Intel® Xeon® E5-2680V4)
    • 6 GB RAM
    • 320 GB HDD (SAS), ~160 GB used
    • Network: 1 GBit/s
    • Ubuntu 16.04.6
    • Plesk 18.0.27 Update #1
    • Apache + NGINX, Postfix, Dr. Web, MySQL
    • 11 Subscriptions, 136 GB of storage used
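For reference, the priority settings above roughly correspond to running the backup job under nice (and ionice for the I/O part). A minimal sketch of the CPU-priority half, using a throwaway temp file instead of real backup data (the /tmp path is arbitrary):

```shell
# Run a small tar job at reduced CPU priority (nice value 15, matching the
# backup setting above). On Linux, "ionice -c 2 -n 7" would be the analogue
# of the I/O priority setting, if util-linux is available.
src=$(mktemp)
echo "demo payload" > "$src"
nice -n 15 tar -cf /tmp/prio-demo.tar -C "$(dirname "$src")" "$(basename "$src")"
ls -l /tmp/prio-demo.tar
```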

Overall, this situation is not really satisfactory and it leaves some open comments/questions. Maybe @IgorG has some helpful pointers as well?
  • Is the described behavior reproducible, or is it what one would expect, with regard to
    • backup duration (I do not recall backups in the past taking longer than 6 hours) and
    • CPU load (this was not that much of an issue in the past)?
  • What is annoying is that there is no proper way to quickly (!) and cleanly stop running backup tasks:
    • The explanation in How to cancel a stuck task in Plesk does not seem to apply to backups, as they do not appear in the corresponding table.
    • Within the control panel, I can click on a running backup to see its details (including a button to terminate it). However, when terminating the backup from this screen, it takes several minutes until anything happens, and even after the message indicated that the backup had been terminated, some backup processes were still running.
    • I actually ended up killing processes manually, which is always a hassle with regard to a stable system state and orphaned temporary files.
@IgorG What would help are recommendations on how we can run a reliable backup that does not affect the performance of other services. In addition, official instructions on how to properly and quickly kill a running backup would be helpful.

Could it be that there is a bottleneck elsewhere? I somehow feel it could be related to MySQL...


[Attached screenshots: CPU usage, memory usage, and system load during the backups]
 
Please check /var/log/messages for CPU temperature warnings or "throttle" messages. It is conceivable that the overall CPU speed has decreased, so that processes that consume a lot of CPU power run considerably slower than usual.
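The check can be done with a grep along these lines; a sample kernel message is inlined so the pattern is concrete (note that on Ubuntu the log file is /var/log/syslog rather than /var/log/messages):

```shell
# On a real system:  grep -iE 'temperature|throttl' /var/log/messages
# Demonstrated here against a sample kernel line; -c counts matching lines.
logline='kernel: CPU1: Core temperature above threshold, cpu clock throttled'
echo "$logline" | grep -icE 'temperature|throttl'
# prints 1 (the sample line matches)
```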
 
Well, it is a virtual server, so I cannot check the temperatures...
 
Do you see a chance to contact your hosting company about this? If the backup worked before and suddenly consumes much more time, there are only a few possible reasons:
- Another process on the same server consumes a lot of CPU time, so that only little is left for the backup.
- The CPU speed has decreased; normally this is due to cooling issues, where CPUs auto-deactivate some cores, which dramatically decreases system speed.
- An unusually high number of files need to be backed up; even if the overall size of the backup is not much bigger, the number of files can decrease the backup speed.
 
I am awaiting their response...
 
Ok, the server has been moved to a different host with less activity. However, the problems remain.
What I can see is that backup-related commands such as sw-tar cause a very high I/O load, often accounting for 99% of all I/O activity.
 
High I/O activity is expected, because tar needs to read data, compress it, and write it back into a new file.
 

Does the new server also have just a HDD for your stuff? That kills performance, especially when you have lots and lots of small files, then your iowait times will go through the roof.
Can you run iostat (e.g. `iostat 2 5`)?
 

Yes, the server only has an HDD. However, since the backup worked well for the last two years and since there were no larger changes (e.g. related to disk space), the behavior is still odd.
When running iostat, iowait is often > 90%.
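For context, that iowait figure comes from the avg-cpu block that iostat prints; a sample is parsed below (the values are illustrative, not from this server; on a live box pipe `iostat 2 5` in instead of the inlined sample):

```shell
# Pull %iowait out of an iostat avg-cpu sample. Column 4 of the values line
# lines up with the %iowait header in the line above it.
sample='avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.10    0.00    1.40   91.30    0.00    5.20'
echo "$sample" | awk 'NR == 2 { print "iowait: " $4 "%" }'
# prints: iowait: 91.30%
```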
 
Yes, the server only has an HDD.

Then it makes no sense to run parallel compression, as this leads to several threads doing concurrent I/O, which absolutely kills performance on an HDD because of seek times. It is even worse when backup source and target (/var/lib/psa/dumps/domains/*) are on the same HDD.
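If compression stays enabled, one mitigation along these lines is pigz's -p option, which caps the number of compression threads. The snippet only illustrates the flag on a throwaway file standing in for backup data; it does not change how Plesk itself invokes pigz, and it falls back to plain gzip where pigz is not installed:

```shell
# Compress a throwaway file with a single pigz thread, avoiding the concurrent
# reads that thrash an HDD; -p sets the number of compression threads.
f=$(mktemp)
head -c 65536 /dev/urandom > "$f"
if command -v pigz >/dev/null 2>&1; then
    pigz -p 1 "$f"
else
    gzip "$f"    # fallback when pigz is not installed
fi
test -f "$f.gz" && echo "compressed to $f.gz"
```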
 

We disabled compression months ago already, but it did not get much better.
 
Sorry to hear that turning off compression does not solve the problem in your case. However, compression is the only factor that has a significant impact on CPU usage during a backup. All other parts are simply I/O operations. A backup without compression only combines files into .tar files and uploads them to the storage space. It creates the tars and adds some XML files with additional information about the backup, but these are basically text files of comparatively small size. Very basic operations that a server performs all the time.

Should these normal operations still overload your system (with compression turned off), you will probably need to talk to your provider about it, because in that case CPU resources or disk I/O speed are lower than they should be.
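Before contacting the provider, a quick sanity check of raw sequential write speed can be done with dd (the 16 MiB size and scratch location are arbitrary choices for illustration; this gives a rough number only, not a substitute for a proper benchmark like fio):

```shell
# Write 16 MiB and force it to disk (conv=fdatasync), so the reported rate
# includes the flush rather than just the page cache.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=16 conv=fdatasync 2>&1 | tail -n 1
rm -f "$f"
```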
 
However, compression is the only factor that has a significant impact on cpu usage during a backup. All other parts are simply I/O operations.
Nope, the SQL server backup can get pretty intense too. Especially if it has to dump while under heavy traffic with lots of writes.
 
That is because of the disk load, and that again is something that the system administrator must solve.
 
That is because MariaDB has to keep the dump consistent despite ongoing transactions.
(innobackupex/xtrabackup uses a different, more file-copy-oriented approach and is less of a CPU hog.)
 
