Resolved pigz on backups frequently overloading cpu usage

Bitpalast · Nov 15, 2019

This is an issue that's been ongoing for years. I am striving to get rid of it:

On scheduled or manual backups, when the pigz program zips the files, it occupies all cpu cores and does not leave room for other programs. This leads to failures of Apache (due to timeouts) on some occasions.

I am seeking a way to limit the cpu usage of that single program. I have thought about cgroups, but it does not seem to be suitable, because pigz runs as root and we cannot (don't want to) limit root in cgroups.

Does anyone know how to deal with this issue? Any ideas?

Brujo · Nov 17, 2019

Hey @Peter Debik have you already tested the suggestion from: High CPU wait, I/O and Load Average during backup or outbound migration

Bitpalast · Nov 17, 2019

Yes, its all running "low priority". The problem is, it is still occupying several cores, so when customer websites or other processes create a lot of load, the headroom decreases. In some occasions we are seeing twice the load of what the cpu can normally handle. We will probably need to switch to uncompressed backup to solve this, but this will drive the backup space costs up and increase upload and download times. Or decrease the number of customer accounts per machine. Difficult decision.

Bitpalast · Nov 23, 2019

It seems that one part of the problem is that at times two or more backups are running simultaneously. Obviously it took me four years to discover only now that the number of simultaneous backups can be limited by GUI setting. I am now trying "1" to see whether we can obtain priority for the full server backup over individual customer backups that are scheduled for hours while our full backup is still running.

Bitpalast · Nov 24, 2019

I further found out that on systems that have 1/3 regular load, using compression on backups can lead to enormous load. Now trying to disable compression altogether. We'll see what will happen then, but probably it's the only solution to avoid falling victim to the pigz cpu load exaggeration.

WebHostingAce · Nov 25, 2019

Hi Peter,

How about your disk write speed. I have faced a similar situation with a server had a very low disk write speed. As I remember it was 50MB/s with dd.

Bitpalast · Nov 25, 2019

@AusWeb: That is definitely something to check into. However, in this case, I already did, and the write speed is fine, tested with direct writing to files (using dd). We are using LSI hardware raid and write back, so normally even when many customer transactions are going on, data is still written almost as fast as on a test machine where there is no load.

Last night I have had the first production run with uncompressed backup, and it worked very well. However, that was an increment. I'll see what the full backup does during this week.

mow · Nov 27, 2019

Have you tried messing with pigz's command-line parameters?
To do so, you could move /usr/bin/pigz elsewhere and in its place put a file that calls the new location of pigz with additional parameters, followed by "$@".
E.g., /usr/bin/real-pigz -1 -p 4 "$@"

Bitpalast · Nov 29, 2019

I had to learn the hard way that one underlying undiscovered issue on the affected machine that showed an excessive cpu load when pigz was used was, that the cpu cooler was defective. Thus the cpu overheated, which causes auto-throttling of the frequency. This again makes a 12-core cpu behave like a 1- or 2-core cpu. This leads to a different cpu load display on commands like "top" or "uptime". So the load that pigz was causing was not higher than the load it is usually causing, but as the cpu has decreased its speed, the load appeared to become higher and higher. The temperature issue was reported in /var/log/messages, but I did not take into account that I'd need to check this aspect of hardware operations during analyzing what is going on. Only by accident I stumbled across a temperature warning message.

Anyway, I also found out that an excellent way to reduce the overall load during backups is to turn pigz of (compression) through the Plesk interface on the backup settings. The only disadvantage this has seems to be that the backup needs more storage space. But it is not slower. It is in times even faster, depending on the transport speed of the connection to the external storage. The time and cpu load saved when omitting compression seems to be very well worth the larger files that are created. The customer experience during backups is also better, because customers won't notice any impact on their website speed by the nightly backups. So currently I believe that this is the better solution for systems that host many customers.

websavers · Jan 10, 2020

I'm glad you found a workaround for this, but isn't it still important that Plesk devs address the issue for those that *do* want to utilize backup compression? Or even from the perspective of storage optimization where compression is essential...

We've encountered high load frequently because of Plesk backup processes entirely due to pigz. iotop shows it's not disk speed or iops that's the limiting factor, but rather the CPU cores being entirely maxed out.

I think it's more than reasonable to ask Plesk devs to auto-scale this accordingly by running pigz with the --processes n parameter and setting n to something like half the total number of cores on the server (or make that number configurable in the admin backup settings). Or even better, have it auto-pause if load gets too heavy and resume when load drops.

Bitpalast · Jan 11, 2020

Plesk DOES limit pigz when you set that in the backup configuration. From what I see in load increase when it is running, I think it is limited to probably something around 25% of the total cpu power. The problem with it is, that when the system is under high load elsewise, the limit might not be tight enough. But in that case it does not make much sense to limit pigz's cpu use more. It is then better to turn it of completely.

websavers · Jan 11, 2020

Peter Debik said:
Plesk DOES limit pigz when you set that in the backup configuration. From what I see in load increase when it is running, I think it is limited to probably something around 25% of the total cpu power. The problem with it is, that when the system is under high load elsewise, the limit might not be tight enough. But in that case it does not make much sense to limit pigz's cpu use more. It is then better to turn it of completely.

I think that turning it off makes sense only if the server's load is *consistently* high due to other processes. However in most real-world scenarios, the server load might only be high during a backup a small portion of the time... in such a case, it *does* make more sense for Plesk to be smart about this, detect the higher load, and pause the processing until load has dropped. Or, allow for better parallelization by dropping the number of cores it uses. Using *all* the cores means it is more likely to cause an dramatic increase in load as long as there's something else happening on the server.

Bitpalast · Jan 11, 2020

Sounds like a good idea. There are several pigz starts during a backup. Maybe the software could measure the cpu load immediately before the start of a pigz instance and adjust the cpu utilisation accordingly. We are doing a similar thing with timeouts here in server surveillance, and that does work great. Maybe you can suggest this as a feature in Plesk user voice? It is very easy to code, so Plesk could easily add it as a feature with very limited effort but a great benefit for the system.

websavers · Jan 11, 2020

Peter Debik said:
Sounds like a good idea. There are several pigz starts during a backup. Maybe the software could measure the cpu load immediately before the start of a pigz instance and adjust the cpu utilisation accordingly. We are doing a similar thing with timeouts here in server surveillance, and that does work great. Maybe you can suggest this as a feature in Plesk user voice? It is very easy to code, so Plesk could easily add it as a feature with very limited effort but a great benefit for the system.

It is done! Feature Suggestions: Top (1532 ideas) – Your Ideas for Plesk

Bitpalast · Apr 15, 2020

Now in Obsidian 18.0.26:
Added the ability to set priority for backup processes. Go to Tools & Settings > Backup Manager (under “Tools & Resources”), click “Settings” and look for the “Run scheduled backup processes with low priority” and “Run all backup processes with low priority” options. (PPPM-10734)

D4NY · Apr 27, 2020

Hello Peter, hello everyone. Same problem here.

This morning at 4.41 the server was down, when i woke up i immediatly started to search the problem. The TOP command was like that:

The only way to have control of the server is to stop mysql service and everything back to normal in few minutes.
I've tried to set low priority but nothing changed.... then i find this thread and in the next hour ill try the No Compression option....
Peter you had some problem using it (apart space occupation)?

I've never had problems with this type of backup (apart mail error also when backup is correctly done).... why are they running all together now?

Bitpalast · Apr 28, 2020

You can try to limit the number of concurrent backup processes in the backup manager. From experience I recommend to NOT to lower it to 1, but 2 or higher will do.

D4NY · Apr 28, 2020

Is there a reason for that?

Bitpalast · Apr 28, 2020

For what exactly?

D4NY · Apr 28, 2020

You were talking about the number of concurrent backup processes... you said not lower it to 1 (I did)

Resolved pigz on backups frequently overloading cpu usage

Plesk addicted!

Silver Pleskian

Plesk addicted!

Plesk addicted!

Plesk addicted!

Silver Pleskian

Plesk addicted!

Silver Pleskian

Plesk addicted!

Regular Pleskian

Plesk addicted!

Regular Pleskian

Plesk addicted!

Regular Pleskian

Plesk addicted!

Regular Pleskian

Plesk addicted!

Regular Pleskian

Plesk addicted!

Regular Pleskian

Similar threads