
Forwarded to devs: CGroups freezes the system - BlockIOWriteBandwidth

B4c4rd1

Regular Pleskian
TITLE:
CGroups freezes the system - BlockIOWriteBandwidth
PRODUCT, VERSION, OPERATING SYSTEM, ARCHITECTURE:
Plesk Onyx Version 17.8.11 Update #48
Ubuntu 16/18
PROBLEM DESCRIPTION:
With the read and write bandwidth limits enabled, the entire system freezes for the duration of a limited user's read/write operation.

The problem is not easy to demonstrate.

It was reproduced on different operating systems and on different servers.
STEPS TO REPRODUCE:
Create a new customer and set the CGroups limits, for example, as follows:

Write speed: 5 MB/s
Read speed: 5 MB/s

Log in as the customer via SSH and run the following command:

Code:
dd if=/dev/zero of=test1.img bs=1G count=1 oflag=dsync

While the file is being created, the Plesk panel is not accessible. (Try navigating back and forth in the Plesk panel while the command is running.)

The package manager also does not respond in time. For example, no new packages can be installed with ./plesk-installer while the command is running.
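
To put a number on the stall instead of clicking through the panel, one could time a small synchronous write from a second, unrestricted session while the dd command above is running. This is only a minimal sketch: it assumes GNU dd and GNU date (for %N), and the function name and probe path are made up for illustration. The probe file should sit on the same disk as the limited customer's data.

```shell
# Hypothetical helper: time one small synchronous write, in milliseconds.
# Assumes GNU date (%N = nanoseconds) and GNU dd (oflag=dsync).
probe_sync_write_ms() {
    # $1: probe file path (assumption: defaults to a file in the temp dir)
    probe_file="${1:-${TMPDIR:-/tmp}/io-probe.$$}"
    start=$(date +%s%N)
    dd if=/dev/zero of="$probe_file" bs=4k count=1 oflag=dsync 2>/dev/null
    end=$(date +%s%N)
    rm -f "$probe_file"
    # Convert nanoseconds to milliseconds.
    echo $(( (end - start) / 1000000 ))
}
```

Calling it once per second, e.g. `while sleep 1; do probe_sync_write_ms; done`, before and during the dd run would show whether write latency outside the limited session rises.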
ACTUAL RESULT:
The system behaves as if root were restricted as well.
EXPECTED RESULT:
The system should respond as usual.
ANY ADDITIONAL INFORMATION:
I have already tried to find out why this happens, but apparently nobody is aware of the problem yet.

If you are interested, I can share the VirtualBox image.
YOUR EXPECTATIONS FROM PLESK SERVICE TEAM:
Help with tracking down the cause
 
Hello, I have a couple of questions:
1. dd reports the copy speed at the end (e.g. "33554432 bytes (34 MB) copied, 0.402816 s, 83.3 MB/s"). Can you check that the reported speed does not exceed the specified limit (5 MB/s)?
2. "bs=1G" tells dd to read 1G of data into memory. How much RAM does your server have? (Maybe the dd process starts to use swap or something similar.)
3. Can you check what top or htop shows while dd runs (start top/htop first, then run dd in a separate console)?
 

1.: upload_2019-4-23_14-52-37.png
2.: 96GB - 128GB
3.: upload_2019-4-23_14-53-51.png

I reduced the size to 100M, otherwise it takes too long. The result is the same!

In this case the whole system is not frozen, only some parts such as the Plesk panel or the Plesk package manager.
 
Well, I cannot reproduce this in my test environment (it looks like the hardware configuration is quite different from yours).
You could contact Plesk support and provide access to the server. In that case our engineers will check the problem.
 
We can reproduce this error on several machines with different operating systems and partitions.

Or could you be doing something wrong?

As already offered, we can provide you with our VirtualBox image.

VirtualBox is universal hardware ;)

* It is very important that you do not log in as root. The login must be done directly as the user, not via su.
 
Well, it's going to be a long story (and I may be mistaken in some details, but I'll try).

I ran the VirtualBox image and tried dd first:
testuser@web100test:~$ time dd if=/dev/zero of=test3.img bs=1G count=10 oflag=dsync
10+0 records in
10+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 34.4634 s, 312 MB/s

real 0m35.249s
user 0m0.001s
sys 0m23.125s​
The write speed is much higher than the 5 MB/s set for testuser.
But `systemctl cat user-10000.slice` shows that the limit is actually set:
BlockIOReadBandwidth=/dev/sda 1048576
...
BlockIOWriteBandwidth=/dev/sda 1048576​
(I had already decreased the limit to 1 MB/s.)
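
To read those values at a glance, a small filter can convert the byte figures to MiB/s. This is a rough sketch that assumes the "Setting=/dev/node bytes" layout shown above with plain byte counts (no size suffixes); the function name is made up for illustration.

```shell
# Hypothetical helper: print BlockIO*Bandwidth limits in MiB/s.
# Reads unit-file text on stdin, e.g. from: systemctl cat user-10000.slice
blockio_limits_mib() {
    awk '/^BlockIO(Read|Write)Bandwidth=/ {
        split($1, kv, "=")      # kv[1] = setting name, kv[2] = device node
        printf "%s %s %.2f MiB/s\n", kv[1], kv[2], $2 / 1048576
    }'
}
```

It would be used as `systemctl cat user-10000.slice | blockio_limits_mib`; a value of 1048576 corresponds to 1 MiB/s.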

As far as I can see, BlockIOWriteBandwidth works as a "shared", "soft" limit, i.e. it is not applied to a process when no one else is writing to the disk (the same goes for reads).

Next, I logged in to the panel, ran dd, and started navigating pages. The slowest page is "Files" in a subscription.
This is actually not the best-optimized page for I/O, because it queries the state of every file in a directory (but it still should not be that slow).

Then I checked the process tree and found the following:
2093 root 20 0 4016 1672 1632 S 0.0 0.0 0:00.03 `- plesk bin extension --exec revisium-antivirus ra_executor.php
2100 psaadm 20 0 308M 39348 24152 S 0.0 0.6 0:00.26 | `- /usr/bin/sw-engine -c /opt/psa/admin/conf/php.ini /usr/local/psa/bin/extension --exec revisium-antivirus ra_executor.php
2101 psaadm 20 0 4628 772 704 S 0.0 0.0 0:00.00 | | `- sh -c '/opt/psa/admin/bin/php' -dauto_prepend_file=sdk.php '/opt/psa/admin/plib/modules/revisium-antivirus/scripts/ra
2102 psaadm 20 0 304M 39344 24096 S 0.0 0.6 0:01.42 | | `- /usr/bin/sw-engine -c /opt/psa/admin/conf/php.ini -dauto_prepend_file=sdk.php /opt/psa/admin/plib/modules/revisiu​
(As far as I remember, it's a file antivirus.)
So I killed it, and the "Files" page started to load much faster.

So,
1. I get a higher I/O speed in the virtual machine on my not very up-to-date desktop (about 300 MB/s vs 5 MB/s). So maybe one of the problems is related to slow hardware or to host machine limits for virtual machines.
2. BlockIOReadBandwidth/BlockIOWriteBandwidth work as soft limits for me (maybe I'm wrong here, because that is not how I expected them to behave).
3. "Files" loads more slowly than other pages, and revisium-antivirus makes things even slower.

Do you have anything else to ask, correct, dispute, etc.?
 
@Alxndr.V ,

I wrote from the start that the login must be done directly, not via su to the user! Otherwise the process runs in user-0.slice. (Or at least cgroups decides that the limit should not apply.)

Look at the result (login via su, limit set to 1 MB/s read/write):
upload_2019-4-30_12-13-40.png

Look at the result (direct login, limit set to 1 MB/s read/write):
upload_2019-4-30_12-14-0.png
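
Which slice a shell actually ended up in can be checked from inside the session by reading /proc/self/cgroup. A minimal sketch, assuming the systemd naming scheme user-<UID>.slice; the function name is made up for illustration.

```shell
# Hypothetical helper: print the systemd user slice named in cgroup text.
# Reads the content of /proc/<pid>/cgroup on stdin.
user_slice_of() {
    grep -o 'user-[0-9][0-9]*\.slice' | sort -u
}
```

Running `user_slice_of < /proc/self/cgroup` once after a direct SSH login and once after su should show whether the shell is in the customer's slice or still in the slice of the original login.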

The CGroups developers are brilliant, but there is a problem here that nobody has noticed so far.

The test is not about the measured times. The point is that, for example, the panel stops responding in time while the logged-in user is running dd. I am happy to record a video if the problem is hard to follow.

Sorry for asking, but are you a Plesk CGroups developer?

Thanks!
 

Attachments: upload_2019-4-30_11-58-20.png
> I always wrote, that the login must be done directly.
This is a very important detail that changes everything!

> Sorry for the following question, are you a Plesk CGroups developer?
Not exactly, but very close. (I'm on that team, but I'm not the one who implemented it.)

So, I was able to reproduce the problem and found the following:
- When a "cgrouped" user runs dd and you try to navigate the panel, the following SQL query gets stuck:
Code:
UPDATE `sessions` SET `modified` = '1557139609' WHERE (`sess_id` = '5c7d0c6778b6e99b2750be4494da4e47');
This one is related to admin sessions, but in fact any UPDATE query will hang (SELECT queries work fine).

- If you strace the mysqld process, you can see that mysqld writes to a file, then calls "fsync", and hangs there until dd completes.

- (Also, the problem does not occur when the user is not limited by cgroups.)
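
The fsync stall can be made visible with strace's -T option, which appends the time spent in each system call in angle brackets. The filter below is a rough sketch that picks out slow fsync/fdatasync calls from such a trace; the function name and the one-second threshold are made up for illustration.

```shell
# Hypothetical helper: list fsync/fdatasync calls that took at least
# $1 seconds. Reads "strace -T" output on stdin, where each completed
# call ends with its duration, e.g. "<0.000123>".
slow_fsyncs() {
    awk -v limit="$1" '
        /f(data)?sync/ && match($0, /<[0-9.]+>$/) {
            secs = substr($0, RSTART + 1, RLENGTH - 2)
            if (secs + 0 >= limit + 0) print secs, $0
        }'
}
```

One way to feed it, as root: `strace -f -T -e trace=fsync,fdatasync -p "$(pidof mysqld)" 2>&1 | slow_fsyncs 1`, started before the limited user runs dd.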

I reported the bug to Plesk (public ID: PPPM-10497), but I have no workaround or more detailed explanation at the moment.
 
@Alxndr.V ,

Many thanks for this answer!

It is important that you were able to reproduce the problem.

By the way, BlockIOWrite/ReadBandwidth are deprecated.
 
> By the way, BlockIOWrite/ReadBandwidth are deprecated.
Yes, I pointed that out in the bug report. I guess we will make some improvements to the CGroups support and fix this too :)
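
For reference, the systemd documentation names IOReadBandwidthMax= and IOWriteBandwidthMax= as the replacements for the deprecated settings. The filter below is only a sketch of that renaming for unit-file text; the function name is made up for illustration, and nothing in this thread shows whether the newer settings change the behaviour described above.

```shell
# Hypothetical helper: rewrite deprecated BlockIO*Bandwidth= lines to the
# IO*BandwidthMax= names. Reads unit-file text on stdin; other lines pass
# through unchanged.
to_io_max() {
    sed -e 's/^BlockIOReadBandwidth=/IOReadBandwidthMax=/' \
        -e 's/^BlockIOWriteBandwidth=/IOWriteBandwidthMax=/'
}
```

For example, `systemctl cat user-10000.slice | to_io_max` would print the slice configuration with the newer setting names.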
 