Disable statistics_collector on all domains

rackaid support · Mar 27, 2013

I need to find a way to disable /usr/local/psa/admin/bin/statistics_collector (or improve efficiency) on a server. The client does not need these reports or quota management. The utility currently runs for hours generating high disk IO. I suspect the problem is due to the large number of files (10+ million).

Is there any way to just disable this utility or significantly speed up the processing?

Right now it is running for 12+ hours and we have to manually kill it.

OS: Red Hat Enterprise Linux Server release 6.4 (Santiago)
Plesk: 11.0.9 RedHat el6 110120608.16

System Quotas are off.
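
To check the suspicion that the file count is the cause, it helps to know which directory holds most of the files. Below is a minimal sketch, not a Plesk tool: the function name is made up for illustration, and the root path /var/www/vhosts in the usage comment is an assumption based on the strace paths later in this thread. Note that the count itself walks the whole tree, so it generates disk I/O too.

```python
import os
from collections import Counter


def count_files_per_subdir(root):
    """Count regular files below each immediate subdirectory of root.

    Symlinks are not followed, and directories that cannot be read
    are skipped silently.
    """
    counts = Counter()
    with os.scandir(root) as it:
        tops = [e for e in it if e.is_dir(follow_symlinks=False)]
    for top in tops:
        stack = [top.path]  # explicit stack avoids deep recursion
        while stack:
            current = stack.pop()
            try:
                with os.scandir(current) as entries:
                    for entry in entries:
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)
                        elif entry.is_file(follow_symlinks=False):
                            counts[top.name] += 1
            except (PermissionError, FileNotFoundError):
                continue
    return counts


# Hypothetical usage (the path is an assumption, adjust as needed):
# for name, n in count_files_per_subdir("/var/www/vhosts").most_common(10):
#     print(f"{n:>12}  {name}")
```

Subdirectories that contain no regular files at all simply do not appear in the result.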

Andy B. · Jul 27, 2017

I have exactly the same problem. Have you found a solution?

IgorG · Jul 27, 2017

Have you seen the KB article "High CPU load during statistics calculation"? Maybe it applies to your case. Please check.

Andy B. · Aug 5, 2017

Yes, I saw this one and all the others on Google, but no solution yet. Only killing the process helps.
I am going on holiday now. Can I just disable the DailyMaintainance cron job for a few days, or will this cause other problems?

IgorG · Aug 6, 2017

Andy B. said:
Can I just disable the DailyMaintainance cron job for a few days, or will this cause other problems?

Bad idea. There are many other important tasks that the DailyMaintainance script has to execute.

Andy B. · Aug 6, 2017

Hmm, I have turned it off for 2 days now and so far no problems. Not yet. What will happen if I turn it off for 14 days and then turn it on again while I look for a better solution? Do you think this will cause any serious issues?

IgorG · Aug 6, 2017

Just run the command

# /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -l

and see how many tasks this script performs. Aren't you afraid to miss something important?

Andy B. · Aug 6, 2017

I already did this earlier while looking for the problem, and I found out that it's only this one domain with its big image folder that takes forever... and it never finishes before the server crashes unless I kill the process.
Since I don't know in detail what each of the tasks does, I am asking you (the pro!) whether any of them is vital to the server and cannot be turned off for 2 weeks. I don't see a need for all the statistics-related tasks, but perhaps one of the others?

IgorG · Aug 6, 2017

I think that various system updates, cache cleaning, key updates, extension upgrades, etc. are serious reasons not to stop this script.

Andy B. · Aug 7, 2017

OK, now you are making me nervous about doing that.

So I will give you some more information; perhaps you can help.

Ubuntu 14.04.4 LTS
Plesk Version 12.5.30 Update #68

Most domains are running PHP 5.3.29
The one where the problems occur is running PHP 7.0.15

When the statistics_coll task reaches this particular domain, it always goes into the D state (uninterruptible sleep), and this is what strace outputs:

Code:

getcwd("/var/www/vhosts/123blabla.uk/httpdocs/media/image/93/ff/07", 4096) = 61
openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 10
getdents(10, /* 3 entries */, 32768)    = 96
lstat("4004470616866_600x600.jpg", {st_mode=S_IFREG|0644, st_size=32008, ...}) = 0
getdents(10, /* 0 entries */, 32768)    = 0
close(10)                               = 0
chdir("..")                             = 0
lstat("92", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
chdir("92")                             = 0

getcwd("/var/www/vhosts/123blabla.uk/httpdocs/media/image/93/ff/92", 4096) = 61
openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 10
getdents(10, /* 3 entries */, 32768)    = 96
lstat("4011980075335_600x600.jpg", {st_mode=S_IFREG|0644, st_size=43243, ...}) = 0
getdents(10, /* 0 entries */, 32768)    = 0
close(10)                               = 0
chdir("..")                             = 0
lstat("41", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
chdir("41")

This then sometimes runs for only minutes, sometimes for hours, until the CPU usage rises very, very fast and nothing works anymore; the only option left is to restart the server manually!

I tried raising a lot of Apache and database settings, turning the domain off completely in Plesk... nothing has helped so far.
How should I go on? What can I do?
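
The strace excerpt above reads like a depth-first directory walk: each leaf directory holds a single image (getdents reports 3 entries, i.e. ".", ".." and the file), and the walk spends nine system calls on it before entering the next directory. To see whether a longer trace has the same shape, the calls can be tallied by name. This is a minimal sketch; the function name and the file name trace.txt in the usage comment are made up for illustration.

```python
import re
from collections import Counter

# Matches the syscall name at the start of a strace line, with an
# optional "[pid 1234]" prefix as printed when following children.
SYSCALL_RE = re.compile(r"^\s*(?:\[pid\s+\d+\]\s+)?(\w+)\(")


def tally_syscalls(lines):
    """Return a Counter of syscall names found in strace output lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical usage, assuming the trace was saved with
#   strace -p <pid> -o trace.txt
# with open("trace.txt") as fh:
#     for name, n in tally_syscalls(fh).most_common():
#         print(f"{n:>10}  {name}")
```

If lstat, getdents and chdir dominate in roughly the proportions seen above, the time is going into walking the directory tree rather than into reading file contents.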

IgorG · Aug 7, 2017

Try temporarily moving the media/image/ directory elsewhere, to /root for example, then calculate statistics for this domain and move media/image/ back. Next time only the changes will be calculated, which is easier.
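
The move, calculate, move back sequence can be wrapped so that the directory is restored even if the statistics run fails or is interrupted. This is a minimal sketch under assumptions: the function name and its parameters are made up for illustration, and the command to run is whatever you use to calculate statistics (it is passed in, not hard-coded).

```python
import shutil
import subprocess
from pathlib import Path


def run_with_dir_parked(directory, parking_place, command):
    """Move directory into parking_place, run command, then move it back.

    Returns the exit code of command. The directory is restored in a
    finally block, so it also comes back after a failure or Ctrl-C.
    """
    directory = Path(directory)
    parked = Path(parking_place) / directory.name
    if parked.exists():
        raise FileExistsError(f"{parked} already exists; refusing to overwrite")
    shutil.move(str(directory), str(parked))
    try:
        return subprocess.run(command, check=False).returncode
    finally:
        if directory.exists():
            # Something recreated the directory meanwhile; moving back now
            # would nest the original inside it, so stop and report.
            raise RuntimeError(
                f"{directory} was recreated; original left at {parked}"
            )
        shutil.move(str(parked), str(directory))
```

Two things to keep in mind: the site has no images while the directory is parked, and shutil.move is only a cheap rename when both locations are on the same filesystem. If /root is on a different filesystem than the vhosts directory, every file is copied and then deleted, which with millions of files may take as long as the problem it is meant to avoid.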

Andy B. · Aug 7, 2017

OK, I will try it, but what do you mean by the last sentence, about what gets calculated "next time"?
Is it only calculating changed files?

IgorG · Aug 7, 2017

Andy B. said:
Is it only calculating changed files?

Yes.

Andy B. · Aug 7, 2017

OK, I moved the files to /root and ran this:

Code:

# /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f ExecuteStatistics

It ran to the end, so we are one step further.
I ran it 3 times to see whether it gets faster, and it did, a little.

Next step: I should put the folder back and run it again, right?

I want to add one more thing because it may also be relevant.
It's a multi-shop system, and some other domains and subdomains work with the same file base. So perhaps all these files are also checked for those domains, right?

Andy B. · Aug 7, 2017

It has now been running for over 15 minutes and the CPU usage is rising. I am a little bit afraid. The normal load average is between 1.00 and 2.something.
Can you see anything unusual here? I think the swap usage is just left over from some past tasks.

Andy B. · Aug 7, 2017

Hi Igor,
the process "statistics_coll" is still running, now for around 3.5 hours.
It produced some zombies (11) over time and right now it is somehow killing something related to Apache.
My frontend was not working, but the backend was. After an Apache restart the website is running again, and statistics_coll is still running.

Here is the Apache error.log; perhaps it tells you something:

Code:

[Mon Aug 07 12:50:28.807566 2017] [fcgid:warn] [pid 29810:tid 140566432253824] mod_fcgid: process 18459 graceful kill fail, sending SIGKILL
[Mon Aug 07 12:50:28.807705 2017] [fcgid:warn] [pid 29810:tid 140566432253824] mod_fcgid: process 19242 graceful kill fail, sending SIGKILL
[Mon Aug 07 12:56:05.011612 2017] [fcgid:warn] [pid 29810:tid 140566432253824] mod_fcgid: process 25694 graceful kill fail, sending SIGKILL
[Mon Aug 07 12:56:12.489544 2017] [fcgid:warn] [pid 29810:tid 140566432253824] mod_fcgid: process 25694 graceful kill fail, sending SIGKILL
[Mon Aug 07 13:18:15.416467 2017] [fcgid:warn] [pid 29810:tid 140566432253824] mod_fcgid: process 21718 graceful kill fail, sending SIGKILL
[Mon Aug 07 13:22:55.097223 2017] [fcgid:warn] [pid 29810:tid 140566432253824] mod_fcgid: process 21766 graceful kill fail, sending SIGKILL
[Mon Aug 07 14:00:24.330629 2017] [fcgid:warn] [pid 29810:tid 140566432253824] mod_fcgid: process 14368 graceful kill fail, sending SIGKILL
[Mon Aug 07 14:31:29.376999 2017] [core:warn] [pid 15195:tid 140566432253824] AH00045: child process 28653 still did not exit, sending a SIGTERM
[Mon Aug 07 14:31:29.377041 2017] [core:warn] [pid 15195:tid 140566432253824] AH00045: child process 27611 still did not exit, sending a SIGTERM
[Mon Aug 07 14:31:31.379209 2017] [core:warn] [pid 15195:tid 140566432253824] AH00045: child process 28653 still did not exit, sending a SIGTERM
[Mon Aug 07 14:31:31.379241 2017] [core:warn] [pid 15195:tid 140566432253824] AH00045: child process 27611 still did not exit, sending a SIGTERM
[Mon Aug 07 14:31:33.381403 2017] [core:warn] [pid 15195:tid 140566432253824] AH00045: child process 28653 still did not exit, sending a SIGTERM
[Mon Aug 07 14:31:33.381433 2017] [core:warn] [pid 15195:tid 140566432253824] AH00045: child process 27611 still did not exit, sending a SIGTERM
[Mon Aug 07 14:31:35.383595 2017] [core:error] [pid 15195:tid 140566432253824] AH00046: child process 28653 still did not exit, sending a SIGKILL
[Mon Aug 07 14:31:35.383649 2017] [core:error] [pid 15195:tid 140566432253824] AH00046: child process 27611 still did not exit, sending a SIGKILL
[Mon Aug 07 14:31:36.385248 2017] [mpm_event:notice] [pid 15195:tid 140566432253824] AH00491: caught SIGTERM, shutting down
[Mon Aug 07 14:31:37.824071 2017] [ssl:warn] [pid 29387:tid 139993846568832] AH01909: RSA certificate configured for horde.webmail:443 does NOT include an ID which matches the server name
[Mon Aug 07 14:31:37.824213 2017] [ssl:warn] [pid 29387:tid 139993846568832] AH01909: RSA certificate configured for horde.webmail:443 does NOT include an ID which matches the server name
[Mon Aug 07 14:31:37.824350 2017] [ssl:warn] [pid 29387:tid 139993846568832] AH01909: RSA certificate configured for lists:443 does NOT include an ID which matches the server name
[Mon Aug 07 14:31:37.824484 2017] [ssl:warn] [pid 29387:tid 139993846568832] AH01909: RSA certificate configured for default-81_169_213_189:443 does NOT include an ID which matches the server name
[Mon Aug 07 14:31:37.824621 2017] [ssl:warn] [pid 29387:tid 139993846568832] AH01909: RSA certificate configured for default-81_169_172_177:443 does NOT include an ID which matches the server name
[Mon Aug 07 14:31:37.824962 2017] [ssl:warn] [pid 29387:tid 139993846568832] AH02292: Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)
[Mon Aug 07 14:31:37.824973 2017] [suexec:notice] [pid 29387:tid 139993846568832] AH01232: suEXEC mechanism enabled (wrapper: /usr/lib/apache2/suexec)
[Mon Aug 07 14:31:37.839823 2017] [auth_digest:notice] [pid 29388:tid 139993846568832] AH01757: generating secret for digest authentication ...
[Mon Aug 07 14:31:37.839897 2017] [:notice] [pid 29388:tid 139993846568832] mod_bw : Memory Allocated 0 bytes (each conf takes 48 bytes)
[Mon Aug 07 14:31:37.839901 2017] [:notice] [pid 29388:tid 139993846568832] mod_bw : Version 0.92 - Initialized [0 Confs]
[Mon Aug 07 14:31:37.841004 2017] [:notice] [pid 29388:tid 139993846568832] mod_python: Creating 8 session mutexes based on 16 max processes and 25 max threads.
[Mon Aug 07 14:31:37.841010 2017] [:notice] [pid 29388:tid 139993846568832] mod_python: using mutex_directory /tmp
[Mon Aug 07 14:31:37.850699 2017] [ssl:warn] [pid 29388:tid 139993846568832] AH01909: RSA certificate configured for horde.webmail:443 does NOT include an ID which matches the server name
[Mon Aug 07 14:31:37.850831 2017] [ssl:warn] [pid 29388:tid 139993846568832] AH01909: RSA certificate configured for horde.webmail:443 does NOT include an ID which matches the server name
[Mon Aug 07 14:31:37.850963 2017] [ssl:warn] [pid 29388:tid 139993846568832] AH01909: RSA certificate configured for lists:443 does NOT include an ID which matches the server name
[Mon Aug 07 14:31:37.851097 2017] [ssl:warn] [pid 29388:tid 139993846568832] AH01909: RSA certificate configured for default-81_169_213_189:443 does NOT include an ID which matches the server name
[Mon Aug 07 14:31:37.851229 2017] [ssl:warn] [pid 29388:tid 139993846568832] AH01909: RSA certificate configured for default-81_169_172_177:443 does NOT include an ID which matches the server name
[Mon Aug 07 14:31:37.851551 2017] [ssl:warn] [pid 29388:tid 139993846568832] AH02292: Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)
[Mon Aug 07 14:31:37.853835 2017] [mpm_event:notice] [pid 29388:tid 139993846568832] AH00489: Apache/2.4.7 (Ubuntu) mod_fcgid/2.3.9 mod_jk/1.2.37 mod_python/3.3.1 Python/2.7.6 OpenSSL/1.0.1f configured -- resuming normal operations
[Mon Aug 07 14:31:37.853851 2017] [core:notice] [pid 29388:tid 139993846568832] AH00094: Command line: '/usr/sbin/apache2'

I think it's this line killing something:

Code:

[Mon Aug 07 14:31:36.385248 2017] [mpm_event:notice] [pid 15195:tid 140566432253824] AH00491: caught SIGTERM, shutting down
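
A long error.log like the one above is easier to judge once it is grouped by module and level. The following is a minimal sketch that assumes the line layout seen in this excerpt, "[time] [module:level] [pid N:tid M] message"; the function name is made up for illustration, and lines in any other layout are skipped.

```python
import re
from collections import Counter

# Layout as seen in the excerpt above; the module part may be empty,
# as in "[:notice]".
LOG_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<module>[^:\]]*):(?P<level>[^\]]+)\] "
    r"\[pid (?P<pid>\d+)(?::tid \d+)?\] (?P<message>.*)$"
)


def summarize_error_log(lines):
    """Return (Counter keyed by (module, level), number of SIGKILL messages)."""
    by_source = Counter()
    sigkills = 0
    for line in lines:
        match = LOG_RE.match(line.strip())
        if not match:
            continue
        by_source[(match["module"], match["level"])] += 1
        if "SIGKILL" in match["message"]:
            sigkills += 1
    return by_source, sigkills
```

Applied to the excerpt, this separates the mod_fcgid warnings that start at 12:50 from the cluster of messages at 14:31. In that cluster the AH00491 line is followed by startup messages from a new pid, which would be consistent with the manual Apache restart mentioned above rather than with Apache killing something on its own.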

IgorG · Aug 7, 2017

In my opinion, it looks like a lack of resources, but I can only recommend submitting a request to the support team for an in-depth investigation to find the cause and fix it. Please create a support ticket at the Plesk Help Center.

Andy B. · Aug 8, 2017

Even though it crashed something, it did run to the end yesterday.
I started it again just now to see if it is faster, and it was: only 45 minutes compared to around 6 hours yesterday.

It looked like it was going through this complete folder structure again, even if the folders are empty.
This takes so much more time than all the rest of the stuff I have on the server. I have some other domains with big folders on the server (cache files, and even more pictures in fewer folders), but they are done fast, so I think it's because of Shopware's big media folder structure.
The bad thing is that even if it runs for the moment, from time to time I have to update a lot of these files, and then the problem would occur again!

When "statistics_coll" is not running or killed there are enough resources and the server is stable. Since yesterday for 100 days!
A way to turn off this not really necessary, resource-hungry task would be a nice feature for future Plesk versions, just to give users in such cases at least the possibility to solve the problem the bad way.

What happens when the support team does "the in-depth investigation"? I ask because this is a live server.

IgorG · Aug 8, 2017

Andy B. said:
What happens when the support team does "the in-depth investigation"? I ask because this is a live server.

Experienced Plesk supporters will check and fix the issue directly on your server if you provide root access according to https://cscontact.plesk.com/static/other/Plesk_Server_Permission_Policy.pdf
