
Issue Need help » Plesk + CentOS + WP = PHP-FPM CPU @ 100%

easyware

Basic Pleskian
Hi, I don't know if this is a legit request for the forum because probably there is no Plesk issue here, but I don't know what to do anymore, so last chance.. :rolleyes:

I have set up two identical websites using WordPress + WooCommerce (one is the "public" shop, the other one is for "resellers"). Every now and then the CPU spikes to 100% and the entire VPS hangs; the SSH connection becomes extremely slow and a reboot is the only way to get the server working again.

I recently managed to "see" this in action with the top shell command open, and it seems that the PHP-FPM process goes through the roof when using the WP admin panel. Sometimes this just causes a general slowdown; at other times it hangs the entire VPS.

I tried changing the PHP version (I am now on 7.2), checking plugins and upgrading the VPS CPU cores (now 4); the issue still occurs every now and then.

I am asking if someone has some insight about what to check, or alternatively whether I can limit the CPU for that specific domain so that at least the VPS and the other services stay up.

Thanks for every bit of help!
 
I am currently using Nginx on all the active domains.
Regarding the KB article, I confirm that Nginx is installed and active, with the "proxy option" selected.
 
You might have an exposed endpoint or a script (such as the WordPress XML RPC file which is very popular with automated nefarious sniffing tools) that may be getting hit so regardless of handler, this behavior might occur over and over again. When the behavior occurs, check to see what files are being accessed at that time to get a better feel for what's going on.
 
Check out the HTTP server status module (be sure to restrict it to your IP). But if you just want a general idea of what's going on:

See the top cpu load processes as they happen!
# watch "ps aux | sort -nrk 3,3 | head -n 20"

Find the executable that a process ID belongs to:
# ls -la /proc/PROCESSID/exe

Another way, listing the files that the process has open:
# /usr/sbin/lsof -p PROCESSID | less
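
If the commands above only point at a shared php-fpm binary, the process title can be more telling, because PHP-FPM workers label themselves as "php-fpm: pool NAME". A minimal sketch that sums CPU usage per pool; `fpm_cpu_by_pool` is a hypothetical helper name, and the idea that each pool name maps to one domain is an assumption you should check against your own `ps` output:

```shell
# Sum %CPU per PHP-FPM pool.
# Expects `ps -eo pcpu,args` style lines on stdin, e.g.
#   12.5 php-fpm: pool example.com
fpm_cpu_by_pool() {
  awk '$2 == "php-fpm:" && $3 == "pool" { cpu[$4] += $1 }
       END { for (p in cpu) printf "%.1f %s\n", cpu[p], p }' | sort -rn
}

# Live usage (run as root while the spike is happening):
# ps -eo pcpu,args | fpm_cpu_by_pool
```

The pool at the top of the list is the one to look at first in that domain's access log.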
 
Thanks for the help. I tried tracing some of the processes that spike, but they all refer to "/opt/plesk/php/7.2/sbin/php-fpm".

But... one thing I found in the Apache log is these errors:
Code:
2021-01-22 16:09:10    Error    128.116.221.126    503    GET / HTTP/1.0
2021-01-22 16:09:12    Error    128.116.221.126    AH01068: Got bogus version 100
2021-01-22 16:09:12    Error    128.116.221.126    (22)Invalid argument: AH01075: Error dispatching request to :

Should I follow this KB article?
 
EDIT - I switched the PHP support as the KB article says, but this morning my alarm clock was my customer's WhatsApp messages about his website being down.
I managed to reboot the VPS from bed through SSH.

In the error log of the domain I just see 503 errors and then timeout errors.
 
To recover from the crashed service:
1. Stop Apache
# service httpd stop
or
# service apache2 stop
(depends on your OS)
2. Stop the PHP-FPM service
# service plesk-php<version>-fpm stop
3. Kill remaining php-fpm processes of that version
Identify the processes of that version that are still running:
# ps aux | grep php-fpm
Kill them
# kill -9 <process id>
4. Start Apache
# service httpd start
5. Start PHP-FPM service
# service plesk-php<version>-fpm start
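
The five recovery steps above could be wrapped into one helper, sketched here under several assumptions: the service name is built as `plesk-php72-fpm` for version 7.2 (dots removed from the `plesk-php<version>-fpm` pattern), the binary lives at `/opt/plesk/php/<version>/sbin/php-fpm` as reported earlier in this thread, and `recover_fpm` and `run` are hypothetical names. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

recover_fpm() {
  ver="$1"                 # PHP version, e.g. 7.2
  web="${2:-httpd}"        # httpd (CentOS) or apache2 (Debian/Ubuntu)
  # Assumed service naming: version without the dot, e.g. plesk-php72-fpm
  svc="plesk-php$(printf '%s' "$ver" | tr -d '.')-fpm"
  bin="/opt/plesk/php/$ver/sbin/php-fpm"

  run service "$web" stop
  run service "$svc" stop
  # Kill leftover php-fpm processes whose executable belongs to this version.
  for pid in $(pgrep -x php-fpm 2>/dev/null); do
    [ "$(readlink "/proc/$pid/exe" 2>/dev/null)" = "$bin" ] && run kill -9 "$pid"
  done
  run service "$web" start
  run service "$svc" start
}

# Usage: preview first, then execute.
# recover_fpm 7.2 httpd
# DRY_RUN=0 recover_fpm 7.2 httpd
```

Matching on `/proc/PID/exe` rather than on the command line is deliberate: PHP-FPM rewrites its process title, so the version is not visible in `ps` output for worker processes.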

To mitigate the issue for the future:
- In the PHP-FPM settings (PHP icon) of the domain increase pm.max_children to a much higher value, e.g. 25
and
- In the PHP-FPM settings (PHP icon) of the domain increase pm.max_requests to a very high value, e.g. 10000
- In the PHP-FPM config file of the PHP version that you use (normally /opt/plesk/php/<version>/etc/php-fpm.conf), set these three parameters to these values or similar values:
emergency_restart_threshold = 3
emergency_restart_interval = 1m
process_control_timeout = 8s
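
To confirm that the three global directives are really active in the file (and not still commented out with a leading ";"), something like the following may help. `show_fpm_globals` is a hypothetical helper, and the path in the usage line simply fills in 7.2 as an example version:

```shell
# Print only the active (uncommented) lines for the three directives.
show_fpm_globals() {
  grep -E '^[[:space:]]*(emergency_restart_threshold|emergency_restart_interval|process_control_timeout)[[:space:]]*=' "$1"
}

# Usage (example path for PHP 7.2):
# show_fpm_globals /opt/plesk/php/7.2/etc/php-fpm.conf
```

If fewer than three lines come back, at least one directive is missing or still commented out. The PHP-FPM service has to be restarted before changes to this file take effect.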

To find the cause:
- Analyze the .htaccess file of the domain and make sure that no rewrite rules mentioned therein can end up in infinite loops. For example a rewrite might send a request to an address that has a rewrite on it that has a rewrite on it that has a rewrite on it. This can frequently be the case if rewrites point to resources that are missing, and these deliver a 404 not found while that again is caught by another rewrite to display a page in which a resource is included that is missing, that delivers a 404 not found ...
- Check your /logs/access_log for suspicious frequent requests to the same or similar resources and become aware of what the software is actually doing at that point. Sometimes very simple things like an AJAX editor that sends a new request every second can overload the number of PHP processes.
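
For the access log check, a rough sketch that ranks the most frequent requests and the busiest client IPs. It assumes the log is in the usual Apache combined format; `top_requests` and `top_clients` are hypothetical helper names, and the log path in the usage lines is an assumption about a typical Plesk layout, so adjust it to your system:

```shell
# Print the N most frequent "METHOD PATH" pairs from an access log on stdin.
top_requests() {
  awk -F'"' '{ split($2, r, " "); n[r[1] " " r[2]]++ }
             END { for (k in n) print n[k], k }' | sort -rn | head -n "${1:-10}"
}

# Print the N busiest client IPs (first field of each log line).
top_clients() {
  awk '{ n[$1]++ } END { for (k in n) print n[k], k }' | sort -rn | head -n "${1:-10}"
}

# Usage (path is an assumption):
# top_requests 20 < /var/www/vhosts/system/example.com/logs/access_log
# top_clients 20 < /var/www/vhosts/system/example.com/logs/access_log
```

A single path such as xmlrpc.php or admin-ajax.php dominating the first list, or one IP dominating the second, is usually the quickest hint at what is keeping the PHP processes busy.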
 
Thank you Peter for the help, I will apply the parameters you suggest to mitigate the issue and start analysis to find the cause (could be difficult in the Wordpress mess, I will let you know). Thanks again to everyone, I appreciate the help.
 
Are there any particular advantages choosing one PHP version or another?
I have 7.2.34, 7.3.26, 7.4.14 and 8.0.1 available..
 
For WordPress and WooCommerce it is currently best to choose PHP 7.4.x. PHP 8 is too new and a lot of software does not yet run with this version; the older versions are slower or just that: old.
 
I'm facing the same issue. My error log is filled with the errors below.
Code:
AH01068: Got bogus version 100
(22)Invalid argument: AH01075: Error dispatching request to :

While searching for a solution, I found that this error is reportedly caused by the Apache version. So will your solution help with this issue?

Server details
Centos 7
Ram - 256 GB
CPU Cores - 40
PHP V - 7.4 using PHP-FPM
pm.max_children = 100
pm.max_requests=500
 
This issue has arisen since the 18.0.33/18.0.34 updates. I tried setting all PHP handlers (7.3, 7.4, 8.0.3); still, within seconds of starting the server the CPU reaches 100% usage. Please help, I have been facing this issue for a month to be exact.
 
@omkarmore: Please check your access_log file and your error_log file to see what requests are being handled. From there you can normally easily figure out the reason why a lot of cpu power is needed.
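
For the error_log side, a quick tally of the Apache error codes (the AH numbers seen earlier in this thread) shows which failure dominates. This is only a sketch; `count_ah_codes` is a hypothetical helper name and the log path in the usage line is an assumption:

```shell
# Count occurrences of each Apache "AHnnnnn" error code on stdin, most frequent first.
count_ah_codes() {
  grep -oE 'AH[0-9]{5}' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Usage (path is an assumption):
# count_ah_codes < /var/www/vhosts/system/example.com/logs/error_log
```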
 
@Peter Debik
My PHP config is PHP version 7.4.16 (i.e. changed from 8.0.3 after the recommendation)
PHP-FPM application served by Apache
opcache.enable = on
pm.max_children = 10
pm = "ondemand"
In the error log file I am mostly getting
(Connection reset by peer. AH01075 Error dispatching request to ) and
(AH01067 Failed to read FastCGI header)
In my process list, PHP-FPM seems to be the culprit: there are multiple php-fpm processes that together consume all of the CPU.
I am now also getting a few ModSecurity access denied errors.
I have to either restart the server or put it into maintenance mode to release resources.

Resolutions I tried:
Increasing the hardware configuration (i.e. the server plan); the issue still arises.
Increasing innodb_log_file_size to 64M for the error (AH01071 data inserted in one transaction is greater than 10% of redo log size. Increase the redo log size using innodb_log_file_size)
Increasing/decreasing max_execution_time, max_input_time, post_max_size
Enabling/disabling opcache
Changing PHP versions: 7.3, 7.4, 8.0.3
Adding/removing additional Apache directives like:
FcgidIdleTimeout 1200
FcgidProcessLifeTime 1200
FcgidConnectTimeout 1200
FcgidIOTimeout 1200
Timeout 1200
ProxyTimeout 1200
 
I can confirm the problem with high CPU usage and a related sudden increase in MySQL usage.
MySQL also apparently no longer adheres to limitations and reserves or uses memory far beyond the default settings.

It is strange that the error has only appeared since we upgraded from Ubuntu 16.04.05 LTS (Obsidian 18.0.34) to Ubuntu 18.04.05 LTS.

At first everything went fine, the next morning 10 GB swap was filled - as I said, without any changes to the settings.
 
I was looking at LTUser's response about the swap issue and checked "top": the server keeps having problems with stopped processes (watchdog keeps notifying me) and now I have the kswapd0 process constantly at 100% CPU.

I have just updated the kernel this morning (CentOS), could that be the issue?
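
A kernel swap thread pinned at 100% CPU generally points to memory pressure, so it can help to see which programs hold the most resident memory before suspecting the kernel update. A rough sketch; `rss_by_command` is a hypothetical helper, and because RSS counts shared pages once per process, the totals overstate real usage for forking services such as php-fpm and httpd:

```shell
# Sum resident memory per command name and print the top N in MiB.
# Expects `ps -eo rss,comm` output on stdin (RSS is reported in KiB).
rss_by_command() {
  awk '$1 ~ /^[0-9]+$/ { mem[$2] += $1 }
       END { for (c in mem) printf "%d %s\n", mem[c] / 1024, c }' | sort -rn | head -n "${1:-10}"
}

# Usage:
# ps -eo rss,comm | rss_by_command 10
# free -m    # overall memory and swap usage for comparison
```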
 

Attachments

  • Schermata 2021-06-12 alle 19.50.49.png
Did anyone find a definitive cause/fix for this?
We have numerous new CentOS7 servers that run the latest Plesk. The previous servers had CentOS6 and this issue -never- happened. Now on CentOS7 all servers with WP sites have this issue. They'll randomly spike the server load from an average of 3-5 up to 40-80 within seconds and it basically freezes up everything on the server for 5 minutes or so until it calms down. Even SSH is super slow when this happens so there's nothing we can do to stop/restart services or monitor anything. It's like the server has a royal panic attack and then has to chill. Then it's back to normal for 8-10 hours until it happens again.

We've done troubleshooting for MONTHS - tried every swappiness setting, database config, PHP config, Apache/Nginx setting we can think of. We've researched for countless hours. We've tried settings that are higher/looser than recommended and lower/tighter. The situation is overall a ways better than when the servers were new - they were spiking repeatedly all through the day and now it's only generally 2-3 times per day. But absolutely nothing ever jumped out as being a firm problem or fix, or even that we could tell helped 100%.
 