
Issue 100% CPU

tnickoloff

New Pleskian
Server operating system version
Ubuntu 22.04
Plesk version and microupdate number
18.0.58 Update #2
Hello, my VPS hits 100% CPU usage caused by mariadb and sw-engine. I tried this solution: but when I start sw-engine, CPU usage goes to 100% again. I tried this a couple of times, and every time I stopped sw-engine, the same stuck processes were there.
A couple of hours later (maybe 3 or more), the problem was gone and CPU usage was back to normal. The next day it happened again.
That was last week. I couldn't find the cause, so yesterday I reinstalled the OS and Plesk (image provided by the datacenter), deployed the site, and everything was fine until 2 hours ago. Again 100% CPU usage.
What can I do? Please help. I can provide logs or whatever you need.
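For anyone debugging the same symptom, a standard first step (not from the original post, just a generic diagnostic) is to confirm which processes are actually consuming the CPU:

```shell
# Show the ten processes using the most CPU, sorted highest first
ps -eo pid,user,%cpu,%mem,etime,comm --sort=-%cpu | head -n 11
```

In the screenshot below, mariadbd and sw-engine were at the top of a listing like this.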
plsk.png
 

In that case this is either caused by hanging sw-engine processes or subprocesses, or by a large number of incoming requests against the login page. Have you activated fail2ban to stop brute-force attacks against :8443? Maybe you could also check /var/log/sw-cp-server/error_log for additional information?
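If the log is large, a quick way to see which errors dominate is to strip the variable parts (timestamp, worker PID, request ID) and count what remains. A rough sketch, assuming the nginx-style log format that sw-cp-server writes:

```shell
# Count identical error messages in an nginx-style error log after
# removing the timestamp, the worker PID ("85549#0:") and the
# per-request ID ("*7079581"), then list the most frequent first
summarise_errors() {
  sed -E 's/^[0-9/]+ [0-9:]+ //; s/[0-9]+#[0-9]+: (\*[0-9]+ )?//' "$1" \
    | sort | uniq -c | sort -rn | head
}

# Example: summarise_errors /var/log/sw-cp-server/error_log
```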
 
fail2ban is activated. Here is part of /var/log/sw-cp-server/error_log:

2024/02/21 12:33:21 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:33:35 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:34:05 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:34:06 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:35:21 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:35:38 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:38:10 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:42:44 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 13:07:14 [error] 361155#0: *7079581 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079593 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
[... the same recv() (104: Connection reset by peer) error repeats at 13:07:14 for many more request IDs ...]
 
"Too many open files" look suspicious. Make sure to apply at least the advice given in https://support.plesk.com/hc/en-us/...-reload-on-a-Plesk-server-Too-many-open-files, but also check whether you need to add "fs.file-max = <put high number here>" into /etc/sysctl.conf and add hard and soft limits to /etc/security/limits.conf, too, for example:

Code:
nginx soft nofile <high number here>
nginx hard nofile <high number here>
root soft nofile <high number here>
root hard nofile <high number here>
psaadm soft nofile <high number here>
psaadm hard nofile <high number here>
mysql soft nofile <high number here>
mysql hard nofile <high number here>
httpd soft nofile <high number here>
httpd hard nofile <high number here>

with "<high number here>" a fairly high number of a number of open files allowed on your system. For example: 100000. The actual number of open files on your system can be determined by running lsof | wc -l. Doe not exceed 1 Mio (1000000) in your configuration files, because on some OS, a larger number could break SSH root access (su/sudo).
After making the changes to the files, run systemctl --system daemon-reload && sysctl -p, then restart the services, e.g. service sw-engine restart && service sw-cp-server.
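Putting the pieces together, the whole sequence might look like this. The value 100000 and the user list are examples only; adjust both to your server, and note that all of these commands need root:

```shell
# Raise the kernel-wide limit (example value; do not exceed 1000000)
echo 'fs.file-max = 100000' >> /etc/sysctl.conf

# Per-user soft/hard limits for the panel and web server users
cat >> /etc/security/limits.conf <<'EOF'
nginx   soft nofile 100000
nginx   hard nofile 100000
psaadm  soft nofile 100000
psaadm  hard nofile 100000
EOF

# Apply the settings and restart the affected Plesk services
sysctl -p
systemctl --system daemon-reload
service sw-engine restart && service sw-cp-server restart
```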
 
Peter, thank you very much for your reply!
I did everything you wrote. I set 'Max open files' to 4096. To my surprise, the check before applying the new setting showed two pretty different values:

Bash:
grep 'Max open files' /proc/$(cat /var/run/nginx.pid)/limits
Max open files            1024                 524288               files

I don't know whether that is normal or not.
The actual number of open files at this moment:
Bash:
lsof | wc -l
27637
But right now there is no domain pointing to this server (after the failure yesterday I pointed the domain to another server), so CPU usage is 0% now.

I put the values you suggested in /etc/sysctl.conf and /etc/security/limits.conf files and restarted services.

This VPS serves only 1 site, 1 domain.

I appreciate your help. My question is: is there anything I should do before switching the server back to production? I really want to avoid playing trial and error.
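An aside on the 'Max open files' output quoted above: the first number (1024) is the soft limit, which is what actually applies to the process; the second (524288) is the hard limit, the ceiling up to which the soft limit may be raised. Both can be checked without root:

```shell
# Soft and hard open-file limits of the current shell
ulimit -Sn
ulimit -Hn

# The same values for any running process, read from /proc
# (here the shell's own PID; substitute the PID of nginx, sw-engine, etc.)
grep 'Max open files' /proc/$$/limits
```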
 
Based on the information provided earlier, I am certain that the issue is caused by the "too many open files" situation. But who knows whether other problems exist. You might consider setting

Code:
fs.inotify.max_user_watches = 560144
fs.inotify.max_user_instances = 1024

(or other suitably high values for your server) in /etc/sysctl.conf, too.
For your webserver(s) you could also consider running

Code:
# /usr/local/psa/admin/sbin/websrv_ulimits --set 500000 --no-restart

(or another high number that fits your situation), so that they also get a high "max open files" limit on each start.
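To verify that the inotify values took effect, and to gauge current usage, both can be read without root via /proc; the usage count below is only a rough system-wide estimate:

```shell
# Effective inotify limits as the kernel currently sees them
cat /proc/sys/fs/inotify/max_user_watches
cat /proc/sys/fs/inotify/max_user_instances

# Rough count of inotify instances currently open across all processes
find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l
```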
 