
Issue 100% CPU

tnickoloff

New Pleskian
Server operating system version
Ubuntu 22.04
Plesk version and microupdate number
18.0.58 Update #2
Hello, my VPS is hitting 100% CPU usage caused by mariadb and sw-engine. I tried this solution: but when I start sw-engine, CPU usage goes back to 100%. I tried this a couple of times, and every time I stopped sw-engine, the same processes were stuck.
Later, after a couple of hours (maybe 3 or more), the problem went away and CPU usage returned to normal. The next day it happened again.
This was last week. I couldn't find the cause, so yesterday I reinstalled the OS and Plesk (image provided by the datacenter) and redeployed the site. Everything was fine until 2 hours ago, when CPU usage hit 100% again.
What can I do? Please help. I can provide logs or whatever else you need.
[Attachment: plsk.png]
 

In that case this is caused either by hanging sw-engine processes or sub-processes, or by a large number of incoming requests against the login page. Have you activated Fail2Ban to stop brute-force attacks against port 8443? Could you also check /var/log/sw-cp-server/error_log for additional information?
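For example, a quick check could look like this (the jail name "plesk-panel" is only the usual default and may differ on your installation, so list the active jails first):

Bash:
# List the active Fail2Ban jails
fail2ban-client status

# Show the status and current bans of the Plesk panel jail
# (replace "plesk-panel" with the jail name from the list above)
fail2ban-client status plesk-panel

# Inspect the most recent panel web server errors
tail -n 100 /var/log/sw-cp-server/error_log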
 
Fail2Ban is activated. Here is part of /var/log/sw-cp-server/error_log:

2024/02/21 12:33:21 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:33:35 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:34:05 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:34:06 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:35:21 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:35:38 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:38:10 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:42:44 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 13:07:14 [error] 361155#0: *7079581 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079593 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079605 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079561 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079595 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079579 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079596 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079620 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079594 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079590 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079630 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079611 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079612 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079624 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079623 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
 
"Too many open files" look suspicious. Make sure to apply at least the advice given in https://support.plesk.com/hc/en-us/...-reload-on-a-Plesk-server-Too-many-open-files, but also check whether you need to add "fs.file-max = <put high number here>" into /etc/sysctl.conf and add hard and soft limits to /etc/security/limits.conf, too, for example:

Code:
nginx soft nofile <high number here>
nginx hard nofile <high number here>
root soft nofile <high number here>
root hard nofile <high number here>
psaadm soft nofile <high number here>
psaadm hard nofile <high number here>
mysql soft nofile <high number here>
mysql hard nofile <high number here>
httpd soft nofile <high number here>
httpd hard nofile <high number here>

with "<high number here>" a fairly high number of a number of open files allowed on your system. For example: 100000. The actual number of open files on your system can be determined by running lsof | wc -l. Doe not exceed 1 Mio (1000000) in your configuration files, because on some OS, a larger number could break SSH root access (su/sudo).
After making the changes to the files, run systemctl --system daemon-reload && sysctl -p, then restart the services, e.g. service sw-engine restart && service sw-cp-server.
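A minimal sketch of the whole sequence with example values (the numbers and the pgrep-based check are assumptions; adapt them to your own lsof | wc -l baseline and process names):

Bash:
# Kernel-wide limit (example value; keep it well below 1000000)
echo "fs.file-max = 100000" >> /etc/sysctl.conf
sysctl -p

# After adding the nofile lines to /etc/security/limits.conf:
systemctl --system daemon-reload
service sw-engine restart
service sw-cp-server restart

# Verify what a restarted service actually received
# (process name is an assumption; adjust if needed)
grep 'Max open files' /proc/$(pgrep -o sw-engine)/limits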
 
Peter, thank you very much for your reply!
I did everything you wrote. I set 'Max open files' to 4096. Surprisingly, the check before applying the new setting showed two quite different values:

Bash:
grep 'Max open files' /proc/$(cat /var/run/nginx.pid)/limits
Max open files            1024                 524288               files

I don't know whether that is normal or not.
The actual number of open files at the moment:
Bash:
lsof | wc -l
27637
But right now there is no domain pointing to this server (after the failure yesterday I pointed the domain to another server), so CPU usage is at 0% now.

I put the values you suggested into the /etc/sysctl.conf and /etc/security/limits.conf files and restarted the services.

This VPS serves only 1 site, 1 domain.

I appreciate your help. My question is: is there anything I should do before switching the server back into production? I really want to avoid trial and error.
 
Based on the information provided earlier, I am certain that the issue is caused by the "too many open files" situation. But who knows whether other problems exist. You might also consider setting
fs.inotify.max_user_watches = 560144
fs.inotify.max_user_instances = 1024
(or other suitably high values for your server) in /etc/sysctl.conf.
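For example, a small sketch of checking and applying those values (the numbers are just the ones above; tune them to your server):

Bash:
# Current values
sysctl fs.inotify.max_user_watches fs.inotify.max_user_instances

# Append the suggested (example) values and apply them
echo "fs.inotify.max_user_watches = 560144" >> /etc/sysctl.conf
echo "fs.inotify.max_user_instances = 1024" >> /etc/sysctl.conf
sysctl -p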
For your webserver(s) you could consider running
# /usr/local/psa/admin/sbin/websrv_ulimits --set 500000 --no-restart
or another high number that fits your situation, so that they also have a high "max files" limit upon each start attempt.
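Since the command above uses --no-restart, a hedged follow-up (service names are assumptions; use whichever web servers run on your system) is to restart the web servers and confirm the limit they start with, similar to the nginx check you already ran:

Bash:
# Restart the web servers so the new ulimit takes effect
service apache2 restart && service nginx restart

# Confirm the limit of the running nginx master process
grep 'Max open files' /proc/$(cat /var/run/nginx.pid)/limits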
 