
Issue 100% CPU

tnickoloff

New Pleskian
Server operating system version
Ubuntu 22.04
Plesk version and microupdate number
18.0.58 Update #2
Hello. My VPS hits 100% CPU usage, caused by mariadb and sw-engine. I tried this solution: but when I start sw-engine, I get 100% CPU usage again. I tried this a couple of times, and every time I stopped sw-engine, the same stuck processes remained.
A couple of hours later (maybe 3 or more), the problem was gone and CPU usage was back to normal. The next day it happened again.
That was last week. I couldn't find the cause, so yesterday I reinstalled the OS and Plesk (image provided by the datacenter), deployed the site, and everything was fine until 2 hours ago. Again 100% CPU usage.
What can I do? Please help. I can provide logs or whatever you need.
plsk.png
 

Attachments

  • plsk.png (376.4 KB)
In that case this is either caused by hanging sw-engine processes or subprocesses, or by a large number of incoming requests against the login page. Have you activated Fail2Ban to stop brute-force attacks against port 8443? You could also check /var/log/sw-cp-server/error_log for additional information.
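To rule out the brute-force angle, one can ask Fail2Ban which jails are active and how many addresses each has banned. A minimal sketch; the jail name plesk-panel below is an assumption (jail names vary per setup), so take the real name from the "Jail list" line of the first command's output:

```shell
# List all active Fail2Ban jails and overall ban counts.
fail2ban-client status

# Inspect the jail guarding the Plesk panel login.
# "plesk-panel" is an assumed jail name -- replace it with the
# actual name shown in the "Jail list" line above.
fail2ban-client status plesk-panel
```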
 
Fail2Ban is activated. Here is part of /var/log/sw-cp-server/error_log:

Code:
2024/02/21 12:33:21 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:33:35 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:34:05 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:34:06 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:35:21 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:35:38 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:38:10 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:42:44 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 13:07:14 [error] 361155#0: *7079581 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079593 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079605 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079561 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079595 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079579 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079596 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079620 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079594 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079590 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079630 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079611 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079612 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079624 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
2024/02/21 13:07:14 [error] 361155#0: *7079623 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/sw-engine.sock:", host: "MY_IP"
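The repeated lines above can be condensed into a per-error tally to see which failure dominates. A minimal sketch, using inline sample lines in place of the real log; on the server, point it at /var/log/sw-cp-server/error_log instead:

```shell
# Tally nginx error kinds by their "(errno: message)" suffix.
# Inline sample data stands in for /var/log/sw-cp-server/error_log.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2024/02/21 12:33:21 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 12:33:35 [crit] 85549#0: accept4() failed (24: Too many open files)
2024/02/21 13:07:14 [error] 361155#0: recv() failed (104: Connection reset by peer) while reading response header
EOF
# Extract each "(errno: message)" and count occurrences, most frequent first.
grep -oE '\([0-9]+: [^)]+\)' "$LOG" | sort | uniq -c | sort -rn
rm -f "$LOG"
```

With the sample data this lists "(24: Too many open files)" first with a count of 2, which mirrors what the full log shows: the file-descriptor limit is the dominant failure.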
 
"Too many open files" looks suspicious. Make sure to apply at least the advice given in https://support.plesk.com/hc/en-us/...-reload-on-a-Plesk-server-Too-many-open-files, but also check whether you need to add "fs.file-max = <put high number here>" to /etc/sysctl.conf and add hard and soft limits to /etc/security/limits.conf as well, for example:

Code:
nginx soft nofile <high number here>
nginx hard nofile <high number here>
root soft nofile <high number here>
root hard nofile <high number here>
psaadm soft nofile <high number here>
psaadm hard nofile <high number here>
mysql soft nofile <high number here>
mysql hard nofile <high number here>
httpd soft nofile <high number here>
httpd hard nofile <high number here>

where "<high number here>" is a suitably high limit on the number of open files allowed on your system, for example 100000. The actual number of open files on your system can be determined by running lsof | wc -l. Do not exceed 1 million (1000000) in your configuration files, because on some operating systems a larger number can break SSH root access (su/sudo).
After making the changes to the files, run systemctl --system daemon-reload && sysctl -p, then restart the services, e.g. service sw-engine restart && service sw-cp-server restart.
 
Peter, thank you very much for your reply!
I did everything you wrote. I set 'Max open files' to 4096. Surprisingly for me, the check before applying the new setting showed two quite different values:

Bash:
grep 'Max open files' /proc/$(cat /var/run/nginx.pid)/limits
Max open files            1024                 524288               files

I don't know whether that is normal or not.
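For what it's worth, two different numbers there are expected: in /proc/<pid>/limits the first column is the soft limit (the value actually enforced) and the second is the hard limit (the ceiling up to which the process may raise its own soft limit). The same pair can be inspected for the current shell:

```shell
# Soft limit: the number of open files actually enforced right now.
ulimit -Sn

# Hard limit: the ceiling up to which an unprivileged process may
# raise its own soft limit.
ulimit -Hn
```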
The actual number of open files at this moment:
Bash:
lsof | wc -l
27637
But right now there is no domain pointing to this server. (After the failure yesterday, I pointed the domain to another server.) CPU usage is 0% now.
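As a cross-check for when the problem is live again, the open-file total can also be broken down per process, so the offender (mariadb, sw-engine, nginx, ...) is visible directly. A sketch that reads /proc instead of lsof (run as root, since other users' fd directories are not readable otherwise):

```shell
# Count open file descriptors per process via /proc and list the
# ten busiest, with the process name taken from /proc/<pid>/comm.
for pid in /proc/[0-9]*; do
  n=$(ls "$pid/fd" 2>/dev/null | wc -l)
  if [ "$n" -gt 0 ]; then
    echo "$n $(cat "$pid/comm" 2>/dev/null) (pid ${pid#/proc/})"
  fi
done | sort -rn | head
```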

I put the values you suggested into the /etc/sysctl.conf and /etc/security/limits.conf files and restarted the services.

This VPS serves only 1 site, 1 domain.

I appreciate your help. My question is: is there anything I should do before switching the server back to production? I really want to avoid trial-and-error.
 
Based on the information provided earlier, I am certain that the issue is caused by the "too many open files" situation. But who knows whether other problems exist as well. You might consider setting
Code:
fs.inotify.max_user_watches = 560144
fs.inotify.max_user_instances = 1024

(or other suitably high values for your server) in /etc/sysctl.conf, too.
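A sketch of applying those inotify values at runtime and reading them back, without waiting for a reboot (the numbers are the examples from above, not prescriptions; root is required):

```shell
# Apply the inotify limits immediately; sysctl -p would instead
# re-read /etc/sysctl.conf once the lines are added there.
sysctl -w fs.inotify.max_user_watches=560144
sysctl -w fs.inotify.max_user_instances=1024

# Read both values back to confirm they took effect.
sysctl fs.inotify.max_user_watches fs.inotify.max_user_instances
```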
For your web server(s) you could consider running
Code:
# /usr/local/psa/admin/sbin/websrv_ulimits --set 500000 --no-restart

or another high number that fits your situation, so that they also get a high "max open files" limit on each start.
 