
Question php82-fpm stops responding, how to tune, debug, check variables

AdrianC

Basic Pleskian
Hello.
I need help debugging this...
Using: Plesk 18.0.62, PHP 8.2.21 running as FPM

My site has several million pages and gets hit by many bots. The database is over 100 GB and some queries are slow, so I think that alone is enough for some pages to respond slowly and use up all Apache slots.

But the current bottleneck that seems to crash the site is PHP-FPM: when the site stops responding and I restart plesk-php82-fpm, the site starts working again.

Below is the php-fpm status while the site is in the crashed state and not responding to requests. Notice the round "100" active processes and zero traffic:

Bash:
service plesk-php82-fpm status
Redirecting to /bin/systemctl status plesk-php82-fpm.service
● plesk-php82-fpm.service - The PHP 8.2.21 FastCGI Process Manager
   Loaded: loaded (/usr/lib/systemd/system/plesk-php82-fpm.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/plesk-php82-fpm.service.d
           └─limit_nofile.conf, respawn.conf
   Active: active (running) since Fri 2024-07-26 13:49:31 CEST; 1 day 20h ago
 Main PID: 1127 (php-fpm)
   Status: "Processes active: 100, idle: 0, Requests: 738153, slow: 0, Traffic: 0.00req/sec"
    Tasks: 601 (limit: 806516)
   Memory: 699.5M
   CGroup: /system.slice/plesk-php82-fpm.service
           ├─   1127 php-fpm: master process (/opt/plesk/php/8.2/etc/php-fpm.conf)
           ├─   2575 php-fpm: pool example3.com
           ├─   2587 php-fpm: pool example3.com
           ├─  37268 php-fpm: pool example3.com
           ├─  37290 php-fpm: pool example3.com
           ├─  37305 php-fpm: pool example3.com
           ├─  37315 php-fpm: pool example3.com
           ├─  52998 /usr/bin/host 240e:83:200::a3
           ├─  73731 /usr/bin/host 36.110.131.179
           ├─ 153442 /usr/bin/host 36.110.131.213
           ├─ 174469 php-fpm: pool example3.com
           ├─ 174780 php-fpm: pool example3.com
           ├─ 175765 /usr/bin/host 36.110.131.77
           ├─ 184413 /usr/bin/host 36.110.131.14
           ├─ 209269 /usr/bin/host 36.110.131.254
           ├─ 213632 php-fpm: pool example3.com
           ├─ 213845 php-fpm: pool example3.com
           ├─ 214348 php-fpm: pool example3.com
           ├─ 214754 php-fpm: pool example3.com
           ├─ 216250 /usr/bin/host 192.165.85.13
           ├─ 224022 /usr/bin/host 193.182.112.227
           ├─ 238294 /usr/bin/host 193.183.171.234
           ├─ 240003 /usr/bin/host 193.183.171.234
           ├─ 320308 /usr/bin/host 216.19.203.74
           ├─ 363796 /usr/bin/host 117.155.112.53
           ├─ 367790 /usr/bin/host 2406:7400:c6:d62f:7415:917c:ba9c:bc6a
           ├─ 378723 php-fpm: pool example3.com
           ├─ 378724 php-fpm: pool example3.com
           ├─ 392986 /usr/bin/host 2409:8900:15e1:a933:c3bb:8441:1830:5148
           ├─ 531141 /usr/bin/host 36.110.131.210
           ├─ 592299 php-fpm: pool example3.com
           ├─ 592325 php-fpm: pool example3.com
           ├─ 592609 /usr/bin/host 193.183.67.161
           ├─ 592840 /usr/bin/host 193.183.67.161
           ├─ 596044 php-fpm: pool example3.com
           ├─ 596384 php-fpm: pool example3.com
           ├─ 598818 /usr/bin/host 193.180.166.248
           ├─ 636127 /usr/bin/host 36.110.131.11
           ├─ 704965 /usr/bin/host 117.152.190.145
           ├─ 743168 /usr/bin/host 36.110.131.31
           ├─ 770203 /usr/bin/host 36.110.131.43

The /usr/bin/host processes shown there appear because I look up the hostname for each visitor's IP address using the PHP exec() call below (in live site scripts, not PHP CLI):
PHP:
exec("/usr/bin/host ".escapeshellarg($hostname), $output);

While the site is in the crashed state, the error log at /var/www/vhosts/system/example3.com/logs/error_log shows:

Bash:
[Sun Jul 28 08:12:44.864122 2024] [proxy_fcgi:error] [pid 664910:tid 140031494579968] (70007)The timeout specified has expired: [client 23.22.35.162:0] AH01075: Error dispatching request to : (polling)
[Sun Jul 28 08:12:45.066368 2024] [proxy_fcgi:error] [pid 675968:tid 140030924269312] (70007)The timeout specified has expired: [client 2a03:2880:13ff:7::face:b00c:0] AH01075: Error dispatching request to : (polling)
[Sun Jul 28 08:12:45.693213 2024] [proxy_fcgi:error] [pid 3159640:tid 140031360493312] (70007)The timeout specified has expired: [client 2a01:111:f100:4002::9d37:c7e8:0] AH01075: Error dispatching request to : (polling)
[Sun Jul 28 08:12:46.500551 2024] [proxy_fcgi:error] [pid 677420:tid 140031469401856] (70007)The timeout specified has expired: [client 2a03:2880:13ff:1a::face:b00c:0] AH01075: Error dispatching request to : (polling)
[Sun Jul 28 08:12:46.682880 2024] [proxy_fcgi:error] [pid 3159640:tid 140031368886016] (70007)The timeout specified has expired: [client 2a03:2880:ff:20::face:b00c:0] AH01075: Error dispatching request to : (polling)
[Sun Jul 28 08:12:47.499567 2024] [proxy_fcgi:error] [pid 3159625:tid 140031469401856] (70007)The timeout specified has expired: [client 2603:1030:7:5::cb:0] AH01075: Error dispatching request to : (polling)
[Sun Jul 28 08:12:47.586568 2024] [proxy_fcgi:error] [pid 677420:tid 140031452616448] (70007)The timeout specified has expired: [client 2a03:2880:ff:f::face:b00c:0] AH01075: Error dispatching request to : (polling)

Inside domains > example3.com > PHP settings I had:

pm.max_children = 100; // changed to 200
pm.max_requests = [was blank] // changed to 500


I am unsure how I can stop it from crashing in the future...

- Can it be because max_requests was not set? What is a good value for it?
- The "/usr/bin/host" processes above are not labeled "php-fpm: pool example3.com". Does this mean they use php-fpm slots outside of the pm.max_children that I defined under the domain's PHP settings?
- How can I view more live stats and the current running config of PHP-FPM? I only see some stats in service plesk-php82-fpm status and on the Apache mod_status page.

IMPORTANT: I have now realised that the PHP call above, exec("/usr/bin/host ".$user_hostname);, takes 15 seconds for a bad IP, so this alone is enough to slow down the page and use up all slots, no?

The questions above are still relevant: what happens when pm.max_requests is left at its default in the domain's PHP settings?
Because I don't know how to check the current live config for php-fpm, I don't know whether it is unlimited or set in a parent PHP config, or what value I should use.
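
A minimal sketch for checking what the pool actually runs with. Two things here are assumptions not confirmed in this thread: that Plesk keeps one pool file per domain under /opt/plesk/php/8.2/etc/php-fpm.d/ (next to the master config shown in the status output), and that the FPM binary lives at /opt/plesk/php/8.2/sbin/php-fpm. The helper name show_pm_settings is made up for illustration; it only prints the uncommented pm directives of a file. A directive that is absent falls back to the PHP-FPM built-in default, which for pm.max_requests is 0 (workers are never recycled).

```shell
#!/bin/sh
# show_pm_settings FILE
# Print the active (uncommented) "pm" and "pm.*" directives of a PHP-FPM pool file.
# Lines starting with ';' are comments in php-fpm.conf syntax and are skipped.
show_pm_settings() {
    grep -E '^[[:space:]]*pm(\.[a-z_]+)?[[:space:]]*=' "$1"
}

# Example calls (both paths are assumptions, adjust them to your server):
#   show_pm_settings /opt/plesk/php/8.2/etc/php-fpm.d/example3.com.conf
#
# php-fpm can also dump the fully parsed configuration when -t is given twice;
# the dump is written to stderr as NOTICE lines:
#   /opt/plesk/php/8.2/sbin/php-fpm -y /opt/plesk/php/8.2/etc/php-fpm.conf -tt 2>&1 | grep 'pm\.'
```

If the -tt dump shows pm.max_requests = 0, nothing in a parent config is limiting it and the value entered in the domain's PHP settings is the only one that applies.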

Sorry for the long thread.
 
Also, where is the log file that should say something like "php-fpm reached pm.max_children"? That should be important.

Edit: found it.

Code:
cat /var/log/plesk-php82-fpm/error.log 
[28-Jul-2024 05:56:18] WARNING: [pool example3.com] child 3159868 said into stderr: "Timeout."
[28-Jul-2024 06:03:00] WARNING: [pool example3.com] server reached max_children setting (100), consider raising it
[28-Jul-2024 06:59:01] WARNING: [pool example3.com] child 4109652 said into stderr: "Timeout."
[28-Jul-2024 07:00:02] WARNING: [pool example3.com] child 4142486 said into stderr: "Timeout."
[28-Jul-2024 10:56:58] WARNING: [pool example3.com] child 2177859 said into stderr: "Timeout."
[28-Jul-2024 10:57:27] WARNING: [pool example3.com] child 2189271 said into stderr: "Timeout."
[28-Jul-2024 10:57:58] WARNING: [pool example3.com] child 2227625 said into stderr: "Timeout."
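
To see at a glance how often the limit was hit, the warning can be counted straight from that log. This is only a sketch: the function name count_max_children_hits is made up, and the match string is taken from the log lines above.

```shell
#!/bin/sh
# count_max_children_hits LOGFILE
# Print how many times any pool logged the "reached max_children" warning.
# grep -c prints 0 (and returns a non-zero status) when there is no match.
count_max_children_hits() {
    grep -c 'server reached max_children setting' "$1"
}

# Example call, using the log path from this thread:
#   count_max_children_hits /var/log/plesk-php82-fpm/error.log
```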
 
Increasing max children will not solve the underlying issue. It can cover it up, but many children are not needed if a website's scripts respond swiftly. The problem here is that long-running tasks keep a child process waiting, so it cannot process other incoming requests. Your analysis pointing to slow name resolution seems an excellent point to me. The situation can almost certainly be improved by doing this asynchronously. For example, you could write the IP into a database and then have a cron job with a command-line script resolve the addresses in the background, so that the web server's PHP-FPM never needs to wait on it.
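
A rough sketch of that background approach, with loud assumptions: a plain text queue file (one IP per line, appended by the website) stands in for the database table, and resolve_queue, HOST_BIN and LOOKUP_LIMIT are made-up names for illustration. The real thing would read from and write to your own database.

```shell
#!/bin/sh
# resolve_queue QUEUE_FILE RESULT_FILE
# Resolve every queued IP outside of the web request and append
# "ip<TAB>hostname" (or "ip<TAB>unresolved") to RESULT_FILE.
resolve_queue() {
    queue=$1
    results=$2
    [ -s "$queue" ] || return 0          # nothing pending
    work="$queue.work.$$"
    mv "$queue" "$work" || return 1      # claim the pending entries
    sort -u "$work" | while IFS= read -r ip; do
        [ -n "$ip" ] || continue
        # Hard cap per lookup, so one bad IP cannot stall the whole run.
        if reply=$(timeout "${LOOKUP_LIMIT:-5}" "${HOST_BIN:-/usr/bin/host}" -W 2 "$ip" 2>/dev/null); then
            # The first answer line normally ends with the hostname.
            name=$(printf '%s\n' "$reply" | awk 'NR == 1 { print $NF }')
        else
            name=unresolved
        fi
        printf '%s\t%s\n' "$ip" "$name" >> "$results"
    done
    rm -f "$work"
}
```

A crontab entry could source this file and call resolve_queue once a minute. The PHP side then only appends the visitor's IP to the queue, which takes microseconds instead of up to 15 seconds.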
 
The slow "IP to hostname" command was probably it. I set a shorter timeout on it and have had no problems since, e.g. the pool no longer reaches "max children".
But yes, adding these IP lookups to a database and running them in the background seems like better practice. For now I need the results instantly, but it can be done.
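
For reference, one way such a shorter timeout could look. This is a sketch, not necessarily the exact change made here: host's -W option limits how long it waits for each reply, and the coreutils timeout command puts a hard cap on the whole call. lookup_ptr, HOST_BIN and LOOKUP_LIMIT are made-up names.

```shell
#!/bin/sh
# lookup_ptr IP
# Reverse-resolve IP, giving up after LOOKUP_LIMIT seconds (default 3).
# Returns 124 when the hard limit was hit (the exit status of timeout).
lookup_ptr() {
    timeout "${LOOKUP_LIMIT:-3}" "${HOST_BIN:-/usr/bin/host}" -W 2 "$1"
}

# Example call:
#   lookup_ptr 36.110.131.179 || echo "no hostname within the time limit"
```

From PHP, the same command line can be passed to exec() as before, e.g. exec("timeout 3 /usr/bin/host -W 2 ".escapeshellarg($ip), $output, $rc); and a non-zero $rc then means "no hostname available".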

I also found this useful PHP function:

Code:
<?php print_r(fpm_get_status()); ?>

It prints useful php-fpm stats such as "max children reached":

Code:
    [pool] => examplesite.com
    [process-manager] => ondemand
    [start-time] => 1722156649
    [start-since] => 656852
    [accepted-conn] => 2556789
    [listen-queue] => 0
    [max-listen-queue] => 0
    [listen-queue-len] => 0
    [idle-processes] => 12
    [active-processes] => 13
    [total-processes] => 25
    [max-active-processes] => 38
    [max-children-reached] => 0
    [slow-requests] => 0
    [procs] => Array
    [...] some more data here ...
 