Resolved The dreaded 502 Bad Gateway, nginx + php-fpm

OlgaKM · Feb 9, 2017

I recently switched to nginx + php-fpm on my server. There are about 60 domains on the server, most low traffic, but some mid-to-high traffic. All seemed to be working fine for the first 12 hours, but then I started getting a 502 Bad Gateway error on one of the high traffic domains. The nginx error log is full of messages like:

Code:

2017/02/10 00:32:42 [error] 46165#0: *119100 connect() to unix:///var/www/vhosts/system/mysite.com/php-fpm.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: 46.229.168.71, server: mysite.com, request: "GET /some-page.php HTTP/1.1", upstream: "fastcgi://unix:///var/www/vhosts/system/mysite.com/php-fpm.sock:", host: "www.mysite.com"

I started googling, and this seems to be quite a common problem, but nothing I have tried so far has helped. Because only one domain is affected, and it is the highest-traffic one, my guess is that php-fpm is being overwhelmed. I created the file /var/www/vhosts/system/mysite.com/conf/php.ini and experimented with the pool settings, gradually increasing them. These are my current settings, and I'm still getting the error:

Code:

[php-fpm-pool-settings]
; By default use ondemand spawning (this requires php-fpm >= 5.3.9)
pm = dynamic
pm.max_children = 1500
pm.process_idle_timeout = 10s
; Following pm.* options are used only when 'pm = dynamic'
pm.start_servers = 20
pm.min_spare_servers = 20
pm.max_spare_servers = 50

According to what I've read, this max_children value is certainly too high for my specs! However, I have tried a range of values, including 5, 40, 100, 300 and 1000, and the error persists.
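For anyone wondering how to judge whether a max_children value fits the hardware, a common rule of thumb is to divide the memory you can spare for PHP by the average size of one php-fpm worker. The sketch below is only an illustration: the function names are made up, the 4096 MB and 60 MB figures are placeholders rather than measurements from this server, and it assumes the workers show up in ps under the process name php-fpm. If each domain runs its own pool, as the per-domain socket path suggests, the budget has to be shared across all pools.

```shell
#!/bin/sh
# Rough ceiling for pm.max_children: memory budget divided by the average
# worker size. Both arguments are in MB; integer division rounds down.
# (Hypothetical helper, not part of Plesk or php-fpm.)
estimate_max_children() {
    budget_mb=$1
    worker_mb=$2
    if [ "$worker_mb" -le 0 ]; then
        echo "worker size must be positive" >&2
        return 1
    fi
    echo $(( budget_mb / worker_mb ))
}

# Average resident memory (MB) of the running php-fpm workers.
# Assumes the process name is "php-fpm"; prints nothing if none are running.
avg_worker_mb() {
    ps -C php-fpm -o rss= |
        awk '{ sum += $1; n++ } END { if (n) printf "%d\n", sum / n / 1024 }'
}

# Placeholder figures: 4096 MB set aside for PHP, 60 MB per worker.
estimate_max_children 4096 60   # prints 68
```

On a live server the second argument could come from avg_worker_mb instead of a guess, measured while the site is under normal load.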

I have read that I should try switching from a Unix socket to a TCP/IP socket, but I'm not sure of a safe way to do this on Plesk. The config file for the domain in /opt/plesk/php/7.0/etc/php-fpm.d/ says:

Code:

; Don't override following options, they are relied upon by Plesk internally

The socket is listed just below this.

In short, I am not sure what I can do next. Any advice?

Edit: It seems that the other domains on the server, which are lower traffic, have also started throwing this error. So perhaps it is not a traffic issue? Anyway, I read somewhere that the error message "pool seems busy" is posted to the error log when it is a load issue...
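One way to check whether load is involved is to search the php-fpm log for the two warnings php-fpm writes when a pool runs short of workers: "seems busy" and "server reached pm.max_children setting". The sketch below counts them. The function name is made up, and the log path in the comment is an assumption for a Plesk PHP 7.0 install, so adjust it to wherever your php-fpm log actually lives.

```shell
#!/bin/sh
# Count load-related php-fpm warnings in a log file.
# Matches "seems busy" and "server reached pm.max_children setting" lines.
# (Hypothetical helper; grep -c exits non-zero on zero matches, hence || true.)
count_pool_warnings() {
    grep -cE 'seems busy|reached pm\.max_children' "$1" || true
}

# Usage (path is an assumption, check your own install):
#   count_pool_warnings /var/log/plesk-php70-fpm/error.log
```

A count of zero while the 502s continue would point away from pool exhaustion and towards something else, such as a hung php-fpm service.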

Edit 2: OK, this is very strange. If I set php-fpm as the PHP handler for just one service plan (which contains 5 domains, but these are by far the highest-traffic domains that use PHP), everything seems to work fine. As soon as I start changing the settings for other service plans, I begin getting errors!

Bitpalast · Feb 10, 2017

Most frequently a 502 is caused by fail2ban blocking one of these addresses:
- 127.0.0.1
- your server's public IPv4 address
Add both of them to the fail2ban whitelist to avoid that issue.
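As a rough illustration, on a plain fail2ban install the whitelist is the ignoreip option. The fragment below assumes the setting goes in the [DEFAULT] section of jail.local, and 203.0.113.10 is a placeholder for the server's public IPv4 address. Plesk manages fail2ban through its own IP Address Banning settings, which I believe include a trusted IP list, so that is probably the safer place to make the change on a Plesk server.

```ini
[DEFAULT]
; Never ban the loopback range or the server's own public address.
; 203.0.113.10 is a placeholder, replace it with your real public IPv4.
ignoreip = 127.0.0.1/8 203.0.113.10
```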

OlgaKM · Feb 15, 2017

I finally figured it out. It was a multi-part problem (and nothing to do with fail2ban).

Firstly, when I updated the settings, sometimes the php-fpm service would hang and need to be restarted. This issue is described here:

https://support.plesk.com/hc/en-us/...m-show-502-Bad-Gateway-or-504-Gateway-Timeout

Note that if you run PHP 7.0, the correct command is:

Code:

service plesk-php70-fpm restart

A second issue was that I had a memory leak somewhere. Because I had switched php-fpm to "dynamic" rather than "ondemand", the child processes stayed alive longer, so their memory usage kept growing until it eventually caused the outage. To prevent this, I made each child exit and respawn after a fixed number of requests by adding the setting:

Code:

pm.max_requests = 1000

This goes in the relevant /var/www/vhosts/system/<vhost>/conf/php.ini file.
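For context, here is roughly how the pool section of that file might look with the new setting in place. This is a sketch rather than my exact file: the section header and option names are the ones from my earlier post, but the max_children figure is only an example and should be sized for your own server.

```ini
[php-fpm-pool-settings]
pm = dynamic
; Example value only; size this to the memory you can spare for PHP.
pm.max_children = 40
pm.start_servers = 20
pm.min_spare_servers = 20
pm.max_spare_servers = 50
; Recycle each worker after 1000 requests so leaked memory is released.
; The php-fpm default is 0, which means workers are never recycled.
pm.max_requests = 1000
```

After changing the file, restart the php-fpm service so the pool picks up the new values.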
