Issue Websites Hanging loading for minutes and 502

stas styler · Jul 17, 2018

Hi guys,

Lately I've been seeing a problem that is really frustrating.
Some websites suddenly cannot be loaded. When I type the URL and press Enter, the browser tries to load the page for ages and then shows a 502 Bad Gateway.

I've tried to find the problem and reproduce it, but with no luck.
The first quick fix was to restart the server. But then I found that if I go to PHP settings and change the PHP version, or just click OK, the website comes back to life until the next time (it happens once a week or once every couple of days).

Restarting nginx or Apache doesn't help.

Background about our infrastructure:
We provide shared web hosting to our clients.
We have about 300 websites (piped logs option checked).
The server is dedicated: 128 GB RAM, AMD EPYC 7401P 24-Core Processor (48 core(s)), RAID 10 7 TB SSD, MariaDB 11, nginx reverse proxy + Apache, Plesk Onyx 17.8 patch 14, CentOS 7.

htop doesn't show anything notable; we barely use 20-30% of our cores and memory.
The logs show this:

Code:

(70007)The timeout specified has expired: AH01075: Error dispatching request to :, referer: https://homediet.co.il/nutritionists/317202106/

Code:

55818#0: *918084 upstream prematurely closed connection while reading response header from upstream

Code:

63988#0: *921196 connect() failed (111: Connection refused) while connecting to upstream

Many of them show a PHP socket problem:

Code:

(2)No such file or directory: AH02454: FCGI: attempt to connect to Unix domain socket /var/www/vhosts/system/example.com/php-fpm.sock (*) failed
[proxy_fcgi:error] [pid 3838:tid 140126940247808] [client 203.0.113.2:56904] AH01079: failed to make connection to backend: httpd-UDS
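
The "(2)No such file or directory" part of that message means the socket file itself was absent when Apache tried to connect. A minimal sketch for checking a socket path by hand follows; the function name `check_fpm_socket` and the status strings are made up for illustration, and the path in the usage note is simply the one from the log excerpt.

```python
import os
import socket
import stat


def check_fpm_socket(path, timeout=2.0):
    """Return a short status string for a PHP-FPM Unix domain socket path.

    Hypothetical helper for illustration; not part of Plesk or PHP-FPM.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        # Same condition Apache reports as "(2)No such file or directory".
        return "missing"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
    except ConnectionRefusedError:
        # The file exists, but no process is listening on it.
        return "refused"
    except OSError as exc:
        return "error: " + str(exc)
    finally:
        client.close()
    return "ok"
```

Calling `check_fpm_socket("/var/www/vhosts/system/example.com/php-fpm.sock")` while a site is hanging would show whether the pool's socket has disappeared or is merely not answering.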

I tried every suggestion and solution in the related threads; unfortunately, none of them helped.
Does anyone know what is going on? I think it started happening shortly after upgrading to 17.8.

Bonsai78 · Jul 18, 2018

What does /var/log/php-fpm/error.log say?

Do you need php-fpm? If not, you could try FastCGI.

Is nginx running? If yes, what happens if you switch the service off?

stas styler · Jul 18, 2018

Bonsai78 said:
What does /var/log/php-fpm/error.log say?

Do you need php-fpm? If not, you could try FastCGI.

Is nginx running? If yes, what happens if you switch the service off?

Nginx is serving static content while Apache runs beside it on the same server (configured as in the stock installation).

Yes, php-fpm has a much faster response time because it spawns child processes that wait for requests to come in.

The php-fpm logs show only notices about unimportant stuff from some clients' websites.

Bonsai78 · Jul 18, 2018

Probably a problem similar to this one?
Variable not replaced - AH01079: failed to make connection to backend: httpd-UDS - i-MSCP - internet - Multi Server Control Panel

It should at least point you in the direction of the problem.

stas styler · Jul 18, 2018

Bonsai78 said:
Probably a problem similar to this one?
Variable not replaced - AH01079: failed to make connection to backend: httpd-UDS - i-MSCP - internet - Multi Server Control Panel

It should at least point you in the direction of the problem.

It seems they are talking about the same error, but with a different panel.
Can anyone confirm that this is a bug in Plesk / Apache / nginx that is going to be fixed?
If so, I need a manual fix, or at least a hint that a fix is coming.

My clients' websites look as if they went down... that's just not professional.

stas styler · Jul 22, 2018

UP.

I need your help. It happened again.
Can someone from the Plesk team comment here?

Bitpalast · Jul 22, 2018

The symptoms do not point to a single, easily identifiable cause. My suggestion is to first check the /var/log/plesk-phpXX-fpm/error.log log files. Maybe you will find more specific hints there. One frequent cause of these symptoms is that the "max_children" limit is reached, so further connections are dropped by the service. Other similar factors can play a role, too.
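
As a rough illustration of that check: PHP-FPM normally writes a warning of the form "server reached pm.max_children setting (N), consider raising it" when a pool runs out of workers. The sketch below counts such warnings per pool. The function names are made up for this example, and the log path has to be passed in by the caller, since the PHP version part of the file name differs per installation.

```python
import re
from collections import Counter

# Shape of the warning PHP-FPM writes when a pool has no free workers, e.g.:
#   WARNING: [pool example.com] server reached pm.max_children setting (5), consider raising it
MAX_CHILDREN_RE = re.compile(
    r"\[pool (?P<pool>[^\]]+)\] server reached pm\.max_children setting \((?P<limit>\d+)\)"
)


def count_max_children_hits(lines):
    """Count max_children warnings, keyed by (pool name, configured limit)."""
    hits = Counter()
    for line in lines:
        match = MAX_CHILDREN_RE.search(line)
        if match:
            hits[(match.group("pool"), int(match.group("limit")))] += 1
    return hits


def scan_log_file(path):
    """Convenience wrapper: scan one PHP-FPM error log on disk."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_max_children_hits(handle)
```

Pools that show up with a high count around the time of the hangs are the first candidates for a higher limit or a look at slow scripts.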

Frostbolt · Sep 12, 2018

I guess I'll drop our problem in this thread since it seems very similar. We run a fairly high-traffic website. For the past months we've used the option "FastCGI application served by Apache" to run PHP 7.1 (since we've had issues with FPM in the past).
Yesterday we switched to FPM again because we assumed the issues would have been resolved by now. Unfortunately the website became unresponsive this morning. After a few minutes I restarted nginx, which resolved the problem. The logs show:

Code:

proxy_error_log
2018/09/12 10:35:46 [error] 20215#0: *18681969 upstream timed out (110: Connection timed out) while reading response header from upstream, client:

Code:

error_log
[Wed Sep 12 10:35:11.783708 2018] [proxy_fcgi:error] [pid 15655:tid 139972506019584] (70007)The timeout specified has expired: [client ] AH01075: Error dispatching request to :, referer:

I've attached screenshots of the Apache Server Status during and after the issue.

We've switched back to FastCGI again now to prevent this issue.
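
The log lines quoted in this thread fall into a few recurring patterns, and counting them per category can show which one dominates during an outage. The following is only a sketch: the category labels are invented for this example, and the patterns cover just the messages that appear above.

```python
import re
from collections import Counter

# (label, pattern) pairs; labels are illustrative, patterns match the
# nginx and Apache messages quoted earlier in this thread.
SYMPTOMS = [
    ("backend timeout", re.compile(r"AH01075|upstream timed out")),
    ("backend closed connection", re.compile(r"upstream prematurely closed connection")),
    ("connection refused", re.compile(r"connect\(\) failed \(111")),
    ("FPM socket unreachable", re.compile(r"AH02454|AH01079")),
]


def classify(line):
    """Return the first matching symptom label, or None if nothing matches."""
    for label, pattern in SYMPTOMS:
        if pattern.search(line):
            return label
    return None


def tally(lines):
    """Count how often each symptom appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts
```

Feeding both the nginx proxy_error_log and the Apache error_log for the affected domain through `tally` gives a quick picture of whether the backend is timing out or its socket is gone.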

Bitpalast · Sep 12, 2018

Make sure that your PHP-FPM limits are high enough, for example in /var/www/vhosts/system/<domain>/conf:

Code:

[php-fpm-pool-settings]
pm.max_children = 50
pm.max_requests = 10000
pm.process_idle_timeout = 120s

These can also be set under the additional PHP-FPM directives in the GUI (with the [php-fpm-pool-settings] heading, of course).
Also make sure that the scripts that are running can actually finish what they are doing. Sometimes scripts end up in infinite loops, which can cause issues similar to those described.
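
When picking a value for pm.max_children, a common rule of thumb is to divide the memory you are willing to give a pool by the average size of one PHP worker. The sketch below shows that arithmetic; the function names and all figures in the usage note are assumptions for illustration, not measurements from any server in this thread.

```python
def max_children_estimate(memory_budget_mb, avg_worker_mb):
    """Rule-of-thumb ceiling: how many workers fit into a memory budget."""
    if avg_worker_mb <= 0:
        raise ValueError("avg_worker_mb must be positive")
    return max(1, int(memory_budget_mb // avg_worker_mb))


def worst_case_memory_mb(pools, max_children, avg_worker_mb):
    """Memory needed if every pool ran at its max_children limit at once."""
    return pools * max_children * avg_worker_mb
```

For example, with an assumed 64 MB per worker, `max_children_estimate(4096, 64)` gives 64 workers for a 4 GB budget, and `worst_case_memory_mb(300, 50, 64)` gives 960000 MB. On a shared server with hundreds of pools, the limit is therefore a per-site ceiling and not something every site can reach at the same time.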
