Resolved upstream timed out (110: Connection timed out) randomly

Mike1050 · Nov 30, 2018

Hello,

Since 22/11/2018 I have had the following problem (the server had been running for 9 months without it).

According to the Plesk action log, no updates were made around this date.

At random times during the day, while my servers are in use, I sometimes get these messages:

Code:

26658#0: *285131 upstream timed out (110: Connection timed out) while reading response header from upstream

26658#0: *285846 FastCGI sent in stderr: "Primary script unknown" while reading response header from upstream

24540#0: *302 peer closed connection in SSL handshake (104: Connection reset by peer) while SSL handshaking to upstream

When this happens (sometimes with several 110 errors in a row), the domain in Plesk becomes unreachable: the website freezes and, after some time, nginx returns a 504 or 502 gateway error.
During this time, other domains on the same server remain available.
Sometimes I get only one or two 110 errors and the website continues to work.

The problem occurs randomly on 2 load-balanced servers (both run the same code).
For the moment I am focusing on the first server to debug the problem (I tested by connecting directly to it).

I have followed the instructions from Plesk:

Code:

/etc/nginx/nginx.conf

    proxy_connect_timeout 1200s;
    proxy_send_timeout 1200s;
    proxy_read_timeout 1200s;
    fastcgi_send_timeout 1200s;
    fastcgi_read_timeout 1200s;

and php.ini:
[php-fpm-pool-settings]
pm.max_children = 950

I tried PHP-FPM served by nginx, PHP-FPM served by Apache, and FastCGI, with PHP 7.1.24.

My server has 30 GB of free memory (usage peaks at 10%)
CPU usage peaks at 10%
Version 17.0.17 Update #60
Plesk Onyx CentOS Linux 7.4.1708
The licence is managed by OVH (I have already contacted them, but I have to wait some hours for a response)

I have set Apache to the trace6 debug level, but it is hard to find where the problem is.
Could it be in the PHP code (an infinite loop?) or is it a server-side problem? How can I trace this, given that the logs do not explain much?

Thanks very much for your help
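
One possible starting point for tracing this is to tally how often each kind of message appears in the nginx error log. The sketch below is Python and only recognises the three messages quoted above. The names `PATTERNS`, `classify` and `tally` are illustrative and are not part of nginx or Plesk.

```python
import re
from collections import Counter

# Patterns built from the three log messages quoted in this post.
PATTERNS = {
    "upstream_timeout": re.compile(
        r"upstream timed out \(110: Connection timed out\)"),
    "primary_script_unknown": re.compile(
        r'FastCGI sent in stderr: "Primary script unknown"'),
    "ssl_handshake_reset": re.compile(
        r"peer closed connection in SSL handshake"),
}


def classify(line):
    """Return the name of the first matching pattern, or 'other'."""
    for name, pattern in PATTERNS.items():
        if pattern.search(line):
            return name
    return "other"


def tally(lines):
    """Count log lines per error kind."""
    return Counter(classify(line) for line in lines)


# The three lines from the post, used here as sample input.
SAMPLE = [
    "26658#0: *285131 upstream timed out (110: Connection timed out) "
    "while reading response header from upstream",
    '26658#0: *285846 FastCGI sent in stderr: "Primary script unknown" '
    "while reading response header from upstream",
    "24540#0: *302 peer closed connection in SSL handshake "
    "(104: Connection reset by peer) while SSL handshaking to upstream",
]

if __name__ == "__main__":
    for kind, count in tally(SAMPLE).most_common():
        print(kind, count)
```

To run it against a real log, pass an open file object to `tally` instead of `SAMPLE`; the log path depends on the server setup.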

garcue · Nov 30, 2018

Hello Mike1050,

The error appears because nginx does not receive an answer from the layer behind it, in your case Apache. Try optimizing the Apache configuration and the error should go away.

You could also consider dropping Apache and serving everything directly from nginx.

I recommend that you add the following configuration to the file 'vhost_nginx.conf':

# headers session time
proxy_connect_timeout 900;
proxy_send_timeout 900;
fastcgi_send_timeout 900;
fastcgi_read_timeout 900;

trialotto · Nov 30, 2018

@Mike1050

Your issue is related to a FastCGI problem, so there is no reason to change Nginx conf and/or php.ini.

Please

a) revert the php.ini file to default values: the pm.max_children = 950 is a really bad choice - whatever you are trying to achieve, this is not a solution

b) remove

proxy_connect_timeout 1200s;
proxy_send_timeout 1200s;
proxy_read_timeout 1200s;
fastcgi_send_timeout 1200s;
fastcgi_read_timeout 1200s;

from the nginx.conf file - they should not be in that file, for many reasons (one of them is that the nginx.conf can be overwritten at upgrades)

c) check whether the file /etc/nginx/conf.d/timeout.conf exists (if not, create it) and make the contents of that file

proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;

and reload Nginx configuration with the command: service nginx reload (or equivalently, run the command: nginx -s reload)

d) verify that the files /etc/nginx/fastcgi.conf and /etc/nginx/fastcgi_params contain the line

fastcgi_param SCRIPT_NAME $fastcgi_script_name;

as this is related to one of the error messages in the log output that you provided.
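
The check in step d) can also be scripted. A minimal Python sketch, assuming the file content has already been read into a string; `fastcgi_param_value` is a hypothetical helper, not an nginx tool, and its comment handling is deliberately naive.

```python
import re


def fastcgi_param_value(config_text, name):
    """Return the value of the last active `fastcgi_param NAME value;`
    line in config_text, or None if there is no such line."""
    value = None
    for raw in config_text.splitlines():
        # Naive comment stripping: everything after '#' is dropped.
        line = raw.split("#", 1)[0].strip()
        match = re.match(r"fastcgi_param\s+(\S+)\s+(.+?)\s*;$", line)
        if match and match.group(1) == name:
            value = match.group(2)
    return value


# Sample text for illustration only, not a copy of a real server file.
SAMPLE = """
fastcgi_param  QUERY_STRING       $query_string;
# fastcgi_param SCRIPT_NAME /commented/out;
fastcgi_param  SCRIPT_NAME        $fastcgi_script_name;
"""

if __name__ == "__main__":
    print(fastcgi_param_value(SAMPLE, "SCRIPT_NAME"))
    print(fastcgi_param_value(SAMPLE, "SCRIPT_FILENAME"))
```

For the real files, read `/etc/nginx/fastcgi.conf` and `/etc/nginx/fastcgi_params` with `open(path).read()` and pass the text in. It may be worth checking `SCRIPT_FILENAME` the same way, since "Primary script unknown" is commonly associated with that parameter as well.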

If the problem still persists, just do the following

1 - try to find the offending and problematic domains

2 - open Plesk Panel and go to "Domains > [ select offending domain ] > PHP Settings (click) > Select: FPM Application served by Apache"

3 - check whether the problem persists on the domain that has been changed in step 2

4 - follow steps 1 to 2 for each offending domain, if (and only if) the check in step 3 indicates that your problems are not reoccurring

There are other ways to achieve the same, but the steps 1 to 4 form a procedure that is fairly easy and provides some control while trying to solve the issue: this kind of control is really recommended, because it will give you more insight into the exact root cause of the problem and/or on what domain(s) this root cause is present.
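
For step 1, counting the 110 errors per domain may help. The Python sketch below assumes that each nginx error log line ends with fields such as `client: ..., server: ..., request: ...`; the log lines quoted in this thread are truncated before those fields, so the sample lines and host names here are synthetic and only illustrate the idea. `timeouts_per_server` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumption: error lines carry a ", server: <name>," field.
SERVER_FIELD = re.compile(r",\s*server:\s*([^,\s]+)")
TIMEOUT = "upstream timed out (110: Connection timed out)"


def timeouts_per_server(lines):
    """Count upstream timeout lines per `server:` value."""
    counts = Counter()
    for line in lines:
        if TIMEOUT not in line:
            continue
        match = SERVER_FIELD.search(line)
        counts[match.group(1) if match else "(unknown)"] += 1
    return counts


# Synthetic sample: the message text comes from the thread, while the
# client, server and request fields are invented for illustration.
SAMPLE = [
    "*1 upstream timed out (110: Connection timed out) while reading "
    "response header from upstream, client: 192.0.2.10, "
    'server: shop.example.com, request: "GET /a.php HTTP/1.1"',
    "*2 upstream timed out (110: Connection timed out) while reading "
    "response header from upstream, client: 192.0.2.11, "
    'server: shop.example.com, request: "GET /a.php HTTP/1.1"',
    "*3 upstream timed out (110: Connection timed out) while reading "
    "response header from upstream, client: 192.0.2.12, "
    'server: blog.example.com, request: "GET / HTTP/1.1"',
    '*4 FastCGI sent in stderr: "Primary script unknown" while reading '
    "response header from upstream, client: 192.0.2.13, "
    'server: shop.example.com, request: "GET /x.php HTTP/1.1"',
]

if __name__ == "__main__":
    for server, count in timeouts_per_server(SAMPLE).most_common():
        print(server, count)
```

The domain with the highest count is a reasonable first candidate for the switch in step 2.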

Anyway, hope the above helps a bit...........but keep us posted!

Regards.........

PS To explain a bit: steps 1 to 4 force your Plesk instance to restart and/or reset the relevant services, while still maintaining a proper Apache + Nginx config (read: the proper config is generated by the switch to Nginx with the FPM application served by Apache). If any issue still occurs after this switch, then a very serious issue exists due to some (severe) error in the config files. That error can be a typo or a wrong usage of a stanza, and it can be virtually anywhere. So it is best to start by switching to something that should work out of the box, in order to exclude a number of explanations for the root cause of your problem.

Mike1050 · Nov 30, 2018

Hello Garcue and Trialotto, thanks for your responses.
To begin with, I will roll back the parameters I added (no other parameters were changed on either server). I now think the fault is not in the server config but in the PHP code.
We may have identified the PHP page that causes the problem (a PHP/AJAX script that does not receive a response), because this page is always called just before we get the upstream errors.
I am in contact with the developers about that.

Thanks for the remark about nginx.conf (I understand that I must use satellite files so that custom config is not erased by updates).
Noted for pm.max_children = 950: it was around 200, which I thought was a setting for a 4 GB RAM machine, so with 32 GB I thought it had to be increased. I will remove it for the next test steps.
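
A common rule of thumb, which does not come from this thread or from Plesk documentation, is to divide the memory set aside for PHP by the average size of one PHP-FPM worker. The Python sketch below uses an invented 60 MB per worker and an invented 24 GB budget; both are assumptions, and the real worker size should be measured on the server.

```python
def estimate_max_children(ram_for_php_mb, avg_worker_mb):
    """Rule-of-thumb upper bound: memory budget / average worker size."""
    if ram_for_php_mb <= 0 or avg_worker_mb <= 0:
        raise ValueError("both arguments must be positive")
    return max(1, int(ram_for_php_mb // avg_worker_mb))


def worst_case_memory_mb(max_children, avg_worker_mb):
    """Memory needed if every allowed worker is running at once."""
    return max_children * avg_worker_mb


# Hypothetical figures for illustration only.
AVG_WORKER_MB = 60           # assumed average worker size
BUDGET_MB = 24 * 1024        # assumed share of a 32 GB machine

if __name__ == "__main__":
    print(estimate_max_children(BUDGET_MB, AVG_WORKER_MB))
    print(worst_case_memory_mb(950, AVG_WORKER_MB))
```

Under these assumed numbers, 950 workers could need about 57000 MB in the worst case, which is more than the 32 GB in the machine.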

Thanks very much for your help; I will keep you informed about the solution.

Mike1050 · Dec 2, 2018

Hello,

It is a problem on the PHP code side.

Thanks for your advice. The problem comes from cURL functions that are called in a script.

It is strange, because the cURL timeout is 250 ms, so normally the PHP code should continue after that time, but sometimes it does not.

Best regards.

trialotto · Dec 2, 2018

@Mike1050

Often cURL should be configured/coded to close down properly, if the connection times out.........try that option, it will probably solve a lot of future issues.

Regards......

Jan Bludau · Mar 17, 2021

Matomo also reports HTTP status code 499.

Raising the PHP-FPM settings should solve the problem.

rayhan · Apr 22, 2021

Hello. Good afternoon. I have a similar problem. I don't have much knowledge of Plesk. Can someone please help me?
I have a cloud server with 32 GB RAM running Ubuntu 20.04.
ERROR: 1089#0: *26355 upstream timed out (110: Connection timed out) while SSL handshaking to upstream
