
[Resolved] nginx Fails, Bringing Site Down

Mark Bailey

Hi,

We had a client's site go down, and investigation showed the issue to be with nginx. The log showed multiple entries like this:

3619#0: *22389 upstream timed out (110: Connection timed out) while SSL handshaking to upstream
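To see how often this error occurs, and in which phase of the upstream connection, the nginx error log can be tallied. The following is a minimal Python sketch. It assumes log lines in the format quoted above. The function name `tally_upstream_timeouts` is hypothetical, and the location of the domain's error log depends on your setup:

```python
import re
from collections import Counter

# nginx "upstream timed out" error; group 1 is the phase after "while",
# e.g. "SSL handshaking to upstream" or "reading response header from upstream".
UPSTREAM_TIMEOUT = re.compile(r"upstream timed out \(\d+: [^)]*\) while ([^,]+)")


def tally_upstream_timeouts(lines):
    """Count 'upstream timed out' entries per phase in nginx error-log lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = UPSTREAM_TIMEOUT.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts


if __name__ == "__main__":
    # Demo on the entry quoted in this post; for a real log, pass an open file object instead.
    sample = [
        "3619#0: *22389 upstream timed out (110: Connection timed out) "
        "while SSL handshaking to upstream",
    ]
    for phase, count in tally_upstream_timeouts(sample).most_common():
        print(f"{count:6d}  {phase}")
```

For a real log, something like `tally_upstream_timeouts(open(path))` works, because a file object yields one line at a time.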

To get the site back up while troubleshooting, we tried to disable nginx for that site, but that does not seem to be possible. As far as we can tell, it can only be done server-wide, which we think is a shortcoming.

So...
  • Has anyone experienced anything like this before?
  • How can we resolve it?
  • How can we prevent it in the future?
BTW the site ran fine for months before this.

Thanks,

Mark
 
@Mark Bailey

The post by @IgorG points in the direction of a (potential) work-around, and it will probably work.

However, it is not a FULL solution to the root cause of the problem.

It is safe to say that there is some issue at the (backend) Apache server: it takes a long time to respond to one or more requests.

It is also pretty safe to assume that your Nginx proxy is not configured to wait that long: Nginx closes the connection before Apache has finished processing the request.
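The mechanism can be reproduced on a small scale. The Python sketch below is an analogy and does not run nginx itself. A deliberately slow local HTTP handler stands in for Apache, and a client socket timeout stands in for nginx's proxy_read_timeout. Because it uses plain HTTP, it does not model the SSL handshake phase from the log. The names `make_slow_handler` and `proxy_request` are hypothetical:

```python
import socket
import threading
import time
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_slow_handler(delay_seconds):
    """Build a handler that stands in for a slow backend (e.g. Apache + PHP)."""

    class SlowHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            time.sleep(delay_seconds)  # simulated slow request processing
            body = b"done\n"
            try:
                self.send_response(200)
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            except (BrokenPipeError, ConnectionResetError):
                pass  # the "proxy" already gave up and closed the connection

        def log_message(self, *args):
            pass  # keep the demo quiet

    return SlowHandler


def proxy_request(backend_delay, read_timeout):
    """Return 'ok' or 'timed out', like a proxy with a read timeout in front of a slow backend."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), make_slow_handler(backend_delay))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=read_timeout)
        try:
            conn.request("GET", "/")
            conn.getresponse().read()
            return "ok"
        except socket.timeout:  # the client stopped waiting before the backend answered
            return "timed out"
        finally:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(proxy_request(backend_delay=0.1, read_timeout=2.0))  # backend faster than the timeout
    print(proxy_request(backend_delay=1.5, read_timeout=0.3))  # backend slower than the timeout
```

The two calls mirror the two ways out described below: either the backend gets faster (the application fix), or the waiting side is allowed to wait longer (the configuration work-around).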

In summary, you have

a) one or more requests that take a (very) long time to be processed fully by Apache, implying that

1 - the Apache config has to allow a long enough period for requests to be processed fully
2 - the web application at the problematic domain is not coded to process requests efficiently and quickly

b) Nginx as a proxy, implying that you also have to tweak the Nginx config so that Nginx waits long enough for Apache to (fully) process requests

In essence, the root cause of the problem is the (problematic) web application itself: any tweak of Apache or Nginx is just a work-around.

In conclusion, it is recommended that you (in order of priority!)

1 - tackle problem a.2: look for potentially offending code in the web application and improve it wherever possible,

2 - tackle problem a.1: if offending code cannot be found and/or cannot be changed, tweak the Apache config (strictly speaking, the PHP settings of the domain); it is recommended to use the following values for

max_execution_time: 300
max_input_time: 300

and please note that

- any higher value is not really a good idea: if your web application requires more than 300 seconds (i.e. 5 minutes), then something is horribly wrong,
- the value of 300 seconds (and 600 seconds at most) will nicely "fit into" the default Nginx config, as shipped with Plesk: higher values often require Nginx tweaking,

3 - tackle problem b: tweak Nginx to wait a bit longer for responses by changing

- timeout settings: this will be done automatically when changing max_execution_time settings (see notes below!)
- keepalive settings: by default, Plesk applies the Nginx directive "keepalive_timeout 65;", which is sufficient, but you can always increase it slightly (see notes below!)

and please note that

- the Nginx directives to work with are: proxy_connect_timeout, proxy_send_timeout, proxy_read_timeout, send_timeout and keepalive_timeout
- Plesk default values for proxy_connect_timeout, proxy_send_timeout, proxy_read_timeout and send_timeout are set to 600 (seconds), which is sufficient
- Plesk default value for keepalive_timeout is 65 (seconds)
- Plesk values for max_execution_time (PHP setting) and proxy_read_timeout (Nginx setting) are identical at the domain level (and 600 seconds server-wide)
- Plesk values for max_execution_time (PHP setting) and proxy_read_timeout (Nginx setting) at the domain level override the server-wide settings
- do not add custom Nginx settings at the domain level for proxy_send_timeout and send_timeout: the server-wide setting of 600 (seconds) is sufficient, and any increase beyond 600 seconds will increase security risks and/or can cause severe overloads of the Apache server
- a custom Nginx setting at the domain level for proxy_connect_timeout can help, but in most cases an increase beyond 600 seconds only adds value when debugging the web application: in production, there is usually no need to change this value when using Apache + Nginx in a Plesk environment
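Steps 2 and 3 tie the PHP limits to the Nginx read timeout. A minimal Python sketch of that check follows. It assumes php.ini-style `name = value` lines and uses the thresholds stated in this post: 300 seconds recommended, 600 seconds at most, with 600 seconds as the Plesk default proxy_read_timeout. The helpers `read_php_limits` and `check_php_limits` are hypothetical:

```python
import re

# Thresholds quoted in this thread (seconds); adjust to your own server.
PLESK_DEFAULT_PROXY_READ_TIMEOUT = 600
RECOMMENDED_PHP_LIMIT = 300

# php.ini-style line such as "max_execution_time = 300"; other keys and comments are ignored.
PHP_LIMIT_LINE = re.compile(r"^\s*(max_execution_time|max_input_time)\s*=\s*(-?\d+)\s*$")


def read_php_limits(ini_text):
    """Extract max_execution_time / max_input_time (seconds) from php.ini-style text."""
    limits = {}
    for line in ini_text.splitlines():
        match = PHP_LIMIT_LINE.match(line)
        if match:
            limits[match.group(1)] = int(match.group(2))
    return limits


def check_php_limits(limits, proxy_read_timeout=PLESK_DEFAULT_PROXY_READ_TIMEOUT):
    """Warn about PHP limits above the advice or longer than nginx's read timeout."""
    warnings = []
    for name, seconds in sorted(limits.items()):
        if seconds <= 0:
            warnings.append(f"{name}={seconds}: no explicit limit of its own")
        elif seconds > proxy_read_timeout:
            warnings.append(
                f"{name}={seconds}s: nginx may time out (after {proxy_read_timeout}s) before PHP does"
            )
        elif seconds > RECOMMENDED_PHP_LIMIT:
            warnings.append(f"{name}={seconds}s: above the recommended {RECOMMENDED_PHP_LIMIT}s")
    return warnings


if __name__ == "__main__":
    sample_ini = "max_execution_time = 900\nmax_input_time = 300\nmemory_limit = 256M\n"
    for warning in check_php_limits(read_php_limits(sample_ini)):
        print(warning)
```

The 600-second boundary is treated as acceptable, matching the "600 seconds at most" note. Anything between 300 and 600 is only flagged as above the recommendation.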

and finally note that it is not recommended to fiddle with Nginx timeout settings at all: just use default values, whenever possible!
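The notes above can be turned into a quick audit of domain-level overrides. The following Python sketch assumes the custom directives are available as plain text. In Plesk, domain-level directives typically go into the "Additional nginx directives" field. It also assumes each timeout is a single value in seconds, so forms like "10m" or the two-value keepalive_timeout are not handled. The helper `audit_domain_directives` is hypothetical, and the defaults are the ones quoted in this post:

```python
import re

# Plesk defaults as quoted in this thread (seconds).
PLESK_DEFAULTS = {
    "proxy_connect_timeout": 600,
    "proxy_send_timeout": 600,
    "proxy_read_timeout": 600,
    "send_timeout": 600,
    "keepalive_timeout": 65,
}

# Simplified directive form: "name <seconds>[s];" at the start of a line.
DIRECTIVE = re.compile(r"^\s*(\w+)\s+(\d+)s?\s*;", re.MULTILINE)


def audit_domain_directives(custom_conf):
    """Return (effective timeouts, warnings) for domain-level nginx directive text."""
    effective = dict(PLESK_DEFAULTS)
    warnings = []
    for name, value in DIRECTIVE.findall(custom_conf):
        if name not in PLESK_DEFAULTS:
            continue  # not one of the timeout directives discussed here
        seconds = int(value)
        effective[name] = seconds
        if name in ("proxy_send_timeout", "send_timeout"):
            warnings.append(f"{name}: leave at the server-wide default of 600s")
        elif name == "proxy_connect_timeout" and seconds > 600:
            warnings.append(f"{name}={seconds}s: usually only useful while debugging")
    return effective, warnings


if __name__ == "__main__":
    effective, warnings = audit_domain_directives("proxy_read_timeout 300;\nsend_timeout 900;\n")
    print(effective)
    for warning in warnings:
        print(warning)
```

An empty warnings list together with defaults in the effective values corresponds to the "just use default values" advice.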


All of the above is just an elaborate way to explain the whole story and to help you figure out what to do next.

Simply put, apply the solution proposed by @IgorG (the Nginx config will be adjusted automatically to some extent), but be aware that you still have to find and solve the root cause of the problem: a web application whose code cannot process requests efficiently and quickly.

Hope the above helps a bit.

Regards.............
 
Thank you. To clarify further: this is an AWS server with 16 GB RAM and the default Plesk configuration values, hosting only one WordPress site that gets minimal traffic. That is one reason we were very concerned: it isn't any kind of overload situation, and we don't know why light traffic still resulted in this. The WordPress site and plugins work fine and generally perform very quickly. (We've been developing in WP for a decade now, so the site follows best practices.)

So our question is why a reasonably powerful server with only low traffic would still end up in a situation like this. You're right that it points to a problem in Apache or somewhere else, which makes us wonder whether the Plesk defaults have an issue.

So any additional thoughts would be appreciated.

Thanks,

Mark
 