Resolved nginx Fails, Bringing Site Down

Mark Bailey · Dec 14, 2018

Hi,

We had a client's site go down, and investigation showed the issue to be with nginx. The log showed multiple entries like this:

3619#0: *22389 upstream timed out (110: Connection timed out) while SSL handshaking to upstream

To get the site back up while troubleshooting, we tried to disable nginx for that site only, but that does not appear to be possible: it can only be disabled server-wide, which we think is a shortcoming.

So...

Has anyone experienced anything like this before?
How can we resolve it?
How can we prevent it in the future?

BTW the site ran fine for months before this.

Thanks,

Mark

IgorG · Dec 16, 2018

Mark Bailey said:
upstream timed out (110: Connection timed out) while SSL handshaking to upstream

Try applying the solution from the article "Website is inaccessible: 504 Gateway Timeout".

trialotto · Dec 17, 2018

@Mark Bailey

The post by @IgorG points towards a (potential) work-around, and it will probably work.

However, it is not a FULL solution to the root cause of the problem.

It can be safely stated that you have some issue at the (backend) Apache server, which takes a long time to serve one or more requests with a response.

It is pretty safe to assume that your Nginx proxy is not configured to wait that long: Nginx will close the connection before Apache is finished with the request processing.
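One way to check this against the logs is to tally the "upstream timed out" entries by the phase that Nginx names after "while" (for example "SSL handshaking to upstream" in the entry quoted above). What follows is a minimal Python sketch, not something shipped with Plesk or Nginx; the function name count_timeout_phases is an assumption made for illustration, and the sample line simply reuses the entry from the first post.

```python
import re
from collections import Counter

# Captures the phase that follows "while" in an "upstream timed out" entry.
# The errno (110 on Linux) is matched loosely because it differs per platform.
TIMEOUT_RE = re.compile(r"upstream timed out \(\d+: [^)]*\) while (?P<phase>[^,]+)")


def count_timeout_phases(lines):
    """Tally 'upstream timed out' entries by the phase Nginx reports."""
    phases = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            phases[match.group("phase").strip()] += 1
    return phases


if __name__ == "__main__":
    sample = [
        "3619#0: *22389 upstream timed out (110: Connection timed out) "
        "while SSL handshaking to upstream",
    ]
    for phase, count in count_timeout_phases(sample).most_common():
        print(f"{count:5d}  {phase}")
```

Any iterable of lines works, so an open log file can be passed directly; where the domain's Nginx error log lives depends on the setup. Timeouts that cluster in one phase (connecting, SSL handshaking, reading the response header) give a hint as to where the backend stalls.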

In summary, you have

a) one or more requests that take a (very) long time to be processed fully by Apache, implying that

1 - the Apache config has to allow for a long period, in which requests can be processed fully
2 - the web application at the problematic domain is not coded in such a way that requests can be processed efficiently and fast

b) Nginx as a proxy, implying that you also have to tweak Nginx config to allow for a long period, in which Nginx waits for (full) request processing by Apache

In essence, the root cause of the problem is the (problematic) web application itself: any tweak of Apache or Nginx is just a work-around.
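To test whether the backend really is slow, it can help to time a request sent straight to Apache, bypassing Nginx. The following is a rough Python sketch under stated assumptions: the function name time_request is hypothetical, and the backend URL has to be supplied by you, since the address and port Apache listens on depend on the installation.

```python
import time
import urllib.request


def time_request(url, timeout=30.0, context=None):
    """Return (status, elapsed_seconds) for a single GET request to url.

    `context` may be an ssl.SSLContext, e.g. for a backend that uses a
    self-signed certificate. HTTP errors and timeouts raise exceptions
    from urllib, which is itself a useful signal when diagnosing.
    """
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        response.read()  # include the body transfer in the measurement
        status = response.status
    return status, time.monotonic() - start


# Example call (hypothetical URL, replace with the real backend address):
# status, elapsed = time_request("http://127.0.0.1:8080/", timeout=60)
# print(status, f"{elapsed:.2f}s")
```

If the direct request is fast while requests through Nginx still time out, the web application is less likely to be the cause and the connection between Nginx and Apache deserves a closer look.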

In conclusion, it is recommended that you (in order of priority!)

1 - tackle problem a.2: have a look at potential offending code in the web application and improve code, whenever possible,

2 - tackle problem a.1: tweak the Apache config (if the offending code cannot be found and/or cannot be changed); the recommended values for the PHP settings involved are

max_execution_time: 300
max_input_time: 300

and please note that

- any higher value is not really a good idea: if your web application requires more than 300 seconds (i.e. 5 minutes), then something is horribly wrong,
- the value of 300 seconds (and 600 seconds at most) will nicely "fit into" the default Nginx config, as shipped with Plesk: higher values often require Nginx tweaking,

3 - tackle problem b: tweak Nginx to wait a bit longer for responses by changing

- timeout settings: this will be done automatically when changing max_execution_time settings (see notes below!)
- keepalive settings: Plesk applies by default the Nginx directive keepalive_timeout 65; and that is sufficient, but you can always increase it slightly (see notes below!)

and please note that

- the Nginx directives to work with are: proxy_connect_timeout, proxy_send_timeout, proxy_read_timeout, send_timeout and keepalive_timeout
- Plesk default values for proxy_connect_timeout, proxy_send_timeout, proxy_read_timeout and send_timeout are sufficient and set at 600 (seconds)
- Plesk default value for keepalive_timeout is 65 (seconds)
- Plesk values for max_execution_time (PHP setting on the Apache side) and proxy_read_timeout (Nginx setting) are identical at the domain level (and 600 seconds server-wide)
- Plesk values for max_execution_time (PHP setting on the Apache side) and proxy_read_timeout (Nginx setting) at the domain level are an override of the server-wide settings
- do not add custom Nginx settings at the domain level for proxy_send_timeout and send_timeout: the server-wide setting of 600 (seconds) is sufficient (and any increase of this value beyond 600 seconds will increase security risks and/or can cause severe overloads of the Apache server)
- any custom Nginx setting at the domain level for proxy_connect_timeout can help, but in most cases an increase beyond the value of 600 seconds will only add value in the scenario of debugging the web application: in production, there often is no need to change this value when using Apache + Nginx in a Plesk eco-environment

and finally note that it is not recommended to fiddle with Nginx timeout settings at all: just use default values, whenever possible!
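The relationship between these values can be sketched as a small sanity check. This is only an illustration in Python using the numbers mentioned above (300 seconds as the recommended limit, 600 seconds as the Plesk default for proxy_read_timeout); the function name check_timeouts is hypothetical and nothing here reads a real Plesk configuration.

```python
RECOMMENDED_MAX = 300      # seconds, the limit recommended in this post
PLESK_PROXY_DEFAULT = 600  # seconds, Plesk default as stated in this post


def check_timeouts(max_execution_time, proxy_read_timeout=PLESK_PROXY_DEFAULT):
    """Return a list of warnings for a backend/proxy timeout pair."""
    warnings = []
    if max_execution_time > RECOMMENDED_MAX:
        warnings.append(
            f"max_execution_time={max_execution_time}s exceeds the "
            f"recommended {RECOMMENDED_MAX}s"
        )
    if max_execution_time > proxy_read_timeout:
        warnings.append(
            f"max_execution_time={max_execution_time}s exceeds "
            f"proxy_read_timeout={proxy_read_timeout}s, so Nginx may "
            "close the connection before the backend finishes"
        )
    return warnings


if __name__ == "__main__":
    for value in (300, 450, 900):
        print(value, check_timeouts(value))
```

With the defaults assumed here, 300 produces no warnings, 450 produces one, and 900 produces both.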

All of the above is just an elaborate way to explain the whole story and to help you figure out what to do next.

Simply stated, just apply the solution proposed by @IgorG (and Nginx config will be automatically adjusted to some extent), but be aware of the fact that you have to find and solve the root cause of the problem, being a web application with code that is not able to process requests efficiently and fast.

Hope the above helps a bit.

Regards.............

Mark Bailey · Dec 28, 2018

Thank you. To further clarify, this is an AWS server with 16 GB RAM and the default Plesk configuration values, hosting only one WordPress site that gets minimal traffic. This is one reason we were very concerned: it isn't any kind of overload situation, and we don't know why light traffic still resulted in this. The WordPress site and plugins work fine and generally perform very quickly. (We've been developing in WP for a decade now, so the site followed best practices.)

So our concern is why a reasonably powerful server with only low traffic would still result in a situation like this. You're right, it indicates a problem in Apache or somewhere, so that makes us wonder if the Plesk defaults have an issue?

So any additional thoughts would be appreciated.

Thanks,

Mark

Noribin · Jul 25, 2024

@trialotto
There are many times you may want to up your timeout settings. Migrations, backups, time consuming processes that you cannot do otherwise.

trialotto · Jul 25, 2024

Noribin said:
@trialotto
There are many times you may want to up your timeout settings. Migrations, backups, time consuming processes that you cannot do otherwise.

@Noribin

What you are stating might sound very reasonable, but it actually is not.

In essence, Nginx as a proxy is not related to processes that run on a different (and often "lower") level.

For instance, migration and/or backup processes run on a different level - the migration / backup processes should and will not interfere with Nginx and do not require any tweaking of Nginx config.

Nevertheless, there is no right or wrong here - it is essentially a "chain" of unrelated processes that still can affect each other.

For instance, Nginx as a proxy might run in front of Apache and Apache might be unaffected by migration / backup processes, but any overload of processes can cause hold-ups at the Apache level, which can then result in hiccups on the Nginx level.

However, the latter "chain" of unrelated processes has nothing to do with Nginx itself - Nginx config can be patched, improved etc, but if the Apache server is affected by (other) processes, then the Nginx config tweaking will have little or no effect.

This is why I always try to speak about "the root cause of the problem" - it is very easy to solve an alleged issue, whilst leaving the actual issue unresolved.

I hope the above helps.. a (tiny) bit.

Kind regards.....
