504 nginx bad gateway and Plesk firewall interface

Franco · Dec 16, 2015

Hello,
I am running CentOS 6.7 with Plesk 12.5. My config is nginx with FastCGI.

Whenever I reboot the server (with no significant error messages, as far as I can see), all my WordPress websites time out with a 504 nginx bad gateway error. Only after disabling the firewall rules and enabling them again does everything work. I do the disable/enable from the Plesk management interface, of course.

I thought I had aligned iptables by writing the current rules with
/sbin/service iptables save
but apparently something else restores incorrect rules at system reboot. What could it be? Where should I look? Any hint, please?
Moreover, I hardly ever touch the Linux iptables directly and, as far as I know, I only use the Plesk firewall interface: how can I make sure there is only one master source?

Regards

Kate · Dec 22, 2015

Hello,
We can analyze the issue you faced if you provide us with the following log files, which should contain the error messages:

/var/log/httpd/error_log
/var/log/nginx/error.log
/var/www/vhosts/system/domain.tld/logs/access_log
/var/www/vhosts/system/domain.tld/logs/error_log

Please share with us the details.

Franco · Dec 22, 2015

Hi,
I can only provide the httpd error log, as I either see nothing in the other logs or, in the case of nginx, the log is no longer available (the issue occurred on 16 December). Perhaps I should provoke the issue by rebooting the system again.
Apart from that, where should I look to check whether the system installs a bad copy of the rules at restart?

Franco · Dec 25, 2015

Hi,
I am uploading the new log files; the problem was reproduced by rebooting at around 17:31. I tested with only one of the WP websites and its error log is empty. I got the nginx 504 bad gateway and, immediately after that, I recycled the Plesk FW rules and all was normal again.
In my opinion all is normal; I just need to find out what is resetting my iptables to a state that prevents the system from working.
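
One way to narrow this down, as a minimal sketch: take an iptables-save snapshot right after the reboot (while the sites still fail) and another one after recycling the rules in Plesk, then compare the two. The helper names rules_only and diff_rules and the snapshot file names are made up for illustration; they are not part of Plesk or iptables.

```shell
#!/bin/sh
# Normalise iptables-save output so two snapshots can be compared:
# drop comment lines (they carry timestamps) and zero the packet/byte counters.
rules_only() {
    grep -v '^#' "$1" | sed 's/\[[0-9]*:[0-9]*\]/[0:0]/'
}

# Print a unified diff of two snapshots; prints nothing and returns 0 if they match.
diff_rules() {
    a=$(mktemp) && b=$(mktemp) || return 2
    rules_only "$1" > "$a"
    rules_only "$2" > "$b"
    diff -u "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}
```

Possible usage: run "iptables-save > /root/fw-after-reboot.txt" while the 504 errors occur, recycle the rules in Plesk, run "iptables-save > /root/fw-working.txt", and then "diff_rules /root/fw-after-reboot.txt /root/fw-working.txt". Comparing the post-reboot snapshot with /etc/sysconfig/iptables (the file that "service iptables save" writes on CentOS 6) should also show whether the saved rules are the ones being loaded at boot.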

trialotto · Dec 26, 2015

@Franco,

This issue is hardly related to Apache, Nginx and/or Plesk Firewall and the working thereof.

Have a look at the opcache ini files and if you find "opcache.huge_code_pages=1", just set "opcache.huge_code_pages=0".

The above mentioned opcache settings are specific to CentOS 6.7 (and in most cases, specific to PHP 7.0.x versions).
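
To locate the setting, something along the following lines could be used. It is only a sketch: the function name find_huge_code_pages is made up, and the directories passed to it are assumptions that depend on where the OS and Plesk PHP versions keep their ini files.

```shell
#!/bin/sh
# List every active (non-commented) opcache.huge_code_pages line found under
# the given files or directories, printed as "file:line:setting".
find_huge_code_pages() {
    grep -rnHE '^[[:space:]]*opcache\.huge_code_pages[[:space:]]*=' "$@" 2>/dev/null
}
```

For example, "find_huge_code_pages /etc/php.d /opt/plesk/php" (both paths are guesses for a CentOS/Plesk layout) lists each file that sets the value, so it can be changed to 0 where needed.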

Also note that you have a cronjob running just after the reboot, which means that some of the logged issues can result from that cronjob: discard these issues from the analysis.

In general, you should check

a) the PHP version and, in particular, the opcache settings (and adjust, in the way described above)

b) the proper functioning of swapfiles

c) that the firewall rules are defined properly (note: I am pretty sure that the firewall rules do not cause the issue, but ADD to the issue or the severity thereof)

and be aware of the fact that the 504 Nginx error can be caused by many things (for example, too large pages, i.e. "huge pages", which should be prevented anyway).

In short, if a change in opcache settings does resolve your issue, you should still have a look at other settings and/or the size of the responsible pages.

Regards...

PS If you report an issue, please provide sufficient data (i.e. OS, error notifications etc.), since that would help us to identify the issue in an efficient way.

Franco · Dec 29, 2015

Hi,

thank you for all these hints, which I tried to follow. Here are some of the answers:

1. Huge pages error: sure, I can do that, but that's not relevant for the current issue: the bad gateway upon restart has been there for ages, long before php7. Indeed my parameter for php7 is 1 (was not there before). I understand setting it to 0 would avoid the zend error, but isn't that voiding the php7 speed advantage?

2. Swapfiles: how do I check if they function correctly? Here is my vmstat report:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 167300 373256 285252 457556    1    1    29    45   23   24  4  2 93  1  0

(again, I am on Plesk 12.5.30, CentOS 6.7, running multiple php 5.6 and 7)

3. FW rules: I know that the rules set in Plesk work fine for me, but how can I make sure something else does not interfere by restoring bad rules at system restart? AFAIK, the only cron jobs I have are ntp and the domain backups. I also have the iptables before and after the problem, but the issue is, again, who's restoring the bad ones?

Regards

trialotto · Dec 29, 2015

@Franco,

Franco said:
1. Huge pages error: sure, I can do that, but that's not relevant for the current issue: the bad gateway upon restart has been there for ages, long before php7. Indeed my parameter for php7 is 1 (was not there before). I understand setting it to 0 would avoid the zend error, but isn't that voiding the php7 speed advantage?

The "opcache.huge_code_pages=0" setting is not "voiding the php7 speed advantage", nor is it disabling opcache.

In essence, it just prevents the Huge Pages support from claiming memory in a way that makes the server underperform when the appropriate OS configuration is absent.

Franco said:
2. Swapfiles: how do I check if they function correctly?

Just run the command "free -m" from the command line: any non-zero value in the "total", "used" or "free" columns of the Swap row is an indication that the swap is working.

Note that, in some cases, swap is not persistent: to check this (on a dedicated server), you can have a look at the fstab file, which should contain a line for the swap file.

Also note that, if you are using a VPS, swap is mostly assigned to the VPS by the host server, implying that the fstab file on the VPS does not contain information about swap partitions.
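
As a rough sketch of both checks (the function names swap_total_mb and fstab_swap_entries are invented for this example):

```shell
#!/bin/sh
# Read `free -m` output on stdin and print the total of the Swap row, in MB.
swap_total_mb() {
    awk '$1 == "Swap:" { print $2 }'
}

# Print the non-commented entries of an fstab-style file whose
# filesystem type (third field) is "swap".
fstab_swap_entries() {
    awk '$1 !~ /^#/ && $3 == "swap"' "$1"
}
```

Running "free -m | swap_total_mb" should print a non-zero number when swap is active, and "fstab_swap_entries /etc/fstab" prints the swap line(s), if any, that make the swap persistent across reboots. On a VPS the second command may legitimately print nothing.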

Franco said:
3. FW rules: I know that the rules set in Plesk work fine for me, but how can I make sure something else does not interfere by restoring bad rules at system restart? AFAIK, the only cron jobs I have are ntp and the domain backups. I also have the iptables before and after the problem, but the issue is, again, who's restoring the bad ones?

You do have a number of other cronjobs; one of them follows directly from the casaluna-access-log.txt file: a WordPress cronjob.

The WordPress cronjobs are notorious, in the sense that they can be causing a resource overload.

To prevent this, just log in to the Plesk panel, go to the domain(s) mentioned in your error logs and

1 - increase the memory_limit assigned to the domain (by default 64M, change it to 128M or 256M), and
2 - increase the max_execution_time and max_input_time (by default 60, change it to 120)

and note that increasing the execution time will often resolve many issues at the same time.
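
To double-check from the command line which values a domain actually ends up with, a small helper like the one below could be used. This is a sketch only: ini_value is a made-up name, it does plain text parsing rather than asking PHP itself, and the per-domain php.ini location in the usage note is an assumption about the Plesk layout.

```shell
#!/bin/sh
# Print the value of one directive from a php.ini-style file.
# $1 = ini file, $2 = directive name. Comment lines (starting with ";") are
# ignored; if the directive occurs more than once, the last one is printed.
ini_value() {
    awk -F= -v key="$2" '
        /^[ \t]*;/ { next }                      # skip comment lines
        NF > 1 {
            name = $1
            gsub(/[ \t]/, "", name)              # strip blanks around the name
            if (name == key) {
                val = substr($0, index($0, "=") + 1)
                sub(/^[ \t]+/, "", val)          # leading blanks
                sub(/[ \t]*;.*$/, "", val)       # trailing "; comment"
                sub(/[ \t]+$/, "", val)          # trailing blanks
                found = val
            }
        }
        END { if (found != "") print found }
    ' "$1"
}
```

For example, "ini_value /var/www/vhosts/system/domain.tld/etc/php.ini memory_limit" (path assumed, adjust to your setup) should print the new limit after the change in the Plesk panel; an empty result means the file does not set the directive.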

In short, changing these values will often do the trick, in the sense that it is a decent work-around. It has to be mentioned, though, that this work-around does not address the root cause of the problem (implying that you may encounter similar issues in the future and/or have to increase the aforementioned values a little more).

By the way, one thing has to be made quite clear.

The regular tasks and cronjobs can cause the 504 Nginx error and increasing values according to points 1 and 2 can do the trick.

The WordPress cronjobs are on a whole different level and can also cause a 504 Nginx error separately (!), simply as a result of default memory settings within WordPress.

The default memory limit in WordPress is 40MB, which is a little on the low side and causes huge "delays" in php-fpm worker processes when executing scripts, tasks or cronjobs.

In order to change the WordPress memory settings, open the wp-config.php file of the domain in question and add:

/** Custom memory settings. */
define( 'WP_MEMORY_LIMIT', '256M' );
define( 'WP_MAX_MEMORY_LIMIT', '256M' );

just below the line containing the $table_prefix variable.

Note that you have to adjust the value "256M" to the value you have chosen for the memory_limit variable (!).

Also note that it is often better to set WP_MEMORY_LIMIT and WP_MAX_MEMORY_LIMIT slightly lower than the value for memory_limit.
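
A quick sanity check of the two values against each other could look like this. It is a sketch with invented helper names (to_mb, wp_limit_fits) and it does not handle the special memory_limit value -1 (unlimited).

```shell
#!/bin/sh
# Convert a PHP-style size such as "256M", "1G" or "65536K" to whole megabytes.
to_mb() {
    case "$1" in
        *[Gg]) echo $(( ${1%?} * 1024 )) ;;
        *[Mm]) echo "${1%?}" ;;
        *[Kk]) echo $(( ${1%?} / 1024 )) ;;
        *)     echo $(( $1 / 1048576 )) ;;   # plain number = bytes
    esac
}

# Succeed when the WordPress limit ($1) does not exceed PHP's memory_limit ($2).
wp_limit_fits() {
    [ "$(to_mb "$1")" -le "$(to_mb "$2")" ]
}
```

For instance, "wp_limit_fits 248M 256M" succeeds, while "wp_limit_fits 256M 128M" fails, which would indicate that the WordPress limit asks for more than the domain's memory_limit allows.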

Hope the above helps!

Regards....

Franco · Dec 29, 2015

oh, wow, I'm impressed, thanks a lot.

1. I then set the variable to 0.
2. Swap file is ok, values are not null.
3. I will analyse those WP cronjobs, although they're new to me and I don't know where to start looking.
4. I was not aware that domain memory limits are not the same as the WP memory limits; I will then start tuning them. When you say they should be slightly lower, would something like 248M-wp and 256M-domain, or 128M-wp and 156M-domain do it?

Fixing those values will improve things, but in my case we are not talking about occasional timeouts or about this or that domain being affected. Instead, all of them become inaccessible, and permanently. Once I had a case where all 10 domains were down for 14 hours until, after several reboots (including KVM resets) and out of desperation, I reset the firewall using the Plesk interface. Since then I have learned that this operation does the trick and saves me. Several months and a long history of upgrades later, the situation hasn't changed. I can live with that, except when I am not around when the VPS decides it's reboot time, although that's a particularly rare case. Btw, I am not running php-fpm, just the regular fastcgi with nginx.

Regards

trialotto · Dec 29, 2015

@Franco

A quick word and question, before I have to run to some other engagement.

In essence, fastcgi is not that different from php-fpm; it is even more prone to issues related to resource overuse. I would strongly advise making the step towards php-fpm.

And use the rule of thumb: php-fpm with Apache, fastcgi with Nginx (and not the other way around).

In your case, it seems to me that you have firewall, infrastructure or network related issues, given that you are talking about KVM resets, reboots, multiple domains being down for a long time AND the subtle hint that resetting the firewall does the trick.

My question(s): did you create a VPS yourself? And is it on a dedicated server? And if you did not create the VPS yourself, what is the hosting provider?

I suggest that we start investigating, step-by-step, in order to establish the root cause of the problem (and potentially, avoiding the problems in the future).

Regards....
