Resolved 502 bad gateway error on all domains

Kingsley · Mar 15, 2016

Hello,

I installed Plesk 12.5 on Ubuntu 14.04 with 4 GB RAM, and everything worked fine for 8 days. Then all 9 domains went offline; most of them are WordPress sites. I searched and found some tutorials in the Plesk KB, which I followed, but nothing has worked: the domains still go offline every now and then (they are offline right now) and show a 502 error.

If I restart nginx, Apache or php-fpm, the sites come back online, then go offline again after 24 hours or so.

I really need help. I don't know if this is caused by Plesk; I have used php-fpm, nginx, Apache and HHVM before and never experienced this.

Thanks

otlet · Mar 15, 2016

I had this same error after every update and every weekend.

Code:

sudo /usr/local/psa/admin/bin/nginxmng --disable
sudo /usr/local/psa/admin/bin/nginxmng --enable
sudo /usr/local/psa/admin/bin/nginxmng --status
sudo /usr/local/psa/admin/sbin/httpdmng --reconfigure-all

Kingsley · Mar 15, 2016

Hello,

What does this do, please?

otlet · Mar 15, 2016

Cause
Web server configuration files are corrupted or absent.

Resolution
Re-enable nginx:

Code:

/usr/local/psa/admin/bin/nginxmng --disable
/usr/local/psa/admin/bin/nginxmng --enable
/usr/local/psa/admin/bin/nginxmng --status

Reconfigure the domain configurations:

Code:

/usr/local/psa/admin/sbin/httpdmng --reconfigure-all

Original: https://kb.plesk.com/en/123735

Kingsley · Mar 15, 2016

Alright, this has been done... I hope it's fixed for real.

Kingsley · Mar 15, 2016

@otlet, it didn't work; everything is down again. Seems like this is the last time I am going to use Plesk.

Kingsley · Mar 17, 2016

The same thing just happened on a new CentOS 7 server after I moved all the domains to it. Can Plesk really not handle just 6 WordPress sites and one Piwik site?

Here is the error:

2016-03-18 07:16:49 Error 100.43.91.28 16137#0: *298936 upstream timed out (110: Connection timed out) while reading response header from upstream nginx error
2016-03-18 07:17:49 Error 141.8.143.240 16137#0: *298997 upstream timed out (110: Connection timed out) while reading response header from upstream nginx error
2016-03-18 07:31:01 Error 66.249.73.207 16137#0: *299836 upstream timed out (110: Connection timed out) while reading response header from upstream nginx error
2016-03-18 07:40:00 Error 111.13.102.132 16137#0: *300391 upstream timed out (110: Connection timed out) while reading response header from upstream nginx error
2016-03-18 07:50:22 Error 209.85.238.93 16137#0: *301010 upstream timed out (110: Connection timed out) while reading response header from upstream nginx error
2016-03-18 07:51:15 Error 23.96.184.72 16137#0: *301116 connect() failed (111: Connection refused) while connecting to upstream nginx error
2016-03-18 07:53:07 Error 209.85.238.93 12726#0: *301181 upstream timed out (110: Connection timed out) while reading response header from upstream

J-F Brouillette · Mar 19, 2016

I had the same issue for a few weeks: Apache stopped whenever updates were launched by a cron job. After a lot of discussion with Plesk support, I found that ModSecurity was causing this.

Plesk support completely removed and reinstalled ModSecurity, and it has worked since.

Hope this helps.

Kingsley · Mar 19, 2016

OK, right now only 2 sites on the server are working; the rest have been disabled... I don't know what to do.

trialotto · Mar 19, 2016

@Kingsley, @otlet and @J-F Brouillette,

I have been writing about some FPM related errors, see: https://talk.plesk.com/threads/potential-fpm-errors-after-update-mu25-solution.337375/

The second post is something that @Kingsley should have a look at: just run the command "cat /proc/user_beancounters" and have a look at the "failcnt" column, which should contain values equal to zero only. If this is not the case, just type "reboot" in the command console.

After that, sites should be working properly.
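To make the "failcnt" check above less error-prone than reading the table by eye, here is a minimal sketch. It assumes an OpenVZ/Virtuozzo container (the file does not exist elsewhere) and the usual layout in which failcnt is the last column; the function name check_failcnt is made up for this example.

```shell
# Print every beancounter row whose failcnt (last column) is non-zero.
# The file is an argument so the check can be tried on a saved copy;
# it defaults to the live file inside an OpenVZ/Virtuozzo container.
check_failcnt() {
    file="${1:-/proc/user_beancounters}"
    # Skip the "Version:" line and the column header, then test the last field.
    awk 'NR > 2 && $NF ~ /^[0-9]+$/ && $NF + 0 > 0 { print }' "$file"
}
```

Running check_failcnt with no arguments prints nothing when every counter is zero; any row it does print names the resource that was refused.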

The second post may also be related to what @J-F Brouillette described: a mishap at update time, potentially related to refused resource allocation(s).

Regards........

Chris1 · Mar 19, 2016

See this for possible solution:

https://talk.plesk.com/threads/apache-reload-graceful-restart-causes-apache-segfault.335534/

trialotto · Mar 20, 2016

@Chris1,

The "upstream error notifications" are related to socket issues: even if Apache is running and/or restarting fine, the "upstream" issue can occur.

In short, it is a FPM related issue that is not necessarily related to Apache.

By the way, with respect to "Apache graceful restart discussion or issues", the following.

Apache essentially reloads or restarts all the time (depending on the various settings) and, most of the times, that works fine.

Apache will not restart or reload properly, if some other process fails. Consider the failure of mod_security, due to issues with updates of Atomicorp rulesets.

Each time a restart or reload fails, Apache ends up restarting improperly and locking some resources: in short, the failures accumulate into an unstable system.

The essential solution to this Apache mayhem is to use Nginx (to reduce the load on Apache) AND to apply a stop/start sequence to Apache on a regular basis.

Note that a stop/start sequence is very different to a restart, in the sense that a stop/start sequence actually stops Apache and ends the garbled use of resources by Apache.

You can imagine by now why I do not participate in all the "graceful restart vs reload" discussions with respect to Apache.

The reasons for that are:

- an Apache reload is essentially not wrong, except for the cases in which garbled resource usage is already present
- an Apache restart is essentially not wrong in the case of garbled resource usage, but a stop/start sequence for Apache is better

In short, anyone focusing on Apache issues should not focus on reloads (i.e. continuing issues) or graceful restarts (i.e. delay of issues, that will re-occur sooner or later), but emphasize the root cause of the problem: garbled resource usage, often due to multiple reload and restart failures of Apache in the past.

Just simply apply a stop/start sequence or, if required, do a (software) reboot.
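For illustration, a stop/start sequence (as opposed to a restart or reload) could look roughly like the sketch below. The service name is an assumption: apache2 on Debian/Ubuntu, httpd on CentOS/RHEL. The function name and the pause length are made up for this example.

```shell
# Full stop followed by a fresh start, instead of "restart" or "reload".
# $1: service name (assumed: apache2 on Debian/Ubuntu, httpd on CentOS/RHEL)
# $2: seconds to wait between stop and start (default 5)
apache_stop_start() {
    svc="${1:-apache2}"
    # The result of "stop" is not checked: Apache may already be down.
    service "$svc" stop
    # Give lingering worker processes a moment to exit.
    sleep "${2:-5}"
    service "$svc" start
}
```

On a CentOS box this would be called as "apache_stop_start httpd", run as root.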

Sure, one can state that allowing graceful restarts will prevent future issues with Apache, and that is partly true, but not entirely: the "external" issues, such as problems with ruleset updating and/or improper Apache module updates, will always be present, and in that case the graceful restart does not help at all.

A good illustration is the fact that MU23 and MU25 caused these "external" issues, with a "graceful restart setting" having no effect at all.

Hope the above helps and explains a bit.

Regards........

Kingsley · Mar 20, 2016

All domains except 2 have been disabled, and the other 2 have been up for 48 hours now.

Pascal_Netenvie · Mar 22, 2016

Rebooting the server worked for me!

Kingsley · Mar 22, 2016

I have done that several times.

Pascal_Netenvie · Mar 23, 2016

OK, it only worked for 24 hours.
Suddenly today, a few minutes ago, all websites on the server started returning 502.
I had to stop nginx to restore normal operation.

Pascal_Netenvie · Mar 23, 2016

From what I see in the nginx error log, it looks like an intrusion attempt on the server.
We found this in the log (domain and IP address masked):

Code:

2016/03/23 19:01:19 [error] 3633#0: *20349 connect() failed (111: Connection refused) while connecting to upstream, client: 54.235.163.229, server: , request: "GET /scripts/cvslog.cgi?file=<SCRIPT>window.alert</SCRIPT> HTTP/1.1", upstream: "https://MY_IP_ADDRESS:7081/scripts/cvslog.cgi?file=<SCRIPT>window.alert</SCRIPT>", host: "www. mydomain.com"
2016/03/23 19:01:20 [error] 3633#0: *20353 connect() failed (111: Connection refused) while connecting to upstream, client: 54.235.163.229, server: , request: "GET / HTTP/1.1", upstream: "https://MY_IP_ADDRESS:7081/", host: "www. mydomain.com"
2016/03/23 19:01:20 [error] 3633#0: *20349 connect() failed (111: Connection refused) while connecting to upstream, client: 54.235.163.229, server: , request: "GET /cgi-bin/cvslog.cgi?file=<SCRIPT>window.alert</SCRIPT> HTTP/1.1", upstream: "https://MY_IP_ADDRESS:7081/cgi-bin/cvslog.cgi?file=<SCRIPT>window.alert</SCRIPT>", host: "www. mydomain.com"
2016/03/23 19:01:22 [error] 3633#0: *20349 connect() failed (111: Connection refused) while connecting to upstream, client: 54.235.163.229, server: , request: "GET /cvslog.cgi?file=<SCRIPT>window.alert</SCRIPT> HTTP/1.1", upstream: "https://MY_IP_ADDRESS:7081/cvslog.cgi?file=<SCRIPT>window.alert</SCRIPT>", host: "www. mydomain.com"
2016/03/23 19:01:25 [error] 3633#0: *20379 connect() failed (111: Connection refused) while connecting to upstream, client: 54.235.163.229, server: , request: "GET / HTTP/1.1", upstream: "https://MY_IP_ADDRESS:7081/", host: "www. mydomain.com"
2016/03/23 19:01:26 [error] 3633#0: *20381 connect() failed (111: Connection refused) while connecting to upstream, client: 54.235.163.229, server: , request: "GET /index.jsp HTTP/1.1", upstream: "http://MY_IP_ADDRESS:7080/index.jsp", host: "sjfklsjfkldfjklsdfjdlksjfdsljk.foo."
2016/03/23 19:01:26 [error] 3633#0: *20396 connect() failed (111: Connection refused) while connecting to upstream, client: 54.235.163.229, server: , request: "GET /login/login.html HTTP/1.1", upstream: "https://MY_IP_ADDRESS:7081/login/login.html", host: "www. mydomain.com"

I copied just 7 lines here, but the same IP made a lot of requests (2000+ in 12 minutes) to several URLs, and it seems that after some time nginx fails and loses the Apache socket...

At the end of the error log we see this:
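For anyone who wants to check their own log for a single client hammering the server like this, a rough sketch follows. The default log path /var/log/nginx/error.log and the function name top_clients are assumptions; adjust them to your setup.

```shell
# Rank client addresses by how many error-log lines mention them.
# $1: log file (assumed default: /var/log/nginx/error.log)
# $2: how many addresses to show (default 10)
top_clients() {
    log="${1:-/var/log/nginx/error.log}"
    # Pull out the address that follows "client: ", then count and rank.
    grep -o 'client: [0-9a-fA-F.:][0-9a-fA-F.:]*' "$log" \
        | awk '{ print $2 }' \
        | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

Each output line is a count followed by an address, busiest client first.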

Code:

2016/03/23 19:15:31 [alert] 3633#0: *26447 open socket #34 left in connection 4
2016/03/23 19:15:31 [alert] 3633#0: *26336 open socket #25 left in connection 8
2016/03/23 19:15:31 [alert] 3633#0: *26463 open socket #33 left in connection 18
2016/03/23 19:15:31 [alert] 3633#0: *26474 open socket #29 left in connection 20
2016/03/23 19:15:31 [alert] 3633#0: *26472 open socket #24 left in connection 25
2016/03/23 19:15:31 [alert] 3633#0: *26479 open socket #32 left in connection 26
2016/03/23 19:15:31 [alert] 3633#0: *26372 open socket #3 left in connection 34
2016/03/23 19:15:31 [alert] 3633#0: *26473 open socket #20 left in connection 49
2016/03/23 19:15:31 [alert] 3633#0: *26467 open socket #35 left in connection 55
2016/03/23 19:15:31 [alert] 3633#0: *26454 open socket #22 left in connection 57
2016/03/23 19:15:31 [alert] 3633#0: *26475 open socket #18 left in connection 63
2016/03/23 19:15:31 [alert] 3633#0: *26477 open socket #30 left in connection 64
2016/03/23 19:15:31 [alert] 3633#0: *26476 open socket #26 left in connection 65
2016/03/23 19:15:31 [alert] 3633#0: *26478 open socket #31 left in connection 70
2016/03/23 19:15:31 [alert] 3633#0: *26470 open socket #23 left in connection 71
2016/03/23 19:15:31 [alert] 3633#0: *26453 open socket #19 left in connection 73
2016/03/23 19:15:31 [alert] 3633#0: *26471 open socket #28 left in connection 77
2016/03/23 19:15:31 [alert] 3633#0: aborting

How can this be prevented?

trialotto · Mar 23, 2016

@Pascal_Netenvie

Nice investigating job.

The solution is simple: add a custom rule to Plesk Firewall, called "bad" for instance, and add the IP 54.235.163.229 (assuming this is not your own IP) with "Deny on all ports".
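For reference, a "Deny on all ports" rule for one address corresponds roughly to the plain iptables rule sketched below. This is only an illustration of what such a rule amounts to, not what the Plesk Firewall itself generates; a rule added by hand may also be lost when the firewall configuration is re-applied, so the Plesk Firewall UI remains the place to make it permanent. The function name block_ip is made up for this example.

```shell
# Drop all incoming traffic from one source address.
# $1: the address to block, e.g. 54.235.163.229 from the log above
block_ip() {
    ip="$1"
    [ -n "$ip" ] || { echo "usage: block_ip IP" >&2; return 2; }
    # Insert at the top of INPUT so the rule is evaluated before any ACCEPT rules.
    iptables -I INPUT -s "$ip" -j DROP
}
```

It would be run as root, for example "block_ip 54.235.163.229", and "iptables -L INPUT -n" then lists the rule.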

Also, if you did not enable or install Fail2Ban, just install and/or enable it.

Both actions should be carried out, as they are the most important of the many steps you can take to prevent similar "attacks" (what's in a word?) in the future.

Note that it would also be wise to remove the "test" directory from the httpdocs directory in the affected domains, since the "attack" is aiming at specific scripts.

Hope this helps a bit.

Regards!

Kingsley · Mar 23, 2016

I don't follow.

Pascal_Netenvie · Mar 24, 2016

Hi Trialotto,
We enable ModSecurity and Fail2Ban on all servers, and always delete all files in httpdocs before installing a website.
And of course, as soon as I saw this log, the IP was added to the recidive jail and a specific rule was created in the firewall.

The problem is that since then, despite a server reboot and running the following commands, restarting nginx leads to 502 for all websites...

Code:

sudo /usr/local/psa/admin/bin/nginxmng --disable
sudo /usr/local/psa/admin/bin/nginxmng --enable
sudo /usr/local/psa/admin/bin/nginxmng --status
sudo /usr/local/psa/admin/sbin/httpdmng --reconfigure-all
