Resolved 502 bad gateway error on all domains

Kingsley

Silver Pleskian
Hello,

I installed Plesk 12.5 on Ubuntu 14.04 with 4 GB RAM, and everything worked fine for 8 days. Then all 9 domains went offline; most of them are WordPress sites. I searched and found some tutorials in the Plesk KB, which I implemented, but nothing has worked: the domains still go offline every now and then (they are offline right now) and show a 502 error message.

If I restart nginx, Apache, or PHP-FPM, the sites come back online, then go offline again after maybe 24 hours.

I really need help. I don't know if this is caused by Plesk; I have used PHP-FPM, nginx, Apache, and HHVM before and never experienced this.

Thanks
 
I had this same error after every update and every weekend.

Code:
sudo /usr/local/psa/admin/bin/nginxmng --disable
sudo /usr/local/psa/admin/bin/nginxmng --enable
sudo /usr/local/psa/admin/bin/nginxmng --status
sudo /usr/local/psa/admin/sbin/httpdmng --reconfigure-all
 
Hello,

What does this do, please?
 
Cause
Web server configuration files are corrupted or absent.

Resolution
Re-enable nginx:
Code:
/usr/local/psa/admin/bin/nginxmng --disable
/usr/local/psa/admin/bin/nginxmng --enable
/usr/local/psa/admin/bin/nginxmng --status

Reconfigure the domains' configuration:
Code:
/usr/local/psa/admin/sbin/httpdmng --reconfigure-all
Original: https://kb.plesk.com/en/123735
 
Alright, this has been done. I hope it's fixed for real.
 
The same thing just happened on a new CentOS 7 server after I moved all domains to it. Can Plesk really not handle just 6 WordPress sites and one Piwik site?

Here is the error:

2016-03-18 07:16:49 Error 100.43.91.28 16137#0: *298936 upstream timed out (110: Connection timed out) while reading response header from upstream nginx error
2016-03-18 07:17:49 Error 141.8.143.240 16137#0: *298997 upstream timed out (110: Connection timed out) while reading response header from upstream nginx error
2016-03-18 07:31:01 Error 66.249.73.207 16137#0: *299836 upstream timed out (110: Connection timed out) while reading response header from upstream nginx error
2016-03-18 07:40:00 Error 111.13.102.132 16137#0: *300391 upstream timed out (110: Connection timed out) while reading response header from upstream nginx error
2016-03-18 07:50:22 Error 209.85.238.93 16137#0: *301010 upstream timed out (110: Connection timed out) while reading response header from upstream nginx error
2016-03-18 07:51:15 Error 23.96.184.72 16137#0: *301116 connect() failed (111: Connection refused) while connecting to upstream nginx error
2016-03-18 07:53:07 Error 209.85.238.93 12726#0: *301181 upstream timed out (110: Connection timed out) while reading response header from upstream
 
I had the same issue for a few weeks: Apache stopped whenever updates were launched by a cron job. After a lot of discussion with Plesk support, I found that ModSecurity was causing this.

Plesk support completely removed and reinstalled ModSecurity, and it has worked since then.

Hope this helps.
 
OK, right now only 2 sites on the server are working; the rest have been disabled. I don't know what to do.
 
@Kingsley, @otlet and @J-F Brouillette,

I have written about some FPM-related errors; see: https://talk.plesk.com/threads/potential-fpm-errors-after-update-mu25-solution.337375/

The second post is something that @Kingsley should have a look at: just run the command "cat /proc/user_beancounters" and have a look at the "failcnt" column, which should contain values equal to zero only. If this is not the case, just type "reboot" in the command console.

After that, sites should be working properly.
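For anyone who wants to script the "failcnt" check described above, something like the following could work. It is only a minimal sketch: it assumes the usual OpenVZ/Virtuozzo layout of /proc/user_beancounters (a version line, a header line, then one row per resource with failcnt as the last column), and check_failcnt is a hypothetical helper name, not a Plesk or OpenVZ tool. The file only exists inside OpenVZ/Virtuozzo containers and normally needs root to read.

```shell
#!/bin/sh
# Sketch: print every resource whose failcnt (last column) is non-zero.
# check_failcnt is a hypothetical helper name used for illustration only.
check_failcnt() {
    # Optional argument: path to a beancounters file (defaults to the live one).
    file="${1:-/proc/user_beancounters}"
    # Skip the "Version:" line and the column-header line.
    # The first row of each container starts with "uid:", so the resource
    # name is field 2 on that row and field 1 on all other rows.
    awk 'NR > 2 && $NF + 0 > 0 {
        name = ($1 ~ /:$/) ? $2 : $1
        print name, $NF
    }' "$file"
}
```

Calling `check_failcnt` with no argument reads /proc/user_beancounters. No output means every failcnt value is zero; any line printed names a resource whose limit has been hit at least once.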

The second post may also be related to what @J-F Brouillette has been saying: a mishap at update time, potentially related to refused resource allocations.

Regards........
 
@Chris1,

The "upstream" error notifications are related to socket issues: even if Apache is running and/or restarting fine, the "upstream" issue can occur.

In short, it is an FPM-related issue that is not necessarily related to Apache.

By the way, regarding the "Apache graceful restart" discussions and issues, consider the following.

Apache essentially reloads or restarts all the time (depending on the various settings) and, most of the time, that works fine.

Apache will not restart or reload properly if some other process fails. Consider the failure of mod_security, due to issues with updates of Atomicorp rulesets.

Each time a restart or reload fails, Apache ends up in an improper state with some resources locked: in short, all failures accumulate into an unstable system.

The essential solution to this Apache mayhem is to use Nginx (reduce the load on Apache) AND to apply a stop/start sequence to Apache on a frequent basis.

Note that a stop/start sequence is very different to a restart, in the sense that a stop/start sequence actually stops Apache and ends the garbled use of resources by Apache.

You can imagine by now why I do not participate in all the "graceful restart vs reload" discussions with respect to Apache.

The reasons for that are:

- an Apache reload is essentially not wrong, except for the cases in which garbled resource usage is already present
- an Apache restart is essentially not wrong in the case of garbled resource usage, but a stop/start sequence for Apache is better

In short, anyone focusing on Apache issues should not focus on reloads (i.e. continuing issues) or graceful restarts (i.e. delay of issues, that will re-occur sooner or later), but emphasize the root cause of the problem: garbled resource usage, often due to multiple reload and restart failures of Apache in the past.

Just simply apply a stop/start sequence or, if required, do a (software) reboot.
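A stop/start sequence as described above could be scripted roughly like this. This is a sketch under assumptions, not a Plesk-provided procedure: the service name is assumed to be "apache2" on Debian/Ubuntu and "httpd" on CentOS, the pause length is arbitrary, and apache_stop_start and SERVICE_CMD are hypothetical names introduced here (SERVICE_CMD only exists so the command can be swapped out for a dry run).

```shell
#!/bin/sh
# Sketch: fully stop Apache, wait, then start it again (not a restart/reload).
# apache_stop_start and SERVICE_CMD are hypothetical names for illustration.
apache_stop_start() {
    svc="${1:-apache2}"   # assumed: "apache2" on Debian/Ubuntu, "httpd" on CentOS
    pause="${2:-5}"       # seconds to wait between stop and start (arbitrary)
    # Stop the service so its processes actually exit and release resources.
    "${SERVICE_CMD:-service}" "$svc" stop
    sleep "$pause"
    # Start a fresh instance.
    "${SERVICE_CMD:-service}" "$svc" start
}
```

On a CentOS box this would be run as root with `apache_stop_start httpd`. The pause gives the old processes time to exit before the new instance starts.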

Sure, one can argue that allowing graceful restarts will prevent future issues with Apache, and that is partly true, but not entirely: the "external" issues, such as problems with ruleset updates and/or improper Apache module updates, will always be present, and in that case the graceful restart does not help at all.

A good illustration is the fact that MU23 and MU25 caused these "external" issues, with a "graceful restart setting" having no effect at all.

Hope the above helps and explains a bit.

Regards........
 
OK, it worked for just 24 hours.
Suddenly today, a few minutes ago, all websites on the server started returning 502.
I had to stop nginx to restore normal operation.
 
From what I see in the nginx error log, it looks like an intrusion attempt on the server.
We found this in the log (domain and IP address masked):
Code:
2016/03/23 19:01:19 [error] 3633#0: *20349 connect() failed (111: Connection refused) while connecting to upstream, client: 54.235.163.229, server: , request: "GET /scripts/cvslog.cgi?file=<SCRIPT>window.alert</SCRIPT> HTTP/1.1", upstream: "https://MY_IP_ADDRESS:7081/scripts/cvslog.cgi?file=<SCRIPT>window.alert</SCRIPT>", host: "www. mydomain.com"
2016/03/23 19:01:20 [error] 3633#0: *20353 connect() failed (111: Connection refused) while connecting to upstream, client: 54.235.163.229, server: , request: "GET / HTTP/1.1", upstream: "https://MY_IP_ADDRESS:7081/", host: "www. mydomain.com"
2016/03/23 19:01:20 [error] 3633#0: *20349 connect() failed (111: Connection refused) while connecting to upstream, client: 54.235.163.229, server: , request: "GET /cgi-bin/cvslog.cgi?file=<SCRIPT>window.alert</SCRIPT> HTTP/1.1", upstream: "https://MY_IP_ADDRESS:7081/cgi-bin/cvslog.cgi?file=<SCRIPT>window.alert</SCRIPT>", host: "www. mydomain.com"
2016/03/23 19:01:22 [error] 3633#0: *20349 connect() failed (111: Connection refused) while connecting to upstream, client: 54.235.163.229, server: , request: "GET /cvslog.cgi?file=<SCRIPT>window.alert</SCRIPT> HTTP/1.1", upstream: "https://MY_IP_ADDRESS:7081/cvslog.cgi?file=<SCRIPT>window.alert</SCRIPT>", host: "www. mydomain.com"
2016/03/23 19:01:25 [error] 3633#0: *20379 connect() failed (111: Connection refused) while connecting to upstream, client: 54.235.163.229, server: , request: "GET / HTTP/1.1", upstream: "https://MY_IP_ADDRESS:7081/", host: "www. mydomain.com"
2016/03/23 19:01:26 [error] 3633#0: *20381 connect() failed (111: Connection refused) while connecting to upstream, client: 54.235.163.229, server: , request: "GET /index.jsp HTTP/1.1", upstream: "http://MY_IP_ADDRESS:7080/index.jsp", host: "sjfklsjfkldfjklsdfjdlksjfdsljk.foo."
2016/03/23 19:01:26 [error] 3633#0: *20396 connect() failed (111: Connection refused) while connecting to upstream, client: 54.235.163.229, server: , request: "GET /login/login.html HTTP/1.1", upstream: "https://MY_IP_ADDRESS:7081/login/login.html", host: "www. mydomain.com"
I copied just 7 lines here, but the same IP sent a lot of requests (2000+ in 12 minutes) to several URLs, and it seems that after some time nginx broke and lost the Apache socket...

At the end of the error log we see this:
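To see at a glance which client addresses are behind upstream errors like the ones above, a small helper along these lines might be useful. It is a rough sketch that assumes the standard nginx error log format shown in this post (each entry containing "client: <ip>,"); count_upstream_clients is a hypothetical name, and the log path is whatever your server uses.

```shell
#!/bin/sh
# Sketch: count upstream-related error entries per client IP, busiest first.
# count_upstream_clients is a hypothetical helper name for illustration.
count_upstream_clients() {
    log="$1"   # path to an nginx error log in the standard format
    # Keep only upstream connect/read failures, pull out the client address,
    # then count occurrences and print "count ip" sorted by count, descending.
    grep -E 'while (connecting to|reading response header from) upstream' "$log" \
        | sed -n 's/.*client: \([^,]*\),.*/\1/p' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}
```

It would be called with the path to the log, for example `count_upstream_clients /var/log/nginx/error.log` if that is where the log lives on your system. An address with thousands of hits in a few minutes is a candidate for a firewall rule.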
Code:
2016/03/23 19:15:31 [alert] 3633#0: *26447 open socket #34 left in connection 4
2016/03/23 19:15:31 [alert] 3633#0: *26336 open socket #25 left in connection 8
2016/03/23 19:15:31 [alert] 3633#0: *26463 open socket #33 left in connection 18
2016/03/23 19:15:31 [alert] 3633#0: *26474 open socket #29 left in connection 20
2016/03/23 19:15:31 [alert] 3633#0: *26472 open socket #24 left in connection 25
2016/03/23 19:15:31 [alert] 3633#0: *26479 open socket #32 left in connection 26
2016/03/23 19:15:31 [alert] 3633#0: *26372 open socket #3 left in connection 34
2016/03/23 19:15:31 [alert] 3633#0: *26473 open socket #20 left in connection 49
2016/03/23 19:15:31 [alert] 3633#0: *26467 open socket #35 left in connection 55
2016/03/23 19:15:31 [alert] 3633#0: *26454 open socket #22 left in connection 57
2016/03/23 19:15:31 [alert] 3633#0: *26475 open socket #18 left in connection 63
2016/03/23 19:15:31 [alert] 3633#0: *26477 open socket #30 left in connection 64
2016/03/23 19:15:31 [alert] 3633#0: *26476 open socket #26 left in connection 65
2016/03/23 19:15:31 [alert] 3633#0: *26478 open socket #31 left in connection 70
2016/03/23 19:15:31 [alert] 3633#0: *26470 open socket #23 left in connection 71
2016/03/23 19:15:31 [alert] 3633#0: *26453 open socket #19 left in connection 73
2016/03/23 19:15:31 [alert] 3633#0: *26471 open socket #28 left in connection 77
2016/03/23 19:15:31 [alert] 3633#0: aborting

How can we prevent this?
 
@Pascal_Netenvie

Nice investigating job.

The solution is simple: add a custom rule to Plesk Firewall, called "bad" for instance, and add the IP 54.235.163.229 (assuming this is not your own IP) with "Deny on all ports".

Also, if you did not enable or install Fail2Ban, just install and/or enable it.

Do both: they are the most important of the many actions you can take to prevent similar "attacks" (what's in a word?) in the future.

Note that it would also be wise to remove the "test" directory from the httpdocs directory in the affected domains, since the "attack" is aiming at specific scripts.

Hope this helps a bit.

Regards!
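For reference, a rough command-line counterpart of the "Deny on all ports" rule could look like the sketch below. It is not the Plesk Firewall mechanism itself: it assumes plain iptables is in use, block_ip and the IPTABLES override are hypothetical names introduced here, and since Plesk Firewall manages its own rule set, a manually inserted rule may disappear when Plesk reapplies its configuration. Adding the rule through the Plesk interface remains the safer route.

```shell
#!/bin/sh
# Sketch: drop all incoming traffic from one source address with iptables.
# block_ip and the IPTABLES override are hypothetical names for illustration.
block_ip() {
    ip="$1"
    if [ -z "$ip" ]; then
        echo "usage: block_ip <ip-address>" >&2
        return 2
    fi
    # Insert a DROP rule at the top of the INPUT chain for this source address.
    "${IPTABLES:-iptables}" -I INPUT -s "$ip" -j DROP
}
```

Run as root, `block_ip 54.235.163.229` would block the address from the log above. Double-check that the address is not your own before running it.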
 
I don't follow.
 
Hi Trialotto,
We enable ModSecurity and Fail2Ban on all servers, and we always delete all files in httpdocs before installing a website.
And of course, as soon as I saw this log, the IP was added to the recidive jail and a specific rule was created in the firewall.

The problem is that since then, despite a server reboot and running the following commands, restarting nginx leads to 502 on all websites...
Code:
sudo /usr/local/psa/admin/bin/nginxmng --disable
sudo /usr/local/psa/admin/bin/nginxmng --enable
sudo /usr/local/psa/admin/bin/nginxmng --status
sudo /usr/local/psa/admin/sbin/httpdmng --reconfigure-all
 