Issue Websites stop working until I restart apache and nginx

dgarcia90

New Pleskian
Server operating system version: Ubuntu 20.04 LTS
Plesk version and microupdate number: 18.0.47
Hello everyone, we have a dedicated server with Plesk. We have a really weird issue where our websites stop responding until I restart apache and nginx; if I wait 5 - 10 minutes instead, they become accessible again on their own. I haven't found a cause or pattern for this issue. It can happen once a month, or once in three months, but it happened twice yesterday and again today.

I can get by with Linux, but when it comes to troubleshooting I'm no expert.

I appreciate any help you can give me.

Thanks!
 
Hello @dgarcia90,
I would begin by checking the logs for the timeframe when the issue appeared.
The logs I would check:
  • The logs of a few affected websites (Domains > example.com > Logs) - maybe there is a pattern, e.g. similar errors on different websites that were down.
  • The webserver logs /var/log/nginx/error.log and /var/log/apache2/error.log
  • There also can be something in the output of journalctl --unit=apache2 (or --unit=nginx)
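
To narrow those logs down to the minutes around an outage, something like the following could help. It is only a sketch: the helper names `lines_starting_with` and `unit_journal_between` are made up for this example, and the timestamp prefix assumes the usual nginx error.log format, where each line starts with `YYYY/MM/DD HH:MM:SS`.

```shell
# Print the lines of a log file that start with a given timestamp prefix.
# $1 = prefix, e.g. "2023/01/15 14:3" covers 14:30 to 14:39
# $2 = log file, e.g. /var/log/nginx/error.log
lines_starting_with() {
    awk -v p="$1" 'index($0, p) == 1' "$2"
}

# Show the journal of one systemd unit between two points in time.
# $1 = unit (apache2 or nginx), $2 = start, $3 = end
# Times use the "YYYY-MM-DD HH:MM:SS" form that journalctl accepts.
unit_journal_between() {
    journalctl --unit="$1" --since="$2" --until="$3" --no-pager
}
```

For example, `lines_starting_with "2023/01/15 14:3" /var/log/nginx/error.log` followed by `unit_journal_between apache2 "2023-01-15 14:25:00" "2023-01-15 14:45:00"`. The dates are placeholders; use the time of your own outage.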
 
Hey, thanks for your reply.

Unfortunately there's nothing in those logs at the time the issue happened :(
 
Hey guys, any idea what else I could check? I had another "crash" last night.

Thanks in advance
Kind regards,
Daniel García
 
What about related error messages in /var/log/syslog?
 
Thanks for your reply. I've just checked that log, and besides some unauthorized login attempts against my smtp and ssh services (I have fail2ban enabled), I can't see anything there that could cause this issue.
 
This week Plesk, "out of the blue", added the line "ssl_dhparam /opt/psa/etc/dhparams2048.pem;" to the file /etc/nginx/conf.d/ssl.conf

Check with these commands whether that line was indeed added just before your sites went off-line:
ls -l /etc/nginx/conf.d/ssl.conf
grep param /etc/nginx/conf.d/ssl.conf

That does not explain why the sites went off-line, however, because under normal conditions that entry is valid.

In my case it wasn't valid, because I have had the same directive in my own nginx config for ages and nginx does not allow a duplicate entry for it.
That's my punishment for being progressive.
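
To see whether a directive is set in more than one place, a recursive grep over the config tree is usually enough. This is a sketch only: `find_directive` is a name invented for this example, and /etc/nginx is assumed to be the config root.

```shell
# List every line under a config directory that sets the given nginx
# directive. Commented-out lines are skipped because the directive name
# must be the first non-blank text on the line.
# $1 = directive name, $2 = config root (defaults to /etc/nginx)
find_directive() {
    grep -rnE "^[[:space:]]*$1[[:space:]]" "${2:-/etc/nginx}"
}
```

`find_directive ssl_dhparam` prints one `file:line:content` row per hit. Two hits are only a conflict when both land in the same configuration context. Running `nginx -t` afterwards tests the whole configuration without restarting anything.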

I made a thread for that as well, but somehow @IgorG didn't comment there. ;-)
I presume he didn't want to bump the thread.


I'm mentioning it here because it can put you at ease regarding your search for an "unlawful" entry on your server.

Plesk will rewrite many of your configs and will restart services as a result of an update.
99% of the time you will not notice it.
We just don't live in a perfect world....

When things go wrong I call it an "update from hell".
I had one this week.
In this case none of the parties (me, nginx or Plesk) was to blame.
 
Hello, thanks for your reply. It looks like I also have those lines in that file.
 
...and when were they added??

But still, it should not be a problem. Maybe once, but not repeatedly...
 
In that case this particular change has no bearing on your problem...
Do you still have this problem?
 
I do still have this problem, but it's completely random. It can happen once or twice a month, then there are a few months with no issue, and then it starts happening again...
 
@dgarcia90 The "5 - 10 minutes" makes me suspect a fail2ban ban, because anywhere between 3 and 10 minutes is the normal first-response ban time for the jails. When your server appears to be "offline", can you still access it from a different IP address, for example over your phone's cellular connection (bypassing your wifi)?
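
If you can still reach the server over SSH while the sites appear down, you could also ask fail2ban directly whether your address is banned. A rough sketch, assuming the usual `fail2ban-client status` output layout (a "Jail list:" line, and a "Banned IP list:" line per jail); the function names are invented for this example.

```shell
# Succeed if the given IP appears on the "Banned IP list:" line of
# `fail2ban-client status <jail>` output read from stdin.
ip_in_ban_list() {
    awk -v ip="$1" '
        /Banned IP list:/ {
            sub(/.*Banned IP list:/, "")
            n = split($0, a, " ")
            for (i = 1; i <= n; i++) if (a[i] == ip) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Print the name of every jail that currently bans the given IP.
# Needs root, because fail2ban-client talks to the fail2ban socket.
jails_banning() {
    jails=$(fail2ban-client status | sed -n 's/.*Jail list:[[:space:]]*//p' | tr -d ',')
    for jail in $jails; do
        if fail2ban-client status "$jail" | ip_in_ban_list "$1"; then
            echo "$jail"
        fi
    done
}
```

`jails_banning 203.0.113.7` (a placeholder address, use your own public IP) prints nothing when no jail bans that address.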
 
Thanks for your reply. When the issue happens and I'm around to see it (sometimes it happens at night and I cannot run any tests), I'm not able to reach any of my websites no matter where I try from. I tried my phone (4G connection), my PC, and another PC on a different network (with a different public IP), and the outcome was the same in every attempt.
 
Here are some more ideas for you:

Sometimes attackers or bad bots send many requests against a website. This can cause Apache to spawn instances quickly until it reaches the max of 255 instances. It then stops accepting new requests until the running requests have been served. It is also possible that PHP-FPM is loaded with many scripts to handle but has reached its maximum number of allowed children, so that no further children can be spawned. In either case you may experience a wait of several minutes until the system becomes responsive again. The best defense against this is to have all Fail2Ban jails in place so that attackers are blocked.
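
Both limits usually leave a trace: Apache logs "server reached MaxRequestWorkers setting" and PHP-FPM logs "server reached pm.max_children setting" when they run out of workers. A minimal sketch for checking this; the function names are made up, and the PHP-FPM log location depends on the PHP handler in use (something like /var/log/plesk-php*-fpm/error.log is an assumption, verify it on your server).

```shell
# Count log lines that report an exhausted Apache or PHP-FPM worker limit.
# $1 = log file, e.g. /var/log/apache2/error.log
count_worker_limit_hits() {
    grep -cE 'server reached (MaxRequestWorkers|pm\.max_children) setting' "$1"
}

# Count the apache2 processes running right now, to compare against
# the configured limit while the sites are slow.
count_apache_workers() {
    ps -C apache2 --no-headers | wc -l
}
```

A count of 0 from `count_worker_limit_hits /var/log/apache2/error.log` around the outage time makes this explanation less likely, though it does not rule it out.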

It is also possible that RAM usage explodes. This can trigger swapping, which in turn can slow a system down so much that it feels unresponsive. A great countermeasure is cgroups, which are available as an extension in Plesk. With it you can limit RAM and CPU usage.
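
To check the memory theory, it may help to look at swap usage during an outage and at the kernel log afterwards. A sketch with invented function names, assuming a systemd journal that still holds kernel messages from the time in question.

```shell
# Print the amount of swap in use, in kB, from a meminfo-style file.
# $1 = file to read (defaults to /proc/meminfo)
swap_used_kb() {
    awk '/^SwapTotal:/ {t = $2} /^SwapFree:/ {f = $2} END {print t - f}' "${1:-/proc/meminfo}"
}

# Show kernel messages about the OOM killer since a given time.
# $1 = start time in a form journalctl accepts, e.g. "2023-01-15 00:00:00"
oom_events() {
    journalctl -k --no-pager --since="$1" | grep -iE 'out of memory|oom-killer'
}
```

A large `swap_used_kb` value while the sites are slow, or any output from `oom_events`, points towards memory pressure rather than the web server configuration.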
 