
Issue Apache dies due to missing SSL certs

HostaHost

Regular Pleskian
We have hundreds of Plesk servers, and at least several times per week Apache dies on one of them, mostly during the nightly log rotation that Plesk schedules for roughly 4am local time. It can also happen just from someone adding or removing a site on a server.

The cause of the problem is that /etc/httpd/conf/plesk.conf.d/server.conf references SSL certificate files that do not exist. Apache then fails when it tries to restart and takes down every website on the server. The error will be something along these lines:

/etc/httpd/conf/plesk.conf.d/server.conf:
...
SSLCertificateFile: file '/usr/local/psa/var/certificates/cert-GY1GIs' does not exist or is empty

In the server.conf, it will be in a section like this:

Code:
<VirtualHost 192.168.0.1:7081 127.0.0.1:7081>
                ServerName "default-192_168_0_1"
                UseCanonicalName Off
                DocumentRoot "/var/www/vhosts/default/htdocs"
                ScriptAlias /cgi-bin/ "/var/www/vhosts/default/cgi-bin"

                SSLEngine on
                SSLVerifyClient none
                SSLCertificateFile "/usr/local/psa/var/certificates/cert-GY1GIs"

It is also always (100% of the time) an error related to an IP address that is not currently in use by any website on the server.
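To spot this before Apache is restarted, the config can be scanned for certificate paths that are missing or empty. The following is only a rough sketch, not anything Plesk ships: the function name `find_missing_certs` is made up for illustration, it only looks at `SSLCertificateFile` lines, and it assumes paths without spaces, like the one in the excerpt above.

```python
import os
import re

# Matches e.g.:  SSLCertificateFile "/usr/local/psa/var/certificates/cert-GY1GIs"
# Commented-out lines (starting with #) are deliberately not matched.
DIRECTIVE = re.compile(r'^\s*SSLCertificateFile\s+"?([^"\s]+)"?', re.IGNORECASE)


def find_missing_certs(conf_path):
    """Return (line_number, cert_path) for every directive whose file is missing or empty."""
    problems = []
    with open(conf_path, encoding="utf-8", errors="replace") as conf:
        for lineno, line in enumerate(conf, start=1):
            match = DIRECTIVE.match(line)
            if not match:
                continue
            cert = match.group(1)
            # Apache rejects the file when it "does not exist or is empty".
            if not os.path.isfile(cert) or os.path.getsize(cert) == 0:
                problems.append((lineno, cert))
    return problems
```

Calling `find_missing_certs("/etc/httpd/conf/plesk.conf.d/server.conf")` from a cron job shortly before the nightly rotation would return an empty list on a healthy server and a list of line numbers and paths otherwise.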

There are two fixes:

1) Run /usr/local/psa/admin/bin/httpdmng --reconfigure-all
2) Look at the server.conf and find the IP in question, then go into the web interface and change the default SSL certificate for that IP at the IP level.
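For the second fix, finding the IP by hand gets tedious across many servers. A minimal sketch of how the lookup could be scripted, assuming the `<VirtualHost ip:port ...>` layout shown in the excerpt above; `addresses_using_cert` and `strip_port` are hypothetical helper names, not Plesk tools:

```python
import re

VHOST_OPEN = re.compile(r'^\s*<VirtualHost\s+([^>]+)>', re.IGNORECASE)
VHOST_CLOSE = re.compile(r'^\s*</VirtualHost>', re.IGNORECASE)
CERT = re.compile(r'^\s*SSLCertificateFile\s+"?([^"\s]+)"?', re.IGNORECASE)


def strip_port(addr):
    """'192.168.0.1:7081' -> '192.168.0.1'; '[::1]:7081' -> '[::1]'."""
    if addr.startswith("["):  # bracketed IPv6 address
        end = addr.find("]")
        return addr[:end + 1] if end != -1 else addr
    return addr.rsplit(":", 1)[0]


def addresses_using_cert(conf_text, cert_path):
    """Return the sorted addresses of VirtualHost blocks that reference cert_path."""
    current = []   # addresses of the block currently being read
    found = set()
    for line in conf_text.splitlines():
        opened = VHOST_OPEN.match(line)
        if opened:
            current = opened.group(1).split()
        elif VHOST_CLOSE.match(line):
            current = []
        else:
            cert = CERT.match(line)
            if cert and cert.group(1) == cert_path:
                found.update(strip_port(addr) for addr in current)
    return sorted(found)
```

Fed the contents of server.conf and the path from the Apache error message, this returns every address listed on the affected `<VirtualHost>` line, including the 127.0.0.1 entry, so the public IP still has to be picked out of the result.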

I see a similar issue in http://kb.odin.com/en/123399 from Plesk 11.5, and I seem to recall this issue going back a while, so there is obviously still some way for it to occur.

Is this a known issue, and is there a fix? It's getting really annoying that we have customers going offline at 4am several times per week because of this.

More importantly, why does Plesk not run an Apache syntax test before telling it to restart? Apache is not as smart as nginx and will just blindly 'restart' without checking its config, so then it's down and can't start back up. Nginx, on the same server with the same issue present, will produce an error if you try to restart it and refuses to restart:

Code:
Error: Unable to make action: Unable to manage service by nginx_control: ('restart', 'nginx'). Error: /usr/local/psa/admin/sbin/nginx-config execution failed: 
nginx: [emerg] BIO_new_file("/usr/local/psa/var/certificates/cert-GY1GIs") failed (SSL: error:02001002:system library:fopen:No such file or directory:fopen('/usr/local/psa/var/certificates/cert-GY1GIs','r') error:2006D080:BIO routines:BIO_new_file:no such file) 
nginx: configuration file /etc/nginx/nginx.conf test failed

So at least nginx tries to keep things running, but if Apache is down, so are the sites anyway.
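The "test before restart" idea can be sketched as a small wrapper. This is an illustration of the pattern only, not how Plesk drives Apache: `restart_if_config_ok` is a made-up name, and the default commands (`apachectl configtest`, `systemctl restart httpd`) are assumptions for a RHEL/CentOS-style system that may need adjusting elsewhere.

```python
import subprocess


def restart_if_config_ok(test_cmd=("apachectl", "configtest"),
                         restart_cmd=("systemctl", "restart", "httpd")):
    """Run the syntax test first; restart only when it exits with status 0.

    Returns (restarted, error_text).
    """
    check = subprocess.run(test_cmd, capture_output=True, text=True)
    if check.returncode != 0:
        # Leave the running server alone and report why the test failed.
        return False, check.stderr.strip()
    # Raises CalledProcessError if the restart command itself fails.
    subprocess.run(restart_cmd, check=True)
    return True, ""
```

With a wrapper like this, a failed test should leave the already running Apache serving sites with its old configuration, which is the same behaviour nginx shows in the output above.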
 
We've seen the same thing, though not during nightly maintenance. It happens randomly when customers have been playing around with SSL certs and changing their configuration frequently within a short period of time, probably within an Apache restart window or during a restart/reload. It is something that should definitely be looked into, because for administrators it is impossible to foresee the issue or to prevent it happening. Obviously it happens rarely and randomly, but sooner or later, it will happen.
 
Oh joy, new Red Hat/CentOS httpd updates, which caused Apache to restart on all servers tonight, and thanks to Plesk leaving broken configs in place, tons of servers with Apache outages as a result.
 
> It is something that should definitely be looked into, because for administrators it is impossible to foresee the issue or to prevent it happening. Obviously it happens rarely and randomly, but sooner or later, it will happen.

Yep I've also seen this happen.
 
One part of this issue was solved with the 2.1 update of the extension. Another part will probably be solved soon. This is a timing issue: either the restart/reload of the server occurs before the new certificate file has been saved to disk and is readable by other software, or an old certificate file is deleted before the reload of Apache occurs. It was discussed in the thread "Forwarded to devs - Let's Encrypt still causing issues with web server configuration on updating certificates" and in further detail in private conversation. The issue was confirmed and will be resolved in a future update of the extension.
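For anyone writing their own certificate deployment scripts, the general way to avoid this kind of race is to write the new file completely under a temporary name, flush it to disk, and only then rename it into place. The sketch below shows that generic pattern under the assumption of a POSIX filesystem. It is not the extension's actual code, and `write_cert_atomically` is a hypothetical name.

```python
import os
import tempfile


def write_cert_atomically(path, pem_bytes):
    """Write pem_bytes to path so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cert-tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(pem_bytes)
            tmp.flush()
            os.fsync(tmp.fileno())    # force the data to disk before renaming
        os.replace(tmp_path, path)    # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The reload would be triggered only after this function returns, and any old certificate file would be deleted only after the reload has finished, which covers both orderings described above.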
 
> The issue was confirmed and will be resolved in a future update of the extension.

Thanks for the update. Great that this has been confirmed and is being worked on.
 