- Server operating system version: CentOS 7.9, but all are affected
- Plesk version and microupdate number: Obsidian 18.0.48, but previous versions are also affected
If your server uses Google DNS (8.8.8.8 etc.) as its resolver, runs Nginx, and has "OCSP stapling" checked in the SSL configuration of one or more of your websites, you will want to know about this:
OCSP stands for "Online Certificate Status Protocol", a way to check whether an SSL certificate is still valid. With OCSP stapling, the server performs that check itself and delivers the result along with the certificate, so that the client does not need to. This normally makes SSL connections faster, because it saves the client the time for the extra check. To do this, the server needs to connect to the certificate authority and have the validity verified. To connect, the server first needs to resolve the certificate authority's OCSP verification domain.
Since Friday, Dec 16, 2022, I have observed on some servers with a large number of domains that domain reconfiguration processes got stuck. More precisely, the "nginx -t" syntax check seemed to hang. An Nginx reload did not complete in time, so it failed with "nginx.service start-pre operation timed out." It became impossible to restart or reload Nginx, and also impossible to apply configuration changes to domains.
This behavior is known from situations where the OCSP server of Let's Encrypt cannot be reached. For example:
Code:
dig +short r3.o.lencr.org
;; connection timed out; no servers could be reached
In the cases I observed, however, that was not the cause: r3.o.lencr.org did resolve! So why were the processes hanging anyway?
Here is the crux, which you will want to know about if you host a large number of domains on your server and are seeing the same symptoms: an strace on the hung processes showed that they were actually running and doing something, but Google DNS 8.8.8.8 took a very long time to resolve the r3.o.lencr.org domain. On systems with many domains, this adds up to a considerable number of seconds, so many that Nginx reloads and restarts run into timeouts and the "nginx -t" command appears to hang.
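To check whether a server is affected, it can help to time the lookup against each resolver separately. The following is only a rough sketch: the helper names query_time_ms and estimate_total_seconds are made up for illustration, and the estimate assumes one sequential lookup per domain, which simplifies what Nginx actually does.

```shell
# Extract the "Query time" value (in msec) from dig's statistics output on stdin.
query_time_ms() {
    awk '/Query time:/ { print $4; exit }'
}

# Rough estimate of the total wait in seconds.
# $1 = lookup duration in msec, $2 = number of domains
# (assumption: one lookup per domain, one after another).
estimate_total_seconds() {
    echo $(( $1 * $2 / 1000 ))
}

# Usage (needs network access, so it is left commented out here):
#   dig @8.8.8.8 r3.o.lencr.org +noall +stats +tries=1 +time=10 | query_time_ms
#   dig @1.1.1.1 r3.o.lencr.org +noall +stats +tries=1 +time=10 | query_time_ms
#   estimate_total_seconds 2000 300    # 2000 msec per lookup, 300 domains
```

If the first dig call reports a much higher value than the second, or prints nothing because the query timed out, slow resolution through 8.8.8.8 is a likely explanation for the hanging reloads.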
As of Dec 18, 2022, the issue with Google DNS on that domain still occurs intermittently, so some users will likely run into this situation.
How to fix it?
If your servers are affected, you can fix the problem immediately by adding another public nameserver as the first resolver in your /etc/resolv.conf file. In my tests I chose Cloudflare 1.1.1.1, but you can also use another resolver. Just make sure that it is not affected by resolution issues with r3.o.lencr.org. Your /etc/resolv.conf will then look somewhat like this:
Code:
nameserver 1.1.1.1
nameserver 8.8.8.8
.
.
.
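If you prefer to script this change across several servers, something along these lines could work. It is a minimal sketch, not a tested tool: prepend_nameserver is a hypothetical helper name, and it assumes that /etc/resolv.conf is not regenerated by NetworkManager or a DHCP client, which would overwrite the edit.

```shell
# Make the given IP the first "nameserver" entry of a resolv.conf-style file.
# $1 = resolver IP, $2 = path to the file
prepend_nameserver() {
    local ip="$1" file="$2" tmp
    # Nothing to do if the resolver is already the first nameserver entry.
    if [ "$(awk '$1 == "nameserver" { print $2; exit }' "$file")" = "$ip" ]; then
        return 0
    fi
    tmp="$(mktemp)" || return 1
    {
        printf 'nameserver %s\n' "$ip"
        # Copy everything else, dropping a later duplicate of the same resolver.
        awk -v ip="$ip" '!($1 == "nameserver" && $2 == ip)' "$file"
    } > "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage (run as root; keep a backup first):
#   cp -p /etc/resolv.conf /etc/resolv.conf.bak
#   prepend_nameserver 1.1.1.1 /etc/resolv.conf
```

The function is idempotent, so running it a second time leaves the file unchanged.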
There is no need to restart any service afterwards. Even the "hung" processes will continue swiftly a few seconds after the new resolver has been added. The issue is all about the slow name resolution through 8.8.8.8.
If adding another resolver is not an option for you, you can let your system resolve the OCSP verification domain locally by adding this line to your /etc/hosts file:
Code:
23.32.238.51 r3.o.lencr.org
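Keep in mind that a pinned address can go stale if the OCSP host moves to a different IP, so the entry should be checked from time to time and removed once the resolver issue is gone. Below is a small sketch for reading back what is currently pinned; pinned_ip is a hypothetical helper, and the comparison against 1.1.1.1 is just one possible reference resolver.

```shell
# Print the IP address that a hosts-format file pins for the given host name.
# $1 = host name, $2 = path to the hosts file
pinned_ip() {
    awk -v h="$1" '
        /^[[:space:]]*#/ { next }            # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break         # ignore trailing comments
                if ($i == h) { print $1; exit }
            }
        }' "$2"
}

# Usage (the dig call needs network access, so it is left commented out):
#   pinned="$(pinned_ip r3.o.lencr.org /etc/hosts)"
#   dig +short @1.1.1.1 r3.o.lencr.org | grep -qxF "$pinned" \
#       || echo "hosts entry for r3.o.lencr.org may be outdated"
```

A mismatch does not necessarily mean the entry is broken, since the host may answer with different addresses over time, but it is a hint to re-check the line.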