
Input Issues with Google DNS and name resolution of Let's Encrypt OCSP servers -> hanging Nginx reload and "nginx -t"

Peter Debik

Community Manager until 3/2024
Plesk Team
Server operating system version
CentOS 7.9, but all affected
Plesk version and microupdate number
Obsidian 18.0.48, but previous also affected
If your server uses Google DNS (8.8.8.8 etc.) and Nginx, and one or more of your websites has "OCSP stapling" checked in its SSL configuration, you will want to know about this:

OCSP stands for "Online Certificate Status Protocol". With OCSP stapling, the server itself checks whether an SSL certificate is still valid and hands the result to the client, so that the client does not need to check. Normally this makes SSL connections faster, because it saves the client the time for the extra check. To do OCSP, the server needs to connect to the trust center (certificate authority) that issued the SSL certificate and have the validity verified. To connect, the server first needs to resolve the trust center's OCSP verification domain.

Since Friday, Dec 16, 2022, I have observed on some servers with a large number of domains that domain reconfiguration processes got stuck. More precisely, the "nginx -t" syntax check seemed to hang. An Nginx reload did not complete in time, so the reload failed with "nginx.service start-pre operation timed out." It became impossible to restart or reload Nginx. It also became impossible to apply configuration changes to domains.

This behavior is known to occur when the OCSP server of Let's Encrypt cannot be resolved or reached. For example:
Code:
dig +short r3.o.lencr.org
;; connection timed out; no servers could be reached
But that was not what happened in the cases I observed. r3.o.lencr.org did resolve! So why were the processes hanging anyway?

Now here is where the plot thickens, and what you will want to know if you are hosting a large number of domains on your server and are experiencing the same: An strace on the hung processes showed that they were actually running and doing something, but Google DNS 8.8.8.8 took a very long time to resolve the r3.o.lencr.org domain. On systems with many domains, this adds up to a considerable number of seconds. So many seconds that Nginx reloads and restarts run into timeouts and the "nginx -t" command appears to hang.
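
To check whether a server is affected, it can help to time the lookup against each resolver separately. The following is only a minimal sketch, assuming dig is installed; the function names parse_query_time and time_lookup are made up for this example:

```shell
# Extract the "Query time" value (in milliseconds) from dig output on stdin.
parse_query_time() {
    awk '/Query time:/ { print $4 }'
}

# Ask one resolver for one name and print how long the answer took.
# Usage: time_lookup RESOLVER_IP NAME
time_lookup() {
    # One attempt with a 5 second timeout, printing only the statistics section.
    dig "@$1" "$2" +tries=1 +time=5 +noall +stats | parse_query_time
}
```

For example, "time_lookup 8.8.8.8 r3.o.lencr.org" and "time_lookup 1.1.1.1 r3.o.lencr.org" can be run a few times and the numbers compared. If dig gets no answer at all, it reports no query time, so the function should print nothing.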

As of Dec 18, 2022, the issue with Google DNS on that domain still exists intermittently. So likely some users will experience this situation.

How to fix it?

If your servers are affected, you can immediately fix the problem by adding another public nameserver as the first resolver in your /etc/resolv.conf file. In my tests I chose Cloudflare 1.1.1.1, but you can also use another resolver. Just make sure that it is not affected by resolution issues with r3.o.lencr.org. Your /etc/resolv.conf will then look somewhat like this:
Code:
nameserver 1.1.1.1
nameserver 8.8.8.8
.
.
.

There is no need to restart any service afterwards. Even the "hung" processes will continue swiftly a few seconds after the new resolver was added. The issue is all about the slow 8.8.8.8 name resolution.
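
If the change has to be made on several servers, it could be scripted. This is a minimal sketch, not a tested Plesk procedure; the function name prepend_nameserver is made up for this example, and it assumes a plain resolv.conf that no other tool regenerates:

```shell
# Make IP the first "nameserver" entry in a resolv.conf-style FILE.
# Usage: prepend_nameserver IP FILE
prepend_nameserver() {
    ip=$1
    file=$2
    # Nothing to do if the first nameserver entry is already the wanted one.
    first=$(awk '$1 == "nameserver" { print $2; exit }' "$file")
    [ "$first" = "$ip" ] && return 0
    tmp=$(mktemp) || return 1
    {
        printf 'nameserver %s\n' "$ip"
        # Copy the old content, dropping any later entry for the same resolver.
        awk -v ip="$ip" '!($1 == "nameserver" && $2 == ip)' "$file"
    } > "$tmp" && cat "$tmp" > "$file"   # write in place, keeps owner and mode
    status=$?
    rm -f "$tmp"
    return $status
}
```

Run as root, this would be called as "prepend_nameserver 1.1.1.1 /etc/resolv.conf". Keep in mind that glibc only uses the first three nameserver entries, and that tools such as NetworkManager may overwrite the file again.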

If adding another resolver is not an option for you, you can let your system resolve the OCSP verification domain locally by adding this line to your /etc/hosts file:
Code:
23.32.238.51 r3.o.lencr.org
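
Because the address behind r3.o.lencr.org can change, a pinned entry like this can go stale. A minimal sketch for checking it from time to time, assuming dig is installed; the function names pinned_ip and pin_is_current are made up for this example:

```shell
# Print the IP address that a hosts-format FILE pins for NAME, if any.
# Usage: pinned_ip NAME FILE
pinned_ip() {
    awk -v name="$1" '
        /^[[:space:]]*#/ { next }              # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break           # rest of the line is a comment
                if ($i == name) { print $1; exit }
            }
        }
    ' "$2"
}

# Succeed if the pinned address is still among the A records RESOLVER returns.
# Returns 2 if FILE has no entry for NAME.
# Usage: pin_is_current NAME FILE RESOLVER_IP
pin_is_current() {
    ip=$(pinned_ip "$1" "$2")
    [ -n "$ip" ] || return 2
    dig "@$3" "$1" A +short | grep -qxF "$ip"
}
```

For example, "pin_is_current r3.o.lencr.org /etc/hosts 1.1.1.1" exits with status 0 while the pinned address still appears in the answer. Since different locations may get different addresses, a mismatch is only a hint that the entry should be reviewed.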
 
@Peter Debik
I can confirm that as of 15th March 2024 this still occurs.

I encountered this exact problem this morning and luckily stumbled on this forum entry.

Has this been reported to Plesk as a bug?

Can you confirm what this IP address is? I added this as written. Is this supposed to be my server IP address or the address from your post?

23.32.238.51 r3.o.lencr.org
 
Also found R3.o.lencr.org is seen as malware
Could you please explain why you think this is a bug and related to Plesk?

Regarding the IP address of r3.o.lencr.org, I think it is some sort of cluster with different addresses that can also change over time and depending on your geolocation. Here is an example of the IP addresses that I got from my laptop:
% dig r3.o.lencr.org A
[...]
;; QUESTION SECTION:
;r3.o.lencr.org. IN A

;; ANSWER SECTION:
r3.o.lencr.org. 6 IN CNAME o.lencr.edgesuite.net.
o.lencr.edgesuite.net. 16315 IN CNAME a1887.dscq.akamai.net.
a1887.dscq.akamai.net. 7 IN A 88.221.211.17
a1887.dscq.akamai.net. 7 IN A 88.221.211.10
 
If it's not Plesk specifically, it's the SSL IT module, which is part of Plesk in 99% of all Plesk installs.

The bug is that OCSP stapling fails as per Peter's post. So the Nginx server won't restart when you have lots of domains on the same server. It is a bug.
 