
Issue Googlebot blocked and uptime monitor traffic blocked too...

LionKing

Regular Pleskian
Server operating system version
Ubuntu Linux
Plesk version and microupdate number
18.0.48
Hi guys.

OK, so I've been looking into this for some time, and it seems that Plesk blocks traffic from Googlebot and other indexing bots requesting pages (probably also Bing, which Yahoo/DuckDuckGo also use):
1678224177374.png
Yandex too:
1678224324086.png

The same happens for the up-time monitor service we use from WPMU Dev:
1678224284067.png
It is a WordPress Multisite (our corporate website), and when we ran the old server with cPanel this issue did not exist, which leads me to think that Plesk is the obvious culprit.

A quick sneak peek at WPMU Dev's uptime monitoring logs reveals this, which might be a clue (or not).
1678224681431.png
Unfortunately, they do not allow changing the request method, but it would probably not make a difference, since we already know that legitimate indexing bots are also being blocked.


Any ideas? :rolleyes:

Thanks in advance. ;)
Kind regards
LionKing
 
There can be a number of settings that block bots. For example, if the WP Toolkit security option to block bad bots is activated, it will lock out some. There could also be an entry in your .htaccess file that blocks bots. ModSecurity blocks also result in 403 errors; in that case, however, you will see a "ModSecurity" entry in error_log at the given time.
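
The error_log check described above can be sketched in a few lines of Python. This is only an illustration: the function name `modsecurity_entries` and the sample lines are made up, and the exact wording of real ModSecurity entries depends on your setup.

```python
from typing import Iterable, List


def modsecurity_entries(lines: Iterable[str], marker: str = "ModSecurity") -> List[str]:
    """Return the log lines that mention the marker, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if marker in line]


# Made-up sample lines for illustration only; real entries will look different.
SAMPLE_LOG = [
    "[Tue Mar 07 21:20:01 2023] [error] ModSecurity: Access denied with code 403 (made-up example)\n",
    "[Tue Mar 07 21:20:05 2023] [error] File does not exist: /favicon.ico (made-up example)\n",
]

if __name__ == "__main__":
    for entry in modsecurity_entries(SAMPLE_LOG):
        print(entry)
```

To run it against a real log, pass an open file object instead of `SAMPLE_LOG`, using whatever path your domain's error_log has, and compare the timestamps of the matches with the times of the 403 responses.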
 
WP Toolkit security option to block bad bots is activated, this will lock out some
Thanks, Peter. Well, we do not use WP Toolkit because it cannot figure out our setup environment, which is quite different. Plus, we already use services that do the same.
There could also be an entry in your .htaccess file to block bots
As for .htaccess, you might be correct, and I suppose we need to dig through it and see if there is any rule that might be blocking something.

ModSecurity blocks also result in 403 errors, however then you can see an entry in error_log by "ModSecurity" at the given time.
Thanks, we will look in the logs too to see if there is something that puts us on the right track.

Kind regards
 
OK, so I've been looking into this for some time, and it seems that Plesk blocks traffic from Googlebot and other indexing bots requesting pages (probably also Bing, which Yahoo/DuckDuckGo also use):
That's strange. I've never seen any of those bots use HTTP/1.0. Do you have a stupid load balancer in front of your server?
That can't work with modern (v)hosting because HTTP/1.0 is pre-SNI. All such requests would be served by the default server for the IP, same as if you'd directly supply the IP instead of a domain name in the URL's host part. Did you assign IPs exclusively to domains (ending up in /etc/nginx/plesk.conf.d/ip_default/)?
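
One way to test this is to send the same request twice, once without a host name and once with it, and compare the status lines. The following is a minimal sketch, assuming the relevant difference is whether a `Host` header reaches the server; `build_request` and `status_line` are hypothetical helper names, and the plain-HTTP port 80 is an assumption (an HTTPS check would need a TLS-wrapped socket instead).

```python
import socket
from typing import Optional


def build_request(method: str, path: str, version: str, host: Optional[str] = None) -> bytes:
    """Build a raw HTTP request; the Host header is only added when a host is given."""
    lines = [f"{method} {path} HTTP/{version}"]
    if host is not None:
        lines.append(f"Host: {host}")
    lines.append("Connection: close")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def status_line(ip: str, payload: bytes, port: int = 80, timeout: float = 5.0) -> str:
    """Send the raw payload to ip:port and return the first line of the response."""
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        sock.sendall(payload)
        data = b""
        while b"\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.split(b"\r\n", 1)[0].decode("latin-1")
```

For example, `status_line(server_ip, build_request("HEAD", "/", "1.0"))` versus `status_line(server_ip, build_request("HEAD", "/", "1.1", host="your-domain.example"))`, where `server_ip` and the host name are placeholders for your own values. If only the first call lands on the default page, the missing host name is what matters.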
 
That's strange. I've never seen any of those bots use HTTP/1.0. Do you have a stupid load balancer in front of your server?
That can't work with modern (v)hosting because HTTP/1.0 is pre-SNI. All such requests would be served by the default server for the IP, same as if you'd directly supply the IP instead of a domain name in the URL's host part.
Maybe Cloudflare's infrastructure is doing stupid things(?).
We use them as a security layer and, obviously, for caching. So it's not really our servers that send the initial response, although the logs provided above were logged by our server(s).
Did you assign IPs exclusively to domains (ending up in /etc/nginx/plesk.conf.d/ip_default/)?
No, we chose to use just one fixed IP for this server, so all our business apps, systems, and the company website share the same IP.

Kind regards
 
Thanks for the reply Mow.
And what site did you set as default? (the site you get when you access the IP)
None.
1.) You need to know the URL/domain name to access the systems on our server. If you make a misspelling, you will either be served a 404 (if the domain is correct) or, in the case of NXDOMAIN and/or a typo in the domain, the default landing page, the same one mentioned below for the bare IP address.

2.) If you just enter the IP address itself in a browser and hit Enter, you will see Plesk's default splash screen/landing page.
(Although we customized it because we think it is unnecessary to announce to the world that our servers are running Plesk.)

Kind regards
 
1.) You need to know the URL/domain name to access the systems on our server. If you make a misspelling, you will either be served a 404 (if the domain is correct) or, in the case of NXDOMAIN and/or a typo in the domain, the default landing page, the same one mentioned below for the bare IP address.
Then, that default landing page is also what you (or the bots) get with HTTP/1.0.
 
Then, that default landing page is also what you (or the bots) get with HTTP/1.0.
Well, yes, for all requests that don't concern the web apps that are installed.
The above logs are for the company's corporate website, https//:takemarket.co.uk, so that is not what you are seeing (HTTP/1.0 / default page).

Hmm... With that said, it does say "HTTP/1.0"; could it be that we are seeing this because we do not allow any insecure requests, i.e. everything must go over the encrypted "https" protocol? :rolleyes:
 
Well, yes, for all requests that don't concern the web apps that are installed.
The above logs are for the company's corporate website, https//:takemarket.co.uk, so that is not what you are seeing (HTTP/1.0 / default page).
Well, they would try to access subpages under that default page, which do not exist, and get a 403.
I have no idea why HEAD on / throws a 405, though.
Hmm... With that said, it does say "HTTP/1.0"; could it be that we are seeing this because we do not allow any insecure requests, i.e. everything must go over the encrypted "https" protocol? :rolleyes:
No, the HTTP protocol version is independent from SSL.
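
To see which requests get the 403 and which get the 405, it can help to count the responses per method and protocol version. The sketch below assumes the access log uses the common "combined" format with the request line in double quotes followed by the status code; `tally_requests` is a hypothetical helper name, and the sample lines are made up for illustration, not taken from the screenshots in this thread.

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "HEAD / HTTP/1.0" 405 and captures method, protocol, and status.
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) \S+ (?P<proto>HTTP/[0-9.]+)" (?P<status>\d{3})')


def tally_requests(lines: Iterable[str]) -> Counter:
    """Count log lines per (method, protocol, status); unparsable lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[(match["method"], match["proto"], match["status"])] += 1
    return counts


# Made-up sample lines for illustration only.
SAMPLE_ACCESS_LOG = [
    '203.0.113.7 - - [07/Mar/2023:21:20:01 +0000] "HEAD / HTTP/1.0" 405 0 "-" "ExampleUptimeBot/1.0"',
    '203.0.113.8 - - [07/Mar/2023:21:21:14 +0000] "GET /about/ HTTP/1.0" 403 199 "-" "ExampleCrawler/2.1"',
    '203.0.113.9 - - [07/Mar/2023:21:22:30 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for key, count in tally_requests(SAMPLE_ACCESS_LOG).items():
        print(key, count)
```

If all the 403 and 405 responses turn out to share one method or one protocol version, that narrows down where the block comes from.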
 
Interesting.
Thanks a bunch for the feedback. Sorry for the misspelled link, by the way; here is the correctly formatted link: takemarket.co.uk
The conundrum still remains, though. I guess we (my colleagues and I) just need to keep digging, and hopefully we will find something sooner or later.

Kind regards
 