
Issue: Googlebot being blocked by 301

Daniel Richards

New Pleskian
I'm having a strange issue where Google can see that robots.txt exists but cannot always read it.

I've tried all of the obvious things (encoding, permissions, etc.), but no joy.

When I look in /var/log/nginx/access.log, I find:

66.249.69.26 - - [23/Jul/2018:05:36:36 +0000] "GET /robots.txt HTTP/1.1" 301 178 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

I can manually ask Google to fetch robots.txt, and a similar entry appears with a 301 and Google reports the fetch as not possible (it indexes the rest of my site just fine).

It doesn't appear in the domain logs, just the system one above.

How can I enable logging so that I can see the exact query Googlebot is making? Can I determine whether Googlebot is requesting by IP or by domain name? If so, is it using http or https, www or no www?
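For what it's worth, I'm guessing a custom nginx log format along these lines would capture the host and scheme of each request (a sketch only, not tested; the format name, the log path, and where Plesk expects this to live are my assumptions):

# In the http {} context (log_format is only valid there):
log_format robots_debug '$remote_addr [$time_local] "$request" $status '
                        'host="$http_host" scheme=$scheme server=$server_name '
                        'ua="$http_user_agent"';

# In the relevant server {} block:
access_log /var/log/nginx/robots-debug.log robots_debug;

That should at least show whether Googlebot is sending the bare IP, the www variant, or the non-www variant in the Host header.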

Thanks in advance....


(Any ideas on rectifying this are also welcome.)
 
Test it in Google Webmaster Tools and you should see whether Google can read the file and access the site, and/or what the issue is.

As long as the file is readable at domain.com/robots.txt and the test comes back green with no errors, you'll be fine.

Also check which domain Google is pulling the robots.txt from. In Webmaster Tools, domain.com and www.domain.com are two different domains, so if you're seeing a 301 or 302 redirect, Google is using the wrong domain to retrieve your robots.txt file. Be sure to select the site you want Google to use as the preferred domain to avoid redirects and duplicate URL errors.
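You can also test each variant yourself from a shell to see which one issues the redirect (domain.com below is a placeholder, swap in your own domain):

# Print the status code and any redirect target for each host/scheme combination
for u in http://domain.com https://domain.com http://www.domain.com https://www.domain.com; do
  printf '%s -> ' "$u"
  curl -s -o /dev/null -w '%{http_code} %{redirect_url}\n' "$u/robots.txt"
done

# Optionally mimic Googlebot's user agent, in case the server treats it differently:
curl -s -o /dev/null -A 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' \
  -w '%{http_code} %{redirect_url}\n' http://domain.com/robots.txt

Whichever URL comes back with a 301 there is the one Google is tripping over.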
 
Thank you @themew for the time taken to respond. It is Google Webmaster Tools that alerted me to the problem.
It's intermittent: some requests for robots.txt work fine with a 200, while others result in a 301 redirect, which Webmaster Tools reports as a failure.

The failing request for a specific domain appears in the server log /var/log/nginx/access.log but not in the domain log, which suggests that either Google is submitting a faulty request or something on my server is causing it to fail.

How can I make the nginx and Apache logs include the exact domain and all details sent by Google, and discover where the request was redirected to?
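On the Apache side, I assume something like this would record the Host header alongside each request (again only a sketch: the nickname and log path are made up, and I don't know where Plesk prefers custom directives to go):

# In the vhost config (or a Plesk additional-directives snippet):
LogFormat "%h %l %u %t \"%r\" %>s %b host=\"%{Host}i\" ua=\"%{User-Agent}i\"" host_debug
CustomLog /var/log/apache2/robots-debug.log host_debug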

Thanks in advance.
 