
Issue: Googlebot being blocked by 301

Daniel Richards

New Pleskian
I'm having a strange issue where Google can see that robots.txt exists but cannot always read it.

I've tried all the obvious things, such as encoding and permissions, but no joy.

When I look in /var/log/nginx/access.log, I find:

66.249.69.26 - - [23/Jul/2018:05:36:36 +0000] "GET /robots.txt HTTP/1.1" 301 178 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

I can manually ask Google to fetch robots.txt, and a similar entry appears with a 301 while Google reports that the fetch was not possible. (It indexes the rest of my site just fine.)

It doesn't appear in the domain logs - just the system one above.

How can I enable logging so that I can see the exact request Googlebot is making? Can I determine whether Googlebot is requesting by IP or by domain name? If by domain name, is it using http or https, and www or no www?
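For anyone reading along: the entry above looks like nginx's default "combined" log format, which records the client IP, request line, status, size, referer and user agent, but not the Host header or the scheme. A minimal Python sketch that splits a line of that shape into named fields makes the gap visible. The names `COMBINED` and `parse_combined` are only illustrative, and the sketch assumes the log really uses the default format.

```python
import re

# Field layout of nginx's default "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)

def parse_combined(line):
    """Return the fields of one combined-format line as a dict, or None."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

# A line shaped like the access.log entry quoted in this post.
line = ('66.249.69.26 - - [23/Jul/2018:05:36:36 +0000] '
        '"GET /robots.txt HTTP/1.1" 301 178 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

fields = parse_combined(line)
print(fields["request"], fields["status"])
# Neither the requested host name nor the scheme is among the logged fields.
print("host" in fields, "scheme" in fields)
```

So with this format alone, the log cannot say whether the request came in by IP, with or without www, or over http or https; that information would have to be added to the log format first.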

Thanks in advance....


(Any ideas on rectifying this are also welcome.)
 
Test it in Google Webmaster Tools and you should see whether Google can read the file and access the site, and if not, what the issue is.

As long as the file is readable at domain.com/robots.txt and the test comes back green with no errors, you'll be fine.

Also check which domain Google is pulling robots.txt from. In Webmaster Tools, domain.com and www.domain.com are two different sites. So if you're seeing a 301 or 302 redirect, Google is probably requesting your robots.txt file from the variant that redirects. Be sure to select the site you want Google to treat as preferred, to avoid redirects and duplicate URL errors.
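One way to see which variant redirects is to request robots.txt on each of the four combinations (http or https, with or without www) without following redirects. Below is a rough Python sketch using only the standard library. The helper names `robots_variants`, `fetch_status` and `report` are made up for illustration, and `example.com` stands in for the real domain.

```python
import http.client
from urllib.parse import urlsplit

def robots_variants(domain):
    """Build the four robots.txt URLs: http/https, with and without www."""
    bare = domain[4:] if domain.startswith("www.") else domain
    return [f"{scheme}://{host}/robots.txt"
            for scheme in ("http", "https")
            for host in (bare, "www." + bare)]

def fetch_status(url, timeout=10):
    """Request url once, without following redirects.

    Returns (status code, Location header or None).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/",
                     headers={"User-Agent": "robots-check"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

def report(domain):
    """Print status and redirect target for every variant (needs network)."""
    for url in robots_variants(domain):
        status, location = fetch_status(url)
        print(url, status, location)
```

Calling `report("example.com")` prints one line per variant. Any variant that answers 301 or 302 shows where it redirects to in the last column, which can then be compared with the site selected in Webmaster Tools.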
 
Thank you @themew for taking the time to respond. It is Google Webmaster Tools that alerted me to the problem.
It's intermittent: some requests for robots.txt work fine with a 200, and others result in a 301 redirect, which Webmaster Tools reports as a failure.



The failing request for a specific domain appears in the server log /var/log/nginx/access.log but not in the domain log, which suggests that either Google is submitting a faulty request or something on my server is causing it to fail.

How can I make the nginx and Apache logs include the exact domain and all the details Google sends, and also find out where the request was redirected to?

Thanks in advance.
 