
Issue: Googlebot being blocked by 301

Daniel Richards

New Pleskian
I'm having a strange issue where Google can see that robots.txt exists but cannot always read it.

I've tried all of the obvious things like encoding and permissions, but no joy.

I look in /var/log/nginx/access.log and I find

66.249.69.26 - - [23/Jul/2018:05:36:36 +0000] "GET /robots.txt HTTP/1.1" 301 178 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

I can manually ask Google to fetch robots.txt, and a similar entry appears with a 301 while Google reports that the fetch was not possible. (It indexes the rest of my site just fine.)

It doesn't appear in the domain logs - just the system one above.

How can I enable logging so that I can see the exact query Googlebot is making? Can I determine whether Googlebot is requesting by IP or by domain name? If so, is it using http or https, www or no www?

Thanks in advance....


(Any ideas on rectifying this are also welcome.)
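One quick way to see how intermittent the problem really is would be to summarise the robots.txt lines already sitting in the access log by status code. A minimal sketch, it builds a fabricated two-line sample log purely for illustration; on a real Plesk server you would point `summarize_robots` at /var/log/nginx/access.log instead:

```shell
#!/bin/sh
# Count robots.txt fetches in an nginx access log by HTTP status,
# to see the mix of 200s and 301s Googlebot is getting.
summarize_robots() {
    # Field 9 of the default nginx "combined" log format is the status code.
    grep '/robots.txt' "$1" | awk '{print $9}' | sort | uniq -c
}

# Two fabricated sample log lines, for illustration only:
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
66.249.69.26 - - [23/Jul/2018:05:36:36 +0000] "GET /robots.txt HTTP/1.1" 301 178 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"
66.249.69.26 - - [23/Jul/2018:06:12:01 +0000] "GET /robots.txt HTTP/1.1" 200 412 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"
EOF
summarize_robots "$LOG"   # prints one count line per status (here: one 200, one 301)
rm -f "$LOG"
```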
 
Test it in Google Webmaster Tools and you should see whether Google can read the file and access the site, and/or what the issue is.

As long as the file is readable in domain.com/robots.txt and the test comes back green with no errors, you'll be fine.

Also check which domain Google is pulling the robots.txt from. In Webmaster Tools, domain.com and www.domain.com are two different properties. So if you're seeing a 301 or 302 redirect, Google is using the wrong domain to retrieve your robots.txt file. Be sure to select the site you want Google to use as the preferred domain, to avoid redirects and duplicate URL errors.
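The host/scheme check described above can also be done from the command line. A hedged sketch, with example.com standing in for the real domain: it fetches the headers of each of the four variants with a Googlebot-like User-Agent and prints the status line plus any Location header, which shows exactly which variant 301s and where it points:

```shell
#!/bin/sh
# For each scheme/host variant, fetch robots.txt headers and show
# the HTTP status line and any redirect target (Location header).
check_robots() {
    for host in "$@"; do
        for scheme in http https; do
            url="$scheme://$host/robots.txt"
            echo "== $url"
            curl -sI -m 5 -A "Mozilla/5.0 (compatible; Googlebot/2.1)" "$url" \
                | grep -iE '^(HTTP|Location)' || true   # tolerate offline runs
        done
    done
}

check_robots example.com www.example.com
```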
 
Thank you @themew for taking the time to respond. It is Google Webmaster Tools that alerted me to the problem.
It's intermittent: some requests for robots.txt work fine with a 200, while others result in a 301 redirect, which Webmaster Tools reports as a failure.



The failing request for a specific domain appears in the server log /var/log/nginx/access.log but not in the domain log, which indicates that either Google is submitting a faulty request or something on my server is causing it to fail.

How can I make the nginx and Apache logs include the exact domain and all details Google sends, and discover where the request was redirected to?

thanks in advance
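For the logging question above, nginx itself can record the host, scheme, and redirect target with a custom log format. A sketch only: the variables ($host, $scheme, $sent_http_location, $http_user_agent) are standard nginx, but the config file path, format name, and log file path are assumptions to adapt to your setup:

```nginx
# Sketch for the http {} block of the nginx configuration (paths assumed):
# $host is the hostname nginx matched, $scheme is http or https, and
# $sent_http_location is the Location header of any redirect sent back.
log_format robots_debug '$remote_addr [$time_local] '
                        '"$scheme://$host$request_uri" '
                        '"$request" $status '
                        'redirect_to="$sent_http_location" '
                        '"$http_user_agent"';

server {
    # ... existing server configuration ...
    access_log /var/log/nginx/robots_debug.log robots_debug;
}
```

After reloading nginx, the 301 entries in robots_debug.log should show which host and scheme each failing request actually used, and the URL it was redirected to.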
 