Issue Googlebots bypassing Nginx

tkalfaoglu (Silver Pleskian)

Server operating system version: AlmaLinux
Plesk version and microupdate number: Obsidian
Hi there. I noticed high CPU usage on one of our servers and went to investigate.

We are getting hit hard by Googlebot. The access_ssl_log contains thousands, if not millions, of entries like this:

66.249.66.204 - - [14/Nov/2022:13:18:57 +0300] "GET /index.php?rp=%2Fknowledgebase%2Ftag%2Fbar%C4%B1nd%C4%B1rmalanguage%3Dturkishlanguage%3Destonianlanguage%3Dgermanlanguage%3Destonianlanguage%3Ddutchlanguage%3Dromanianlanguage%3Destonianlanguage%3Dczechlanguage%3Dukranianlanguage%3Dhungarianlanguage%3Darabiclanguage%3Dczechlanguage%3Dukranianlanguage%3Dswedishlanguage%3Dportuguese-ptlanguage%3Dromanianlanguage%3Dczechlanguage%3Dportuguese-brlanguage%3Dczechlanguage%3Drussianlanguage=ukranian HTTP/1.0" 200 7070 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.5304.110 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
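
To get a feel for how much of the traffic comes from this crawler, entries like the one above can be tallied per client IP. The following is only a minimal sketch in Python, assuming the log uses the common "combined" format seen in the sample line; the function name count_bot_hits and the log path in the usage note are made up for illustration.

```python
import re
from collections import Counter

# Combined log format:
# ip identity user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_bot_hits(lines, needle="Googlebot"):
    """Return a Counter of client IPs whose user agent contains `needle`.

    Lines that do not match the combined log format are skipped.
    """
    hits = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match and needle in match.group("agent"):
            hits[match.group("ip")] += 1
    return hits
```

It could be used roughly like this (the path is a placeholder, adjust it to wherever the domain's access_ssl_log lives):

    with open("/path/to/access_ssl_log", errors="replace") as fh:
        print(count_bot_hits(fh).most_common(10))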

I checked the IP and it does belong to Google. I then looked into why nginx is not handling these harmless requests (the proxy_access_log file is hardly being written to). Although the nginx/Apache settings for this domain are very permissive, they do not seem to take effect:

[ ] HTTP no-cache headers are received in request
[X] HTTP authorization headers are received in request
[ ] GET nocache parameter is received in request
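
For anyone who wants to repeat the IP check mentioned above, Google's documented method is a reverse DNS lookup followed by a forward lookup that must return the original address. Below is a rough Python sketch of that idea, not an official tool. The function name is_verified_googlebot and the optional reverse/forward arguments (there so the check can be tried without network access) are my own assumptions, and this version only covers IPv4.

```python
import socket

# Hostname suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def is_verified_googlebot(ip, reverse=None, forward=None):
    """Reverse-then-forward DNS check for a claimed Googlebot IPv4 address.

    `reverse` maps an IP to a hostname, `forward` maps a hostname to a
    list of IPs. Both default to the system resolver.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip).lower().rstrip(".")
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The forward lookup must lead back to the same address.
        return ip in forward(host)
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return False
```

Calling is_verified_googlebot("66.249.66.204") performs real DNS lookups, so the result depends on the resolver of the machine it runs on.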

Any ideas what to do and how to channel these "attacks" to nginx instead?
Thanks! -t
 
But you are right -- I had to stop this because it was looping (the language parameter kept repeating). I added Googlebot to the blocked-bots string, and now it gets a 403 instead. I already had the nginx directive in place for a dozen other bots anyway.
Regards, -turgut
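
In case it helps someone spot the same kind of loop before resorting to a block: the repeating language parameter described above can be detected by decoding the request target and counting how often the parameter shows up. This is just a small Python sketch; the function names and the threshold of 3 are arbitrary choices of mine, not anything Plesk or nginx provides.

```python
from urllib.parse import unquote


def repeated_param_count(request_target, param="language"):
    """Count occurrences of `param=` in the percent-decoded request target."""
    decoded = unquote(request_target)
    return decoded.count(param + "=")


def looks_like_crawl_loop(request_target, param="language", threshold=3):
    """Heuristic: the same parameter appearing `threshold` or more times."""
    return repeated_param_count(request_target, param) >= threshold
```

A normal URL such as /index.php?language=english contains the parameter once, while the URLs in the log excerpt contain it well over a dozen times.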
 
I checked the IP and it does belong to Google. I then looked into why nginx is not handling these harmless requests (the proxy_access_log file is hardly being written to). Although the nginx/Apache settings for this domain are very permissive, they do not seem to take effect:
Plesk configures nginx so that requests passed on to Apache are not logged by nginx ("access_log off;" in the location section), so a quiet proxy_access_log does not by itself mean that nginx is being bypassed.
 