Question: Fail2ban - trying to set up a rule to ban AI bots

Tomsohner

New Pleskian
Hello Plesk users,
we are trying to set up a fail2ban jail filter rule that can ban most bad bots, mainly those coming from the big four providers: AWS/Amazon, OVH, Alibaba, and Hetzner.

We have followed the official Plesk guidance and the forum thread
Issue - Problem tuning fail2ban or other proposal-solutions
but we still cannot completely block the Amazon bots, which are the worst of all.
For the failregex we use Kaspar's solution, which in our tests is clearly faster than the older solutions from the
thread Issue - Default plesk-apache-badbot fail2ban doesn't work

failregex = ^<HOST> -[^"]*"(?:GET|POST|HEAD) \/.* HTTP\/\d(?:\.\d+)" \d+ \d+ "[^"]*" "[^"]*(?:%(badbots)s|%(badbotscustom)s)[^"]*"$

Our current fail2ban rule works fine in most cases: the traditional bad bots such as AhrefsBot or Semrush, and newer ones such as facebookexternalhit or Google-Extended, are banned with great results, but many Amazon bots bypass our settings and still hit our Plesk servers.
From our experience we have also observed Alibaba IP ranges that hit mainly WordPress sites with an astonishing number of requests.
For WordPress installations, like many of us, we use Wordfence as a standard plugin. It works, but how many millions of IPs can you realistically add to its Deny Access lists?
Unfortunately Plesk does not protect us with any extension specifically targeting bad bots, so we have to fight them with our own efforts.

Our badbots filter settings are shown below. We have added some extra bad bots; originally the AI bots were at the beginning of the list, but we have since rearranged everything alphabetically, so it is much easier to add a new entry, read the filter, or find a particular bad bot.

So any help or idea would be great, not only for us but for all Plesk users.
Thank you for your answers!

Our fail2ban filter settings:
[Definition]
badbotscustom = thesis-research-bot
badbots = 80legs|360Spider|anthropic-ai|CCBot|claudebot|ClaudeBot|Claude-Web|ChatGPT|GPTBot|HTTrack|acunetix|adscanner|ag_dm_spider|aiHitBot|Ahrefs|AhrefsBot|Alibaba|alibababot|ALittle|Amazon|amazonbot|AmazonBot|applebot|Applebot|BacklinkCrawler|baidu|Baiduspider|Barkrowler|babbar|BLEXBot|BUbiNG|Buck|Bytespider|Bytedance|chimebot|Cliqzbot|clshttp|Cohere|cohere-ai|CommonCrawl|coccoc|coccocbot|coccocbot-image|DataForSeoBot/1\.0|DiffBot|DigExt|domaincrawler|DomainCrawler|DomainRe-AnimatorBot|domaintools|DotBot|Exabot|extract|Ezooms|GarlikCrawler|ChatGPT-User|ggpht|Google Extended|Google-Extended|Gosign-Security-Crawler|grab|gumgum-bot|FacebookBot|facebookexternalhit|fidget-spinner-bot|fr-crawler|harvest|HaosouSpider|JobboerseBot|jobs.de-Robot|ICCrawler|Imagesift|ImagesiftBot|IndeedBot|Keybot|Kraken|LamarkBot|LieBaoFast|Linguee|LinkpadBot|LinkStats|Lipperhey-Kaus-Australis|ltx71|magpie-crawler|majestic12|Mb2345Browser|meanpathbot|MegaIndex|MegaIndex\.ru|MetaJobBot|MJ12|MJ12Bot|mj12bot|mindUpBot|miner|MQQBrowser|netEstate|nikto|oBot|Omgili|Omgilibot|OpenHoseBot|openlinkprofiler|opensiteexplorer|Paqlebot|paqlebot|PerplexityBot|petalbot|petalsearch|petalsearchBot|PhantomJS|Plista|plukkie|postmanruntime|python-requests|Qwantify|SabsimBot|SafeDNSBot|scrapy|ScreamingFrogSEOSpider|SearchmetricsBot|seek|SeekportBot|Semrush|SemrushBot|SemrushBot-BA|SemrushBot-SA|serpstatbot|SISTRIX|Sistrix|sentibot|seocompany|SEOdiver|SEOkicks|SEOkicks-Robot|seoscanners|seznam|SeznamBot|sg-Orbiter|Siteliner|Snap|sogou|spbot|spot|Squigglebot|SquigglebotBot|ssearch_bot|SurveyBot|R6_CommentReader|RestSharp|rogerbot|TalkTalk|ThumbSniper|trendictionbot|trendkite-akashic-crawler|turnitinbot|TwengaBot|UCBrowser|um-IC|UnisterBot|Uptimebot|VelenPublicWebCrawler|VoidEYE|WBSearchBot|webcrawl|webprosbot|winhttp|wotbox|yandex|YandexBot|YottaShopping_Bot|YouBot|ZoominfoBot|Atomic_Email_Hunter/4\.0|atSpider/1\.0|autoemailspider|bwh3_user_agent|China Local Browse 2\.6|ContactBot/0\.2|ContentSmartz|DataCha0s/2\.0|DBrowse 1\.4b|DBrowse 1\.4d|Demo Bot DOT 16b|Demo Bot Z 16b|DSurf15a 01|DSurf15a 71|DSurf15a 81|DSurf15a VA|EBrowse 1\.4b|Educate Search VxB|EmailCollector|EmailSiphon|EmailSpider|EmailWolf 1\.00|ESurf15a 15|ExtractorPro|Franklin Locator 1\.8|FSurf15a 01|Full Web Bot 0416B|Full Web Bot 0516B|Full Web Bot 2816B|Guestbook Auto Submitter|Industry Program 1\.0\.x|ISC Systems iRc Search 2\.1|IUPUI Research Bot v 1\.9a|LARBIN-EXPERIMENTAL \(efp@gmx\.net\)|LetsCrawl\.com/1\.0 \+http\://letscrawl\.com/|Lincoln State Web Browser|LMQueueBot/0\.2|LWP\:\:Simple/5\.803|Mac Finder 1\.0\.xx|MFC Foundation Class Library 4\.0|Microsoft URL Control - 6\.00\.8xxx|Missauga Locate 1\.0\.0|Missigua Locator 1\.9|Missouri College Browse|Mizzu Labs 2\.2|Mo College 1\.9|MVAClient|(?:Mozilla/\d+\.\d+ )?Jorgee|Mozilla/2\.0 \(compatible; NEWT ActiveX; Win32\)|Mozilla/3\.0 \(compatible; Indy Library\)|Mozilla/3\.0 \(compatible; scan4mail \(advanced version\) http\://www\.peterspages\.net/?scan4mail\)|Mozilla/4\.0 \(compatible; Advanced Email Extractor v2\.xx\)|Mozilla/4\.0 \(compatible; Iplexx Spider/1\.0 http\://www\.iplexx\.at\)|Mozilla/4\.0 \(compatible; MSIE 5\.0; Windows NT; DigExt; DTS Agent|Mozilla/4\.0 efp@gmx\.net|Mozilla/5\.0 \(Version\: xxxx Type\:xx\)|NameOfAgent \(CMS Spider\)|NASA Search 1\.0|Nsauditor/1\.x|PBrowse 1\.4b|PEval 1\.4b|Poirot|Port Huron Labs|Production Bot 0116B|Production Bot 2016B|Production Bot DOT 3016B|Program Shareware 1\.0\.2|PSurf15a 11|PSurf15a 51|PSurf15a VA|psycheclone|RSurf15a 41|RSurf15a 51|RSurf15a 81|searchbot 
admin@google\.com|ShablastBot 1\.0|snap\.com beta crawler v0|Snapbot/1\.0|Snapbot/1\.0 \(Snap Shots&#44; \+http\://www\.snap\.com\)|Sogou|sogou develop spider|sogou music spider|Sogou Orion spider/3\.0\(\+http\://www\.sogou\.com/docs/help/webmasters\.htm#07\)|sogou spider|Sogou web spider/3\.0\(\+http\://www\.sogou\.com/docs/help/webmasters\.htm#07\)|sohu agent|SSurf15a 11 |TSurf15a 11|TrackBack/1\.02|Under the Rainbow 2\.2|User-Agent\: Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1\)|VadixBot|WebVulnCrawl\.unknown/1\.0 libwww-perl/5\.803|WebEMailExtrac|Wells Search II|WEP Search 00
failregex = ^<HOST> -[^"]*"(?:GET|POST|HEAD) \/.* HTTP\/\d(?:\.\d+)" \d+ \d+ "[^"]*" "[^"]*(?:%(badbots)s|%(badbotscustom)s)[^"]*"$
ignoreregex =
datepattern = ^[^\[]*\[({DATE})
              {^LN-BEG}
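
For reference, we load this filter through a jail roughly like the sketch below. This is only a sketch: the filter file name and the log path are assumptions based on a typical Plesk layout, so please verify them on your own server before using it.

# /etc/fail2ban/jail.d/plesk-apache-badbot.local (sketch, adjust to your setup)
[plesk-apache-badbot]
enabled  = true
filter   = apache-badbots
# Plesk vhost access logs usually live under this path; verify on your server:
logpath  = /var/www/vhosts/system/*/logs/*access*log
maxretry = 1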
 
Today, 100 million requests came from Amazonbot and crashed the 48-core, 256 GB RAM server. I hope Plesk will solve this.
This isn't something for Plesk to solve. This is something you should mitigate yourself, for example by customizing the fail2ban filter(s) to your own needs.

For what it's worth, Amazonbot is generally considered a legitimate bot that respects standard robots.txt rules. If it's causing (overload) issues, it's worth investigating whether the requests were legitimate or faked by impostor bots.
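
A quick way to check that is a forward-confirmed reverse DNS lookup on the offending IP. A rough sketch (the placeholder IP stands in for an address from your own access log; the hostname suffix is what Amazon documents for Amazonbot, so double-check their current guidance):

IP=203.0.113.10   # placeholder: take the address from your access log
host "$IP"
# A genuine Amazonbot IP should resolve to a hostname under crawl.amazonbot.amazon
# (per Amazon's published verification guidance; double-check their docs).
PTR=$(host "$IP" | awk '{print $NF}')
host "$PTR"
# The forward lookup must point back to the same IP; otherwise the "Amazonbot" is an impostor.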
 
Further to our post above, we tried to solve this issue with Cloudflare. On the Free plan Cloudflare masks the visitor IPs, so fail2ban does not ban any bad bot or AI bot and you have to create your own custom rules; almost the same happens with Cloudflare Pro.
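
The usual workaround if you keep Cloudflare in front is to make the web server restore the real visitor IP from the CF-Connecting-IP header, so that fail2ban sees the true address in the access log. A minimal Apache sketch (it assumes mod_remoteip is enabled; the two ranges shown are only examples, use the full current list from https://www.cloudflare.com/ips/):

# Restore the real client IP behind Cloudflare (Apache, mod_remoteip)
RemoteIPHeader CF-Connecting-IP
RemoteIPTrustedProxy 173.245.48.0/20
RemoteIPTrustedProxy 103.21.244.0/22
# ...add the remaining Cloudflare IPv4 and IPv6 ranges here...

On nginx the equivalent is the realip module (set_real_ip_from plus real_ip_header CF-Connecting-IP).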

It is also not a practical solution to add millions of IPs to the Apache/Nginx configuration as Deny Access rules: the server has to work harder and produces an enormous number of log entries, and the first result is a slower server response. On the other hand, our fail2ban approach above (which does not cover all bad-bot cases) creates many thousands of banned IPs, which also ends up slowing down the server response.
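
What has worked better for us for whole provider ranges than piling thousands of single-IP bans into fail2ban or Deny rules is an ipset matched by a single iptables rule. A rough sketch, using the set name badbot-nets (our own choice) and two ranges from our ban list further below:

# Block whole bad-bot subnets with one firewall rule via ipset
ipset create badbot-nets hash:net
ipset add badbot-nets 185.191.171.0/24
ipset add badbot-nets 114.119.128.0/19
iptables -I INPUT -m set --match-set badbot-nets src -j DROP
# Remember to persist the set across reboots (ipset save / your distro's ipset service).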

With these settings we may also ban some aggressive but useless bots such as msnbot or Applebot, which give us no benefit anyway: they serve our full content for their own visitors to view and use, without sending anything back to the content owner.

The main issue is that providers such as Amazon, Google Cloud Platform, Hetzner, OVH, DigitalOcean, or Alibaba do not separate their IP ranges into at least two categories, one for bots and one for good users such as APIs or monitoring providers. So when you ban a whole Amazon IP range, you ban not only the bad users but also the good ones, for example Pingdom.

Lastly, a short-term solution for us was to add whole bad-bot IP ranges via the fail2ban "Ban IP" option. With this solution (as mentioned above) we can handle the bot attacks, but we also ban other good users and their services.

Unfortunately the Plesk developers never respect their clients and never care about their real needs. They have forgotten why a server provider or user chooses their environment in the first place: we want a stable and secure environment to work on our own or our clients' projects, not to spend our valuable working time fixing bugs or setting up our own rules for problems that appear every day. An easy way to confirm this is the thousands of posts on the official Plesk support website or on this forum.

Some years ago we were fighting against hackers, simple bad bots, simple crawlers, spiders, and so on. These days we have to fight against millions of them, and most importantly against millions of AI bots that do not respect robots.txt or .htaccess rules, or that find ways to bypass fail2ban and other security modules and plugins such as Wordfence for WordPress.
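
For the AI crawlers that still honour robots.txt (GPTBot, CCBot, Google-Extended, and Amazonbot all publish their user-agent tokens), a simple disallow at least reduces the load; the rest need fail2ban or firewall rules anyway. A minimal robots.txt sketch:

# robots.txt - only helps against crawlers that actually honour it
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Amazonbot
Disallow: /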

So Kaspar, we believe you are wrong in this case ("This is something you should mitigate yourself, by customizing the fail2ban filter(s) to your own need for example"). Plesk is obliged to find and provide a ready-made solution for us. Of course that is not an easy decision for Plesk, as they work together and/or cooperate with Amazon, Google Cloud, and the rest.

We would welcome and would love to hear any other solution or proposal! Thank you in advance!

Our latest settings use the Ban IP solution; they do not block out entire provider ranges such as Amazon's 3.*.*.*, 18.*.*.*, 34.*.*.*, or 54.*.*.*.
In General Settings we set:
IP address ban period: 6000 sec
Time interval for detection of subsequent attacks: only 600 sec
(if you set this any higher, it creates more than 15,000 banned IPs, which causes server issues)
Number of failures before the IP address is banned: 1
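
As far as we understand, these General Settings correspond to the following fail2ban jail defaults (a sketch only; Plesk generates its own configuration, so treat these just as the equivalent values):

[DEFAULT]
# IP address ban period
bantime  = 6000
# Time interval for detection of subsequent attacks
findtime = 600
# Number of failures before the IP address is banned
maxretry = 1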

Then, under Fail2ban > Ban IP, we added the following ranges, which now appear in the dashboard under Tools & Settings > IP Address Banning > Banned IP Addresses:

91.132.184.0/22 plesk-permanent-ban
94.131.101.0/24 plesk-permanent-ban
92.240.206.* plesk-permanent-ban
91.92.249.0/24 plesk-permanent-ban
91.212.166.0/24 plesk-permanent-ban
85.208.98.0/24 plesk-permanent-ban
85.208.96.0/24 plesk-permanent-ban
8.219.4.0/24 plesk-permanent-ban
74.80.208.0/24 plesk-permanent-ban
49.0.200.0/21 plesk-permanent-ban
217.113.194.0/24 plesk-permanent-ban
207.46.0.0/16 plesk-permanent-ban
206.168.32.0/22 plesk-permanent-ban
185.191.171.0/24 plesk-permanent-ban
185.191.170.0/24 plesk-permanent-ban
17.0.0.0/8 plesk-permanent-ban
166.108.192.0/20 plesk-permanent-ban
159.138.80.0/20 plesk-permanent-ban
157.60.0.0/16 plesk-permanent-ban
157.56.0.0/14 plesk-permanent-ban
157.54.0.0/15 plesk-permanent-ban
154.3.0.0/16 plesk-permanent-ban
139.28.160.0/22 plesk-permanent-ban
122.8.184.0/22 plesk-permanent-ban
114.119.128.0/19 plesk-permanent-ban
 
@Tomsohner It's nothing that Plesk should do, as it is not a task for control panel developers but a task for system administrators. Your approach of banning subnets is one valid approach. At my business we currently ban 600+ subnets practically permanently, and there has never been a complaint from a customer that someone could not reach their website or mail. The attacks are driven from other servers, not from dial-up lines, and the servers that attack normally do not otherwise connect to your website or mail. It's a good approach to just ban them for good. You can always lift a ban if a complaint should come in.

Regarding the previously mentioned Amazonbot ("the Amazonbot is generally considered to be a legitimate bot"): unfortunately, some individuals abuse the easy AWS cloud instance setup process to spoof the bot from their own systems. This is done by the hundreds, if not thousands. So in many cases where you see "Amazonbot" from an Amazon IP address, it's not really Amazon that is crawling your space, but malicious hacker activity. Why would you allow Amazon to crawl your website in the first place? Just block the "bot" for good, and you'll have peace and quiet. It won't have any negative impact on SEO.
 