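Hello Plesk users,
We are trying to set up a fail2ban jail and filter that bans most bad bots, mainly those coming from the big four networks: AWS (Amazon), OVH, Alibaba, and Hetzner.
We followed the official Plesk article "How to Avoid High CPU Load & Block Bad Bots with Plesk" (www.plesk.com) and suggestions from the Plesk forum.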
Issue - Problem tuning fail2ban or other proposal-solutions
However, we still cannot completely block the Amazon bots, which are the worst of all.
For the failregex we use Kaspar's solution from the thread "Issue - Default plesk-apache-badbot fail2ban doesn't work". In our tests it is clearly faster than the older solutions.
failregex = ^<HOST> -[^"]*"(?:GET|POST|HEAD) \/.* HTTP\/\d(?:\.\d+)" \d+ \d+ "[^"]*" "[^"]*(?:%(badbots)s|%(badbotscustom)s)[^"]*"$
Our current fail2ban rule works well in most cases. It catches the traditional bots such as AhrefsBot, semrush, and others, as well as newer ones such as facebookexternalhit and Google Extended. However, many Amazon bots bypass our settings and still hit our Plesk servers.
We have also observed Alibaba IP ranges that hit mainly WordPress sites with an enormous number of requests.
On WordPress installations we, like many of you, use Wordfence as a standard plugin. It works, but how many millions of IP addresses would we have to add to a deny list?
Unfortunately, Plesk does not offer an extension that specifically targets bad bots, so we have to fight them on our own.
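One way to get a feel for how many ranges are really involved, before adding anything to a deny list, is to group the offending client IPs by network. Below is a minimal Python sketch, assuming IPv4 addresses and an arbitrary /24 grouping. The function name `hits_per_network` and the sample addresses (taken from the documentation ranges) are made up for illustration and do not come from our logs.

```python
import ipaddress
from collections import Counter

def hits_per_network(ips, prefix=24):
    """Count requests per network, busiest network first.

    `ips` is an iterable of IPv4 address strings, for example the first
    field of each access-log line. `prefix` is the assumed network size.
    """
    counts = Counter()
    for ip in ips:
        # strict=False lets us pass a host address and get its network back
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        counts[str(net)] += 1
    return counts.most_common()

# Hypothetical sample: three hits from two networks
sample = ["198.51.100.4", "198.51.100.200", "203.0.113.9"]
print(hits_per_network(sample))
# [('198.51.100.0/24', 2), ('203.0.113.0/24', 1)]
```

If a few networks account for most of the hits, blocking those ranges at the firewall may be more practical than listing single IPs in Wordfence.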
Our badbots filter settings are below. We added some further bad bots to the stock list: we put the AI bots at the beginning and then arranged the rest A-Z, so that adding a new entry, reading the filter, or finding a particular bot is much easier.
Any help or ideas would be great for us and for all Plesk users.
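To keep an A-Z ordering like the one described above when new names are added, the list can be tidied with a small script. This is only a sketch: `tidy_badbots` is a hypothetical helper, and it assumes that no entry contains a `|` of its own (which holds for the list in this post, but not for arbitrary regex fragments). It removes exact duplicates and sorts case-insensitively. It does not merge entries that differ only in case, because the failregex is case-sensitive.

```python
def tidy_badbots(value):
    """Remove exact duplicates from a '|'-separated list and sort it A-Z."""
    entries = [e for e in value.split("|") if e]
    unique = dict.fromkeys(entries)  # keeps the first occurrence of each entry
    # Case-insensitive sort; entries are otherwise left untouched
    return "|".join(sorted(unique, key=str.lower))

print(tidy_badbots("GPTBot|claudebot|ClaudeBot|Ahrefs|GPTBot|80legs"))
# 80legs|Ahrefs|claudebot|ClaudeBot|GPTBot
```

A group that should stay together, such as the AI bots, could be kept in its own variable and tidied separately.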
Thank you for your answers!
Our Fail2ban settings.
[Definition]
badbotscustom = thesis-research-bot
badbots = 80legs|360Spider|anthropic-ai|CCBot|claudebot|ClaudeBot|Claude-Web|ChatGPT|GPTBot|HTTrack|acunetix|adscanner|ag_dm_spider|aiHitBot|Ahrefs|AhrefsBot|Alibaba|alibababot|ALittle|Amazon|amazonbot|AmazonBot|applebot|Applebot|BacklinkCrawler|baidu|Baiduspider|Barkrowler|babbar|BLEXBot|BUbiNG|Buck|Bytespider|Bytedance|chimebot|Cliqzbot|clshttp|Cohere|cohere-ai|CommonCrawl|coccoc|coccocbot|coccocbot-image|DataForSeoBot/1\.0|DiffBot|DigExt|domaincrawler|DomainCrawler|DomainRe-AnimatorBot|domaintools|DotBot|Exabot|extract|Ezooms|GarlikCrawler|ChatGPT-User|ggpht|Google Extended|Google-Extended|Gosign-Security-Crawler|grab|gumgum-bot|FacebookBot|facebookexternalhit|fidget-spinner-bot|fr-crawler|harvest|HaosouSpider|JobboerseBot|jobs.de-Robot|ICCrawler|Imagesift|ImagesiftBot|IndeedBot|Keybot|Kraken|LamarkBot|LieBaoFast|Linguee|LinkpadBot|LinkStats|Lipperhey-Kaus-Australis|ltx71|magpie-crawler|majestic12|Mb2345Browser|meanpathbot|MegaIndex|MegaIndex\.ru|MetaJobBot|MJ12|MJ12Bot|mj12bot|mindUpBot|miner|MQQBrowser|netEstate|nikto|oBot|Omgili|Omgilibot|OpenHoseBot|openlinkprofiler|opensiteexplorer|Paqlebot|paqlebot|PerplexityBot|petalbot|petalsearch|petalsearchBot|PhantomJS|Plista|plukkie|postmanruntime|python-requests|Qwantify|SabsimBot|SafeDNSBot|scrapy|ScreamingFrogSEOSpider|SearchmetricsBot|seek|SeekportBot|Semrush|SemrushBot|SemrushBot-BA|SemrushBot-SA|serpstatbot|SISTRIX|Sistrix|sentibot|seocompany|SEOdiver|SEOkicks|SEOkicks-Robot|seoscanners|seznam|SeznamBot|sg-Orbiter|Siteliner|Snap|sogou|spbot|spot|Squigglebot|SquigglebotBot|ssearch_bot|SurveyBot|R6_CommentReader|RestSharp|rogerbot|TalkTalk|ThumbSniper|trendictionbot|trendkite-akashic-crawler|turnitinbot|TwengaBot|UCBrowser|um-IC|UnisterBot|Uptimebot|VelenPublicWebCrawler|VoidEYE|WBSearchBot|webcrawl|webprosbot|winhttp|wotbox|yandex|YandexBot|YottaShopping_Bot|YouBot|ZoominfoBot|Atomic_Email_Hunter/4\.0|atSpider/1\.0|autoemailspider|bwh3_user_agent|China Local Browse 2\.6|ContactBot/0\.2|ContentSmartz|DataCha0s/2\.0|DBrowse 1\.4b|DBrowse 1\.4d|Demo Bot DOT 16b|Demo Bot Z 16b|DSurf15a 01|DSurf15a 71|DSurf15a 81|DSurf15a VA|EBrowse 1\.4b|Educate Search VxB|EmailCollector|EmailSiphon|EmailSpider|EmailWolf 1\.00|ESurf15a 15|ExtractorPro|Franklin Locator 1\.8|FSurf15a 01|Full Web Bot 0416B|Full Web Bot 0516B|Full Web Bot 2816B|Guestbook Auto Submitter|Industry Program 1\.0\.x|ISC Systems iRc Search 2\.1|IUPUI Research Bot v 1\.9a|LARBIN-EXPERIMENTAL \(efp@gmx\.net\)|LetsCrawl\.com/1\.0 \+http\://letscrawl\.com/|Lincoln State Web Browser|LMQueueBot/0\.2|LWP\:\:Simple/5\.803|Mac Finder 1\.0\.xx|MFC Foundation Class Library 4\.0|Microsoft URL Control - 6\.00\.8xxx|Missauga Locate 1\.0\.0|Missigua Locator 1\.9|Missouri College Browse|Mizzu Labs 2\.2|Mo College 1\.9|MVAClient|(?:Mozilla/\d+\.\d+ )?Jorgee|Mozilla/2\.0 \(compatible; NEWT ActiveX; Win32\)|Mozilla/3\.0 \(compatible; Indy Library\)|Mozilla/3\.0 \(compatible; scan4mail \(advanced version\) http\://www\.peterspages\.net/?scan4mail\)|Mozilla/4\.0 \(compatible; Advanced Email Extractor v2\.xx\)|Mozilla/4\.0 \(compatible; Iplexx Spider/1\.0 http\://www\.iplexx\.at\)|Mozilla/4\.0 \(compatible; MSIE 5\.0; Windows NT; DigExt; DTS Agent|Mozilla/4\.0 efp@gmx\.net|Mozilla/5\.0 \(Version\: xxxx Type\:xx\)|NameOfAgent \(CMS Spider\)|NASA Search 1\.0|Nsauditor/1\.x|PBrowse 1\.4b|PEval 1\.4b|Poirot|Port Huron Labs|Production Bot 0116B|Production Bot 2016B|Production Bot DOT 3016B|Program Shareware 1\.0\.2|PSurf15a 11|PSurf15a 51|PSurf15a VA|psycheclone|RSurf15a 41|RSurf15a 51|RSurf15a 81|searchbot admin@google\.com|ShablastBot 1\.0|snap\.com beta crawler v0|Snapbot/1\.0|Snapbot/1\.0 \(Snap Shots, \+http\://www\.snap\.com\)|Sogou|sogou develop spider|sogou music spider|Sogou Orion spider/3\.0\(\+http\://www\.sogou\.com/docs/help/webmasters\.htm#07\)|sogou spider|Sogou web spider/3\.0\(\+http\://www\.sogou\.com/docs/help/webmasters\.htm#07\)|sohu agent|SSurf15a 11 |TSurf15a 11|TrackBack/1\.02|Under the Rainbow 2\.2|User-Agent\: Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1\)|VadixBot|WebVulnCrawl\.unknown/1\.0 libwww-perl/5\.803|WebEMailExtrac|Wells Search II|WEP Search 00
failregex = ^<HOST> -[^"]*"(?:GET|POST|HEAD) \/.* HTTP\/\d(?:\.\d+)" \d+ \d+ "[^"]*" "[^"]*(?:%(badbots)s|%(badbotscustom)s)[^"]*"$
ignoreregex =
datepattern = ^[^\[]*\[({DATE})
              {^LN-BEG}
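To see which log lines the failregex above actually catches, the pattern can be tried outside fail2ban. The following is a rough Python sketch, not how fail2ban itself works internally: `<HOST>` is replaced by a simple named group, the two bot lists are shortened stand-ins, and both log lines are invented samples using a documentation IP and a simplified Amazonbot user agent. The second sample assumes a log format that writes `-` instead of a byte count, which may or may not apply to a given Plesk server.

```python
import re

# Shortened stand-ins for the real lists in the filter (assumption)
BADBOTS = r"Amazon|amazonbot|AhrefsBot|SemrushBot"
BADBOTSCUSTOM = r"thesis-research-bot"

# The failregex from the filter, copied as-is
FAILREGEX = (
    r'^<HOST> -[^"]*"(?:GET|POST|HEAD) \/.* HTTP\/\d(?:\.\d+)" \d+ \d+ '
    r'"[^"]*" "[^"]*(?:%(badbots)s|%(badbotscustom)s)[^"]*"$'
)

def compile_failregex(badbots, badbotscustom):
    """Fill in the placeholders roughly the way the filter file would."""
    pattern = FAILREGEX.replace("<HOST>", r"(?P<host>\S+)")  # simplified <HOST>
    pattern = pattern.replace("%(badbots)s", badbots)
    pattern = pattern.replace("%(badbotscustom)s", badbotscustom)
    return re.compile(pattern)

def banned_host(line, regex):
    """Return the client address if the line matches, otherwise None."""
    match = regex.search(line)
    return match.group("host") if match else None

# Invented sample lines
LINE_OK = (
    '203.0.113.7 - - [10/May/2024:12:00:00 +0000] "GET /shop/ HTTP/1.1" 200 5123 "-" '
    '"Mozilla/5.0 (compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)"'
)
LINE_DASH = LINE_OK.replace(" 200 5123 ", " 304 - ")  # no numeric byte count

rx = compile_failregex(BADBOTS, BADBOTSCUSTOM)
print(banned_host(LINE_OK, rx))    # 203.0.113.7
print(banned_host(LINE_DASH, rx))  # None
```

The second line is not matched because `\d+ \d+` needs two numbers after the request, so a request logged that way would never count towards a ban. Requests with a method other than GET, POST, or HEAD are skipped for the same kind of reason. On the server itself, the `fail2ban-regex` tool can run the real filter file against a real access log and report matched and missed lines.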