Azurel
Silver Pleskian
TITLE
Fail2ban: Filter "apache-badbots" updates?
PRODUCT, VERSION, OPERATING SYSTEM, ARCHITECTURE
All
PROBLEM DESCRIPTION
Were there ever any updates to this filter? Where did the current user agents (bad bots) come from? Can I make suggestions for an update? Maybe you should put this filter on GitHub so it can be discussed and extended there?
Besides classic crawlers, media-library applications are very nasty crawlers, and they often crawl pointlessly on top of everything else. A user searches for something in such a media library, and its crawler then queries every installed plugin/website.
It would be nice if someone checked the filter to see whether the current entries still make sense at all, and added new ones. Even better would be if webmasters could maintain it centrally together, perhaps splitting the filter into apps that crawl, websites that crawl, spiders, etc.
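Until the shipped filter is updated, individual entries can be maintained locally. Below is a minimal sketch of a drop-in override in the style of the common apache-badbots filter layout; the file path, the `badbotscustom` variable, the chosen agent names, and the `failregex` are illustrative assumptions, not Plesk's exact shipped file:

```ini
; /etc/fail2ban/filter.d/apache-badbots.local -- sketch only;
; path, variable name and regex are illustrative assumptions.
[Definition]
; Pipe-separated, regex-escaped user-agent strings to ban.
badbotscustom = MauiBot|ZoominfoBot|uipbot
; Ban a host whose request line carries one of the above as User-Agent.
failregex = ^<HOST> -.*"(GET|POST|HEAD).*HTTP.*"(?:%(badbotscustom)s).*"$
ignoreregex =
```

A candidate filter can be checked against a real log before enabling it, e.g. with `fail2ban-regex /var/log/apache2/access_log /etc/fail2ban/filter.d/apache-badbots.local`.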
These are agents I have discovered, usually with several thousand requests each.
# jellyfin => Jellyfin (there are many plugins)
# curl/7.61.1
# Wget/1.13.4 (linux-gnu)
# Python-urllib/3.7
# Python/3.6 aiohttp/3.6.2
# python-requests/2.24.0
# XenForo/2.x (WoH Board: Anime & Hentai Forum)
# WordPress/5.4.2; Home - animegeeks.de
# Buck/2.2; (+About Buck)
# Scrapy/1.5.2 (+Scrapy | A Fast and Powerful Scraping and Web Crawling Framework)
# uipbot/1.0
# ZoominfoBot (zoominfobot at zoominfo dot com)
# crawler4j (GitHub - yasserg/crawler4j: Open Source Web Crawler for Java)
# TprAdsTxtCrawler/1.0
# axios/0.19.2 => GitHub - axios/axios: Promise based HTTP client for the browser and node.js
# weborama-fetcher (+HOMEPAGE FR)
# Apache-HttpClient/4.5.3
# Photon/1.0
# Qwantify/1.0
# okhttp/3.12.3 => https://github.com/square/okhttp
# w3m/0.5.3+git20200507 => https://sourceforge.net/projects/w3m/
# pimeyes.com crawler => https://pimeyes.com/en/contact
# magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net) => https://www.brandwatch.com/legal/magpie-crawler/
# Symfony HttpClient/Curl
# got (https://github.com/sindresorhus/got)
# aarquivo-web-crawler (compatible; heritrix/3.4.0-20200304 +http://arquivo.pt)
# AccompanyBot
# DomainStatsBot/1.0 (https://domainstats.com/pages/our-bot)
# DarcyRipper => https://usermanual.wiki/Document/Darcy20Ripper20User20Manual.2006674845/html
# Elisabot
# DF Bot 1.0
# yacybot (/global; amd64 Linux 5.8.6-gentoo; java 11.0.8; Europe/de) http://yacy.net/bot.html
# Nuzzel
# CyotekWebCopy/1.8 CyotekHTTP/2.0
# Leuchtfeuer Crawler
# newspaper/0.2.8 => https://github.com/codelucas/newspaper/
# MauiBot ([email protected])
# yaanibot
# CCBot/2.0 (https://commoncrawl.org/faq/)
# Go-http-client/1.1
# AhrefsBot => https://ahrefs.com/de/robot
# Mediatoolkitbot => https://www.mediatoolkit.com/robot
# ltx71 => http://ltx71.com
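A list like the one above can be compiled from an Apache access log with a short shell pipeline; this is a sketch that assumes the combined log format (quote-delimited User-Agent) and uses a synthetic `/tmp/access_log` as a stand-in for a real log such as `/var/log/apache2/access_log`:

```shell
# Build a small sample log (stand-in for a real combined-format access log).
printf '%s\n' \
  '1.2.3.4 - - [01/Jan/2021:00:00:00 +0000] "GET / HTTP/1.1" 200 123 "-" "CCBot/2.0 (https://commoncrawl.org/faq/)"' \
  '5.6.7.8 - - [01/Jan/2021:00:00:01 +0000] "GET /feed HTTP/1.1" 200 99 "-" "CCBot/2.0 (https://commoncrawl.org/faq/)"' \
  '9.9.9.9 - - [01/Jan/2021:00:00:02 +0000] "GET / HTTP/1.1" 200 50 "-" "Go-http-client/1.1"' \
  > /tmp/access_log

# Splitting each line on double quotes puts the User-Agent in field 6;
# count and rank them to surface the noisiest agents.
awk -F'"' '{print $6}' /tmp/access_log | sort | uniq -c | sort -rn | head -n 20
```

On a real server, point the `awk` pipeline at the actual access log instead of the sample file.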
STEPS TO REPRODUCE
see description
ACTUAL RESULT
see description
EXPECTED RESULT
see description
ANY ADDITIONAL INFORMATION
(DID NOT ANSWER QUESTION)
YOUR EXPECTATIONS FROM PLESK SERVICE TEAM
Answer the question