
Resolved: What does Enable Bot Protection actually block?

thinkjarvis

Basic Pleskian
Server operating system version
Ubuntu 20.04.5 LTS
Plesk version and microupdate number
Plesk Obsidian Version 18.0.49 Update #2
What does the Enable Bot Protection option in WordPress Toolkit for Plesk actually do, and which bots does it block?
Is it a modification to .htaccess? Is it PHP directives?

How can I see this list?
Can I edit this list?

Seems a bit bold to put an option in for blocking bad bots without any documentation.
 
Hi @thinkjarvis, that's a really good question. The idea of the function is to always respond with a 403 error when the user_agent string of an incoming request is found in a list of bot names. There is a list, but before posting it here I'd like to ask staff if that is still current. I'll take a note to come back to this thread once I have a response.
 
@Peter Debik
Thanks for responding. Would it be possible for you to post your list anyway, or explain what it actually does? If I turn it on, does it insert lines into .htaccess, or directives somewhere else?

Context: one of my sites is having TTFB response time issues. The VPS server has 83 websites on it, and none of the other 82 are having problems. I cannot identify a reason why the HTML index/home page would take as long as it does to load. In contrast, one of the sites I migrated to the new VPS server had response time problems on the old shared server and is now running excellently on the new VPS.

The only potential cause I can see is bot traffic recorded in the web stats:
The site with problems is using up to 2 GB of bandwidth serving MJ12bot alone, with other neutral bots (or bad ones, depending on perspective) adding a few more gigabytes of bandwidth use per month. MJ12bot made 100 million page requests in December!

I have disallowed it in robots.txt, contacted Majestic to have the domain added to MJ12bot's no-crawl list, and blocked the user agent manually in .htaccess.
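
For reference, a minimal robots.txt entry for that bot looks like this (MJ12bot is the user agent token Majestic documents):

Code:
User-agent: MJ12bot
Disallow: /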

If I turn on the block bots option in Plesk I need to know:

  1. What bots it is blocking, so I can also add them to robots.txt
  2. Where the directives are added: .htaccess or server side?
  3. How much control I have over this list, in case I want to allow one of the blocked bots
 
I cannot answer these questions yet; I need the team to respond first. It is very important to get such technical answers right.

But I can recommend that you insert this sequence at the beginning of your .htaccess file, regardless of what other settings you have. robots.txt does not help at all, especially not with bad bots, and it also does not help with the many SEO testing engines that create unnecessary traffic. For that reason, it is much better to block bad bots and crawlers yourself, preferably with Fail2Ban for the whole server, with an Nginx rule, or simply with an .htaccess rewrite sequence. I'd use this one, but you are free to change the user agents mentioned to whatever you see fit:

Code:
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} (PetalBot|UptimeRobot|seocompany|LieBaoFast|SEOkicks|Uptimebot|Cliqzbot|ssearch_bot|domaincrawler|AhrefsBot|spot|DigExt|Sogou|MegaIndex.ru|majestic12|80legs|SISTRIX|HTTrack|Semrush|MJ12|MJ12bot|MJ12Bot|Ezooms|CCBot|TalkTalk|Ahrefs|BLEXBot) [NC] 
RewriteRule .* - [F]
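
Of those options, Fail2Ban is the one that covers the whole server rather than a single site. As a minimal sketch (assuming a standard Fail2Ban installation with the stock apache-badbots filter; on a Plesk server the equivalent jail is normally toggled under Tools & Settings > IP Address Banning (Fail2Ban) rather than by editing files), the shipped jail could be enabled in /etc/fail2ban/jail.local:

Code:
# /etc/fail2ban/jail.local (sketch; paths and defaults may differ on your system)
[apache-badbots]
enabled  = true
# Glob covering the per-domain access logs in a typical Plesk layout
logpath  = /var/www/vhosts/system/*/logs/access*log
maxretry = 1
bantime  = 86400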
 

Thanks Peter. I had this exact solution, give or take, from a Google search, but I couldn't tell whether it was working, or whether listing multiple bots in one pattern like that would work, so I have them listed individually at the moment.

I'll wait to hear back regarding the official answer before I make any more changes. If bot protection just adds something like that to .htaccess, then I can edit it after the option has been selected.
 
This is what the Nginx conf looks like when you activate the bot protection in the WP Toolkit:
Code:
# "Enable bot protection"
# To remove this rule, revert this security measure on each WordPress installation on this domain
if ($http_user_agent ~* "(?:acunetix|BLEXBot|domaincrawler\.com|LinkpadBot|MJ12bot/v|majestic12\.co\.uk|AhrefsBot|TwengaBot|SemrushBot|nikto|winhttp|Xenu\s+Link\s+Sleuth|Baiduspider|HTTrack|clshttp|harvest|extract|grab|miner|python-requests)") {
    return 403;
}
 
@maartenv @Peter Debik
Probably more important is:

Can you confirm that when I tick the Enable Bot Protection option, Google, Bing, Yahoo, DuckDuckGo and other GOOD bots are not blocked, and that only bad bots are blocked?

You can see why I wouldn't just blindly enable this, as there is no explanation in Plesk as to what it actually does.

It will save me a heap of time to be able to just go through and tick the box in Plesk rather than manually blocking bots.
 
Only the bots listed in this string are blocked:
acunetix | BLEXBot | domaincrawler\.com | LinkpadBot | MJ12bot/v | majestic12\.co\.uk | AhrefsBot | TwengaBot | SemrushBot | nikto | winhttp | Xenu\s+Link\s+Sleuth | Baiduspider | HTTrack | clshttp | harvest | extract | grab | miner | python-requests

Googlebot, Bingbot, DDG, and other good bots are safe and won't be blocked.
 
The official bot list is

acunetix
BLEXBot
domaincrawler.com
LinkpadBot
MJ12bot
majestic12.co.uk
AhrefsBot
TwengaBot
SemrushBot
nikto
winhttp
Xenu Link Sleuth
Baiduspider
HTTrack
clshttp
harvest
extract
grab
miner
python-requests

The blocking is done by the Nginx configuration entry as described above.

Currently, the list is not user-configurable. There are plans to make it configurable in the future.
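
A possible workaround for bots that are not on that list (separate from the Toolkit feature itself) is to add your own rule by hand under Domains > example.com > Apache & nginx Settings > Additional nginx directives. It supplements the Toolkit rule rather than editing it; the bot names below are only placeholders:

Code:
# Supplementary block list, maintained manually alongside the WP Toolkit rule
if ($http_user_agent ~* "(SomeOtherBot|AnotherCrawler)") {
    return 403;
}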
 
Thanks Peter,

I have enabled this for the site that was experiencing problems.
I am now waiting for a TTFB score update in Google PageSpeed to see whether this has resolved the problem.

Otherwise I will need to go back to the drawing board and see if I can find the cause of the issue.
 
@Peter Debik @maartenv
After enabling bot protection I am seeing a user agent called "seek" browsing the site and using 1-2 GB worth of bandwidth.
(user-agent: seek)

I cannot find any information about this bot.

Have you ever encountered this user agent before? I cannot find any information about it on Google.
 
Never heard of a seek bot. Do you have the IP address of this bot?

Have you tried to add it to a robots.txt file in the document root of the website?
Code:
User-agent: seek
Disallow: /
 
I have added this to robots.txt already, but it is ignoring it.

IP address: AWStats does not record the IP addresses of specific bots, so I do not have the IP address for this bot.

Any suggestions?
 

Can you check the access_log and access_ssl_log for this bot?

/var/www/vhosts/system/domain.com/logs/access_log
/var/www/vhosts/system/domain.com/logs/access_ssl_log
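
To pull the matching entries and the source IPs out of those logs, something along these lines should work (assuming the default combined log format, where the client IP is the first field; replace domain.com with the actual domain):

Code:
# Show the matching requests, then count requests per client IP
grep -i "seek" /var/www/vhosts/system/domain.com/logs/access_ssl_log
grep -i "seek" /var/www/vhosts/system/domain.com/logs/access_ssl_log | awk '{print $1}' | sort | uniq -c | sort -rn | head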
 
Looking at the log files, I can see a number of entries from SeekportBot.

I think this is it; it is a German search engine.
I am going to block it to see if it resolves the issue.

Any advice on the best way to block this in conjunction with the Block Bad Bots tool in WordPress Toolkit?

I do have the default Plesk Apache-Badbots jail turned on, but I do not think it is actually working.

I have added the following to the .htaccess file for the site:

Code:
# Block via User Agent
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (seek|SeekportBot) [NC]
RewriteRule (.*) - [F,L]
</IfModule>
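
A side note on the Apache-Badbots jail: if I remember the stock Fail2Ban filter correctly, its failregex only matches when the user-agent field is exactly one of the listed patterns, so partial names like "MJ12bot" slip through, which could be why the jail appears to do nothing. A possible sketch, to be verified against your Plesk/Fail2Ban versions, would be to override the custom pattern list so it matches substrings, e.g. in /etc/fail2ban/filter.d/apache-badbots.local:

Code:
# /etc/fail2ban/filter.d/apache-badbots.local (sketch; bot names are examples)
# Extend the custom bad-bot patterns so they match anywhere in the user-agent field
[Definition]
badbotscustom = .*(MJ12bot|SeekportBot|BLEXBot|AhrefsBot).*

After changing the filter, Fail2Ban needs a reload (e.g. fail2ban-client reload) for the jail to pick it up.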
 