• Hi, Pleskians! We are running UX testing of our upcoming product intended for server management and monitoring.
    We would like to invite you to have a call with us and have some fun checking our prototype. The agenda is pretty simple: we bring a new design and some scenarios that you need to walk through and complete. We will be watching and taking insights for further development of the design.
    If you would like to participate, please use this link to book a meeting. We will send the link to the clickable prototype at the meeting.
  • (Plesk for Windows):
    MySQL Connector/ODBC 3.51, 5.1, and 5.3 are no longer shipped with Plesk because they have reached end of life. MariaDB Connector/ODBC 64-bit 3.2.4 is now used instead.
  • Our UX team believes in the power of direct feedback and would like to invite you to participate in interviews, tests, and surveys.
    To stay in the loop and never miss an opportunity to share your thoughts, please subscribe to our UX research program. If you were previously part of the Plesk UX research program, please re-subscribe to continue receiving our invitations.
  • The Horde webmail has been deprecated. Its complete removal is scheduled for April 2025. For details and recommended actions, see the Feature and Deprecation Plan.

Resolved: What does Enable Bot Protection actually block?

thinkjarvis

Basic Pleskian
Server operating system version
Ubuntu 20.04.5 LTS
Plesk version and microupdate number
Plesk Obsidian Version 18.0.49 Update #2
What does the Enable Bot Protection option in WordPress Toolkit for Plesk actually do, and which bots does it block?
Is it a modification to .htaccess? Is it PHP directives?

How can I see this list?
Can I edit this list?

Seems a bit bold to put an option in for blocking bad bots without any documentation.
 
Hi @thinkjarvis, that's a really good question. The idea of the function is to always respond with a 403 error when the user_agent string of an incoming request is found in a list of bot names. There is a list, but before posting it here I'd like to ask staff if that is still current. I'll take a note to come back to this thread once I have a response.
 
@Peter Debik
Thanks for responding. Would it be possible for you to post the list anyway, or explain what the option actually does? If I turn it on, does it insert lines in .htaccess, or directives somewhere else?

Context - One of my sites is having TTFB response time issues. The VPS server hosts 83 websites, and none of the other 82 are having problems. I cannot identify a reason why the HTML index/home page would take as long as it does to load. In contrast, one of the sites I migrated to the new VPS had response time problems on the old shared server and is now running excellently on the new VPS.

The only potential cause is bot traffic recorded in the web stats:
The site with problems is using up to 2 GB of bandwidth serving MJ12bot alone, with other neutral bots (or bad ones, depending on perspective) adding a few more gigabytes of bandwidth use per month. MJ12bot made 100 million page requests in December!

I have disallowed it in robots.txt, contacted Majestic to have the domain added to MJ12bot's no-crawl list, and blocked the user agent manually in .htaccess.

If I turn on the block bots option in Plesk I need to know:

  1. What bots it is blocking - so I can also add them to robots.txt
  2. Where the directives are added: .htaccess? Server-side?
  3. How much control I have over this list, in case I want to allow one of the blocked bots
 
I cannot answer these questions yet, I need the team to respond first. It is very important to get such technical answers right.

But I can recommend that you insert this sequence at the beginning of your .htaccess file, regardless of what other settings you have. Robots.txt does not help at all, especially not with bad bots, and it also does not help with the many SEO testing engines that create unnecessary traffic. For that reason, it is much better to block bad bots and crawlers, preferably via Fail2Ban for the whole server, via an Nginx rule, or simply via an .htaccess rewrite sequence. I'd use this one, but you are free to change the user agents mentioned to whatever you see fit:

Code:
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} (PetalBot|UptimeRobot|seocompany|LieBaoFast|SEOkicks|Uptimebot|Cliqzbot|ssearch_bot|domaincrawler|AhrefsBot|spot|DigExt|Sogou|MegaIndex.ru|majestic12|80legs|SISTRIX|HTTrack|Semrush|MJ12|MJ12bot|MJ12Bot|Ezooms|CCBot|TalkTalk|Ahrefs|BLEXBot) [NC] 
RewriteRule .* - [F]
 

Thanks Peter, I had this exact solution (give or take) from a Google search, but I couldn't tell whether it was working, or whether listing multiple bots in a single alternation like that would work, so I have them listed individually at the moment.

I'll wait to hear back regarding the official answer before I make any more changes. If Block Bots just adds that to .htaccess, then I can edit the list after the option has been selected.
 
This is what the Nginx conf looks like when you activate the bot protection in the WP Toolkit:
Code:
# "Enable bot protection"
# To remove this rule, revert this security measure on each WordPress installation on this domain
if ($http_user_agent ~* "(?:acunetix|BLEXBot|domaincrawler\.com|LinkpadBot|MJ12bot/v|majestic12\.co\.uk|AhrefsBot|TwengaBot|SemrushBot|nikto|winhttp|Xenu\s+Link\s+Sleuth|Baiduspider|HTTrack|clshttp|harvest|extract|grab|miner|python-requests)") {
    return 403;
}
 
@maartenv @Peter Debik
Probably more important is:

Can you confirm that when I tick the Enable Bot Protection option, Google, Bing, Yahoo, DuckDuckGo, and other GOOD bots are not blocked, and that only bad bots are blocked?

You can see why I wouldn't just blindly enable this, as there is no explanation in Plesk of what it actually does.

It will save me a heap of time to be able to just go through and tick the box in Plesk rather than manually blocking bots.
 
Only the bots listed in this string are blocked:
acunetix | BLEXBot | domaincrawler\.com | LinkpadBot | MJ12bot/v | majestic12\.co\.uk | AhrefsBot | TwengaBot | SemrushBot | nikto | winhttp | Xenu\s+Link\s+Sleuth | Baiduspider | HTTrack | clshttp | harvest | extract | grab | miner | python-requests

Googlebot, Bingbot, DDG, and other good bots are safe and won't be blocked.
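To double-check this myself, I ran the alternation from the nginx rule quoted earlier against a few sample user-agent strings in Python (just a sanity check, not anything official; `re.IGNORECASE` mirrors nginx's case-insensitive `~*` operator):

```python
import re

# The alternation from the WP Toolkit nginx rule quoted earlier in the thread.
PATTERN = re.compile(
    r"(?:acunetix|BLEXBot|domaincrawler\.com|LinkpadBot|MJ12bot/v|majestic12\.co\.uk"
    r"|AhrefsBot|TwengaBot|SemrushBot|nikto|winhttp|Xenu\s+Link\s+Sleuth|Baiduspider"
    r"|HTTrack|clshttp|harvest|extract|grab|miner|python-requests)",
    re.IGNORECASE,
)

samples = {
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)": "allowed",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)": "allowed",
    "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)": "blocked",
    "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)": "blocked",
}

for ua, expected in samples.items():
    verdict = "blocked" if PATTERN.search(ua) else "allowed"
    assert verdict == expected, ua
    print(f"{verdict}: {ua}")
```

Googlebot and Bingbot pass through; AhrefsBot and MJ12bot are caught, as described.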
 
The official bot list is

acunetix
BLEXBot
domaincrawler.com
LinkpadBot
MJ12bot
majestic12.co.uk
AhrefsBot
TwengaBot
SemrushBot
nikto
winhttp
Xenu Link Sleuth
Baiduspider
HTTrack
clshttp
harvest
extract
grab
miner
python-requests

The blocking is done by the Nginx configuration entry as described above.

Currently, the list is not user-configurable. There are plans to make it configurable in the future.
 
Thanks Peter,

I have enabled this for the site that was experiencing problems.
I am now waiting for a TTFB score update on Google page speed to see if this has resolved the problem.

Otherwise I will need to go back to the drawing board and see if I can find the cause of the issue.
 
@Peter Debik @maartenv
After enabling bot protection, I am seeing a user agent called "seek" browsing the site and using 1-2 GB worth of bandwidth.
(User-agent: seek)

I cannot find any information about this bot.

Have you ever encountered this user agent before? I cannot find any information about it on Google.
 
Never heard of a seek bot. Do you have the IP address of this bot?

Have you tried to add it to a robots.txt file in the document root of the website?
Code:
User-agent: seek
Disallow: /
 
I have already added this to robots.txt, but the bot is ignoring it.

IP address: AWStats does not record the IP addresses of specific bots, so I do not have an IP address for the bot.

Any suggestions?
 

Can you check the access_log and access_ssl_log for this bot?

/var/www/vhosts/system/domain.com/logs/access_log
/var/www/vhosts/system/domain.com/logs/access_ssl_log
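If it helps, here is a quick way to see which user agents dominate a combined-format access log. This is just a sketch with made-up sample lines; in practice you would read them from the access_log / access_ssl_log paths above:

```python
from collections import Counter

# Made-up sample lines in Apache/nginx combined log format; in practice,
# read them from the access_log / access_ssl_log paths mentioned above.
sample_lines = [
    '1.2.3.4 - - [01/Jan/2023:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "SeekportBot/1.0"',
    '5.6.7.8 - - [01/Jan/2023:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [01/Jan/2023:00:00:02 +0000] "GET /page HTTP/1.1" 200 512 "-" "SeekportBot/1.0"',
]

# In combined log format the user agent is the third double-quoted field,
# i.e. index 5 after splitting the line on '"'.
agents = Counter(line.split('"')[5] for line in sample_lines)
for agent, hits in agents.most_common():
    print(hits, agent)
```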
 
Looking at the log files, I can see a number of entries from SeekportBot.

I think this is it; it is a German search engine.
I am going to block it to see if that resolves the issue.

Any advice on the best way to block this in conjunction with the Block Bad Bots tool in WordPress toolkit?

I do have the default Plesk Apache-Badbots jail turned on, but I do not think it is actually working.

I have added the following to the htaccess file for the site:

# Block via User Agent
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (seek|SeekportBot) [NC]
RewriteRule (.*) - [F,L]
</IfModule>
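One small observation on the rule above (mine, not from any documentation): because of the [NC] flag the match is case-insensitive, so the "seek" token alone already matches "SeekportBot" as a substring, and listing both is belt and braces. A quick Python check mirroring the condition:

```python
import re

# The [NC] flag in the RewriteCond above makes the match case-insensitive,
# which re.IGNORECASE reproduces here.
condition = re.compile(r"(seek|SeekportBot)", re.IGNORECASE)

# "SeekportBot/1.0" contains "Seek", so the first alternative already matches.
assert condition.search("SeekportBot/1.0")
assert re.search(r"seek", "SeekportBot/1.0", re.IGNORECASE)
# An unrelated browser user agent is not caught.
assert not condition.search("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
print("ok")
```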
 
I'd really appreciate the ability to allow Xenu on my sites, since that's my preferred broken-link checker. Currently I have to temporarily disable "bot protection" when I want to scan my managed sites.
 
I just want to confirm that if you plan on using Ahrefs, you cannot use WP Toolkit's bot protection security measure, because it includes their main bot.

This caught us out today. We went through all of our security points before we realised Ahrefs was on this list!
 
Our current list in htaccess includes:
# Block via User Agent
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*(Ahrefs|AhrefsBot|meta-externalagent|acunetix|BLEXBot|domaincrawler\.com|LinkpadBot|MJ12bot/v|majestic12\.co\.uk|TwengaBot|SemrushBot|nikto|winhttp|Xenu\s+Link\s+Sleuth|Baiduspider|HTTrack|clshttp|harvest|extract|grab|miner|python-requests|ALittleClient|BitSightBot|gptbot|SEOKicks|MJ12bot|PetalBot|AspiegelBot|MauiBot|DotBot|ClaudeBot|DataForSeoBot|Bytedance|Bytespider|Barkrowler|SeznamBot|thesis-research-bot|my-tiny-bot|fidget-spinner-bot|AwarioBot|AwarioSmartBot|AwarioRssBot|seek|SeekportBot|spider|YisouSpider|360Spider).*$ [NC]
RewriteRule .* - [F,L]
</IfModule>
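One caveat worth flagging with a list this long (my observation, not something from this thread): short generic tokens such as "extract", "grab", "miner", and "spider" match anywhere in the user-agent string under [NC], so an oddly named but legitimate client can be caught as a side effect. For example, using a hypothetical client name:

```python
import re

# A few of the short tokens from the .htaccess list above; [NC] means
# case-insensitive matching, mirrored here with re.IGNORECASE.
short_tokens = re.compile(r"(extract|grab|miner|spider)", re.IGNORECASE)

# A hypothetical tool name containing "extract" would be blocked as a side effect.
assert short_tokens.search("SomeDataExtractor/1.0")
# The intended targets match as well.
assert short_tokens.search("Baiduspider/2.0")
# Mainstream search-engine user agents are unaffected.
assert not short_tokens.search("Mozilla/5.0 (compatible; Googlebot/2.1)")
print("ok")
```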
 