
Shared Hosting Sites Crawling Each Other

Daphne

New Pleskian
Hi,

This is a bit odd and a bit convoluted. My hosting company says there is nothing to worry about, but that I should check the blogs. So here I am.

It seems my websites are somehow crawling each other, and they shouldn't be.
I have a shared hosting server running Plesk with Apache.
90% of my sites use WordPress.
All WordPress sites have the Wordfence Premium plugin.
Wordfence shows that my sites are being crawled every 15 minutes by my own hosting server. No traffic should come from my server.
Here is the confusing part:

Domain1.com's Wordfence logs show Domain2.com crawling it: about 10 pages in a few seconds, then it stops, and the same thing repeats every 15 minutes.

Domain2.com's Plesk access logs show that, at exactly the same time as the Wordfence entries, IP xx.xxx.xxx.123 (just an example) was blocked by Apache for bad behavior.

Domain2.com's Wordfence logs show Domain3.com crawling it: about 10 pages in a few seconds, then it stops, and the same thing repeats every 15 minutes.

Domain3.com's Plesk access logs show that, at exactly the same time as the Wordfence entries, IP xx.xxx.xxx.456 (just an example) was blocked by Apache for bad behavior.

This daisy chain of events cycles through all 67 domains on my shared IP. I only noticed it last week. All sites have WordPress and their plugins up to date. I have another shared IP, and the same thing is happening there as well. This morning I noticed that one non-WordPress site I host also showed up in the Wordfence logs.

The shared hosting setup will stay as it is. Other than that, has anyone seen this? Is it a concern, and if so, how do I stop it?

Thanks,
Daphne
 
@Daphne,

Well, let's bring some structure to your elaborate story.

First, it is not uncommon to be crawled every 15 minutes. Both regular and bad crawlers do this, unless you have taken steps to keep bots off your site(s).

Note that most legitimate bots follow the instructions in robots.txt files (and similar mechanisms). Other bots should simply be blocked by IP via the firewall.
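To illustrate what "following robots.txt" means for a well-behaved bot, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt rules, bot names and URLs are hypothetical examples, not a recommendation for your sites. Keep in mind that this only works for bots that choose to check robots.txt; bad bots ignore it, which is why the firewall is still needed.

```python
from urllib import robotparser

# Hypothetical robots.txt: Googlebot may crawl everything except /wp-admin/,
# every other user agent is asked to stay away completely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /wp-admin/

User-agent: *
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt content (given as a string) the way a compliant bot would."""
    rp = robotparser.RobotFileParser()
    rp.parse(text.splitlines())
    return rp


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    checks = [
        ("Googlebot", "https://domain1.com/blog/"),
        ("Googlebot", "https://domain1.com/wp-admin/"),
        ("SomeOtherBot", "https://domain1.com/blog/"),
    ]
    for agent, url in checks:
        # can_fetch() answers: "may this user agent request this URL?"
        print(agent, url, rp.can_fetch(agent, url))
```

A compliant crawler runs this kind of check before each request; nothing on the server side enforces it.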

Second, some (or should I say "most") bots follow links to some extent, which can create a correlation between the crawling of different sites.

Note that you can restrict this type of behaviour in the robots.txt files and/or with HTML tags (for bots that honour them).
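On the HTML side, the relevant markers are the `<meta name="robots" content="...">` tag (with `nofollow` or `none`) and `rel="nofollow"` on individual links. As a rough sketch, assuming a crawler that honours both, the Python below (standard `html.parser` only) shows which links such a crawler would still follow from a page. The function and class names are hypothetical, and real crawlers apply these hints in their own ways.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects robots meta directives and links not marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.directives = set()
        self.followable = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta" and a.get("name", "").lower() == "robots":
            for directive in a.get("content", "").split(","):
                self.directives.add(directive.strip().lower())
        elif tag == "a" and "href" in a:
            if "nofollow" not in a.get("rel", "").lower().split():
                self.followable.append(a["href"])


def links_to_follow(html):
    """Return the links a nofollow-respecting crawler would follow."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # "none" is shorthand for "noindex, nofollow".
    if "nofollow" in parser.directives or "none" in parser.directives:
        return []
    return parser.followable


if __name__ == "__main__":
    page = ('<a href="/about/">About</a>'
            '<a href="https://domain2.com/" rel="nofollow">Domain2</a>')
    print(links_to_follow(page))
```

In this example only `/about/` is followed; the cross-site link to domain2.com is skipped, which is the kind of hop between your own sites that these tags can discourage.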

Third, WordPress is notorious for being bad at many things, amongst others for being a favourite target of attacks and bots.

Fourth, WordPress plugins are notorious for being badly designed and coded, and some commonly used plugins are even known to do the exact opposite of what they promise.

Note that Wordfence is a rather strange trade-off between a virus scanner, a malware scanner, a bot scanner and anything else you can think of.

This trade-off has not resulted in the best solution one can think of: Wordfence is probably one of the most active "crawlers" itself (what's in a word).

Moreover, Wordfence is known to behave strangely in specific shared or VPS hosting setups: one plugin on one WP instance can have an effect on ALL sites.

In general, you can test this by deactivating all Wordfence plugins and having a look at the logs.
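To make "having a look at the logs" a bit more concrete: the pattern described above (a short burst of requests, then nothing, every 15 minutes) can be checked per client IP in an access log, once with Wordfence active and once with it deactivated. The Python below is only a sketch. It assumes the log lines are in Apache's common/combined format, treats requests less than 1 minute apart as one burst (an arbitrary choice), and uses made-up documentation IPs in the sample data.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Start of an Apache common/combined log line (assumed format):
# client IP, identd, user, [timestamp], "request line", ...
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)"')
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 10/Mar/2024:10:00:01 +0000

SAMPLE = [
    '203.0.113.123 - - [10/Mar/2024:10:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.123 - - [10/Mar/2024:10:00:03 +0000] "GET /about/ HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.123 - - [10/Mar/2024:10:15:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.123 - - [10/Mar/2024:10:30:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Mar/2024:10:05:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]


def parse_line(line):
    """Return (ip, timestamp) for a log line, or None if it does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    ip, ts, _request = m.groups()
    return ip, datetime.strptime(ts, TS_FORMAT)


def burst_starts(times, gap):
    """Split timestamps into bursts separated by more than `gap`;
    return the first timestamp of each burst."""
    starts, previous = [], None
    for t in sorted(times):
        if previous is None or t - previous > gap:
            starts.append(t)
        previous = t
    return starts


def burst_intervals(lines, gap=timedelta(minutes=1)):
    """Map each client IP to the minutes between the starts of its bursts."""
    times_by_ip = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            ip, t = parsed
            times_by_ip[ip].append(t)
    result = {}
    for ip, times in times_by_ip.items():
        starts = burst_starts(times, gap)
        result[ip] = [round((b - a).total_seconds() / 60, 1)
                      for a, b in zip(starts, starts[1:])]
    return result


if __name__ == "__main__":
    import sys
    # Pass an exported access log as the first argument; otherwise use SAMPLE.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            lines = f.readlines()
    else:
        lines = SAMPLE
    for ip, gaps in burst_intervals(lines).items():
        print(ip, gaps)
```

If the regular gaps of roughly 15 minutes for your server's own IP disappear after deactivating Wordfence, that points at the plugin; if they remain, something else is generating the requests.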

A final remark about Wordfence: it is supposed to be "aligned" with Google, but this has a drawback, in the sense that Googlebot gets a lot of freedom to crawl sites. This becomes even worse when Google generates search pages on those sites.


In summary, try deactivating the Wordfence plugin, have a look at the logs, AND improve the robots.txt files (and, where necessary, block some IPs).

That will certainly improve the situation.

Hope the above helps a bit.

Regards....
 
Trialotto,

Thank you for everything you shared. I will certainly go back and re-evaluate what to do.

Sincerely,
Daphne
 