Question Exempting Facebook crawlers from redirect

cornishpirate

New Pleskian
My host recently upgraded me from Plesk 11 to 12.5, and what a fantastic improvement for my PHP-based ecommerce site. However, I am a novice at Nginx.

I am poised to move my site to https, but Facebook is a stumbling block! They cannot deal with it automatically and there is a danger of us losing our almost 90K likes.

I would like to be able to use

Redirect permanent / https://www.mysite.co.uk/

in additional Nginx directives for HTTP.

Part of Facebook's advice for working around their own limitations is to exempt their crawler from the redirect, as described at:
https://developers.facebook.com/docs/sharing/webmasters/crawler

Their crawlers are identified as one of these:

facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
facebookexternalhit/1.1
Facebot

My question is how to structure the additional Nginx directives to cope with this.

I'd be very grateful for any advice.
 
@cornishpirate

If you only want to use a redirect to https, then you can do the following:

a) go to "Domains > [ domain ] > Apache & Nginx settings (click) > Additional Nginx directives (enter text below)"

b) add the custom Nginx config:

if ($scheme != "https") {
    return 301 https://www.domain.tld$request_uri;
}

and replace www.domain.tld with the proper domain name.

Note that "if is evil" in Nginx terms, but with the default Nginx configuration shipped with Plesk, this "if statement" is necessary.

c) press OK and that is it.

Note that your ecommerce site can require a lot of other Nginx tweaks, depending on the ecommerce platform you are using.

Regards.......

PS If you really want to block the Facebook crawlers, which is not a good idea if you want to keep the 90K followers, you would do better to block them with Fail2Ban than to create an Nginx directive for the purpose. Another option is to add the Facebook IPs to the firewall. Again, you presumably want to keep the followers!
 
This way also works...
Code:
if ($scheme = http) {
    return 301 https://www.domain.tld$request_uri;
}

I'm not sure which is "quicker" though.
Regards

Lloyd
 
@Lloyd_mcse

In response to

I'm not sure which is "quicker" though.

the answer: in theory, it does not matter.

In practice, the following applies:

- the "!=" syntax is slightly better, since it allows Nginx to find a match and execute (read: all other pattern matching actions are discarded and Nginx simply proceeds with the return directive);
- there are some minimal differences in execution time between specific Nginx versions: later versions have some "disgust" for "if statements";
- in the end, it all depends on how the browser (or any other client, for that matter) handles the "301 responses".

Also note that Nginx is not very picky when it comes to definitions (read: it is quite "intelligent", in the sense that it reads the "intention" of the pattern match), which makes "=" (as opposed to "!=") a little more dangerous. A good rule of thumb is to use explicit directives and/or unambiguous "if statements" in Nginx.

That's all.

Regards
 
"If you really want to block the Facebook crawlers"

No, I don't want to block them, I want to exclude them from the redirect to https.
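
For what it's worth, the exemption could probably be built on top of the "if ($scheme ...)" redirect suggested earlier in this thread. The following is only a minimal sketch, not a tested Plesk recipe. It assumes the text goes into the "Additional Nginx directives" box (so it lands in the server context), that www.domain.tld is replaced with the real host name, and that the crawlers keep identifying themselves with the facebookexternalhit and Facebot user agents listed in the first post. The variable name $redirect_to_https is made up for illustration.

```nginx
# Sketch only: redirect plain-HTTP requests to HTTPS, except for Facebook's crawlers.
# $redirect_to_https is a hypothetical helper variable, not something Plesk defines.
set $redirect_to_https 1;

# Requests that already arrive over HTTPS need no redirect.
if ($scheme = "https") {
    set $redirect_to_https 0;
}

# Exempt the Facebook crawlers (case-insensitive match on the User-Agent header).
if ($http_user_agent ~* "(facebookexternalhit|Facebot)") {
    set $redirect_to_https 0;
}

# Everyone else on plain HTTP gets a permanent redirect.
if ($redirect_to_https = 1) {
    return 301 https://www.domain.tld$request_uri;
}
```

Nginx does not allow nested "if" blocks or combined conditions, which is why the sketch collects the decision in a variable first and issues the return in a single final check. With this in place the crawlers would continue to be served over plain HTTP, while all other visitors are redirected.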
 