Another fail2ban Issue (FilterPyinotify)

Ripshod · Jan 27, 2016

I've been trying to help a member here with his fail2ban installation and after following advice from a more knowledgeable member I changed my own settings yesterday so fail2ban would use a disc based database, where previously I had it stored in memory.

I woke this morning to a 25+GB logfile and a cpu core running 100% constantly. Only way I could get control of my server was to kill fail2ban and fail2ban-client, and kill -9 on the fail2ban-service.

The logfile is full of 'fail2ban.filterpyinotify[6756]: ERROR Error in FilterPyinotify callback: disk I/O error'. When I say full I mean more than 1000 entries per second. Permissions on the database directory and file are 600.

For now I've gone back to using the 'memory' setting without any problems. I originally started using the memory setting because of another disc I/O problem I had when I first used it on this server. This error is similar - yet different.

14.04.3 LTS
Plesk 12.5.30 MU20

trialotto · Jan 27, 2016

@Ripshod,

The problems you encounter are (essentially) a lack of preventive measures against bad IPs and bad traffic originating from those IPs, with the symptoms being resource overusage.

Please use the firewall to keep the log files that Fail2Ban scans from filling up with log lines.

At this moment, there certainly is an issue with (Plesk's default) Fail2Ban setup, and I am investigating it to make a "quick" (what is in a word) improvement.

Again, note that a memory based solution is only shifting the root cause of the problem (i.e. many "bad requests") to memory, implying that this root cause is not resolved and, as such, you can also expect to get a memory related issue sooner or later.

Let's assume that you maintain the ":memory:" setting and, given this assumption, one can do the following work-around to mitigate the symptoms of disk or memory overusage:

a) deactivate the ssh jail AND block all traffic to port 22 via the Plesk firewall (or iptables), ONLY allowing your own IP, (and)

b) for existing jails:

- decrease the value of maxretry to, for example, 2 or 3 (if the default or current value is 5)
- increase the ban period to a higher value

c) for advanced tuning, create a custom jail and add some specific settings, including settings for the Fail2Ban findtime variable: follow these steps (in chronological order)

1 - create a custom jail, go to "Tools & Settings > IP Address Banning > Jails" and click "Add Jail" and copy the settings from the recidive jail, AND
2 - apply an increase of findtime in the [DEFAULT] part of the /etc/fail2ban/jail.local file: this will reduce workloads, AND
3 - apply a findtime value of 86400 (or less) to the custom jail section (in the bottom part of the /etc/fail2ban/jail.local file): this will cause Fail2Ban to be strict, AND
4 - set the bantime value to a whopping 31536000 (or at least twice the maximum bantime applied in all the jails, i.e. 1209600): this will ban bad IPs for one year, AND
5 - set the maxretry value to 2: this will allow Fail2Ban to ban an IP that has been banned twice by any other jail (!), AND
6 - run the command: service fail2ban reload (and, in the exceptional case that this command does not work, run: fail2ban-client reload), AND
7 - disable recidive jail via the Plesk Panel.

To illustrate: an example of the custom jail section in the /etc/fail2ban/jail.local can be

[badip-jail]
# note: the jail is activated by default, after creation
enabled = true
# you can also apply the custom jail to any other filter, in which case you simply have to select the appropriate filter
filter = recidive
# note: the "name" can be anything, as long as it is not too long, keep it short!
action = iptables-allports[name="badip"]
logpath = /var/log/fail2ban.log
# Fail2Ban is very strict: it checks whether the total of bans (for a specific IP) in a period of 1800 seconds (30 minutes) reaches the value of maxretry
findtime = 1800
# once maxretry is reached, the IP is banned for one year
bantime = 31536000
# the value of 1 will also work, but it is not advised
maxretry = 2
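
To keep the values in seconds readable, they can be converted with a few lines of Python. This is only a rough sketch: the helper name as_duration is made up for illustration and is not part of Fail2Ban.

```python
from datetime import timedelta


def as_duration(seconds):
    """Render a number of seconds as a human-readable duration."""
    return str(timedelta(seconds=seconds))


# Values taken from the steps and the example jail above.
for label, seconds in [
    ("findtime (example jail)", 1800),
    ("findtime (upper bound)", 86400),
    ("bantime (lower bound)", 1209600),
    ("bantime (example jail)", 31536000),
]:
    print(f"{label}: {seconds} s = {as_duration(seconds)}")
```

So 1800 seconds is 30 minutes, 86400 is one day, 1209600 is 14 days and 31536000 is 365 days.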

Note that we are

- adding the custom jail, in order to create a jail.local file (i.e. a config file that applies and/or overrides jail.conf) that we can edit safely,
- creating the jail.local file, in order to allow our customizations to be rather persistent during Plesk and/or Fail2Ban updates, (note that persistence is not likely when editing jail.conf)

and finally note that we are

- using the fail2ban.log to identify and (almost permanently) block specific bad IPs, with the implementation of a strict banning policy,
- maintaining the existing structure of jails, allowing you to add specific IPs to the whitelist, in order to prevent "false positives" (i.e. good IPs are incorrectly marked as bad),
- allowing the custom jail to implement the strict banning policy, i.e. ban for a longer period (one year for example),
- reducing the work of all "normal jails", given the fact that (long-term) banned IPs will not or will barely cause output in the log files,
- also allowing the custom jail to identify and consequently ban bad IPs that are not identified or not identified properly by all of the "normal jails",

and so on.

One important remark has to be made though: Fail2Ban is not that "intelligent", in the sense that it is (essentially) a log scanner.

Therefore, it can be the case that "normal jails" still can result in considerable resource (disk or memory) usage: for this reason, the maxretry values for all "normal" (existing) jails should be relatively low (maxretry = 2 or 3), in order to make sure that any bad IP will enter the fail2ban.log as soon as possible.

By the way, do not set the maxretry value for the apache jail too low: this can make your sites inaccessible for regular and trusted customers or visitors.

d) for very advanced tuning, a custom action can be created: I am working on that, since that would be most effective.

Finally, note that I did not test all of the above, I just "borrowed from past experience".

Any feedback or test results would therefore be much appreciated.

Regards......

Ripshod · Jan 28, 2016

It's a disc error.

I've just noticed that with the dbfile = memory setting fail2ban is actually creating a database file called 'memory' in the /root folder.

So:
dbfile = /var/lib/fail2ban/fail2ban.sqlite3 = disc I/O errors
dbfile = memory = no errors. (but not in memory, in file /root/memory)

But it works fine with the second config. It can't really get any stranger can it?

trialotto · Jan 28, 2016

@Ripshod,

The "disc I/O errors" are very likely to be problems related to sqlite and, in most cases, connection failures.
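
One way to rule out a damaged database file is to ask sqlite itself. The following is a minimal sketch, assuming Python 3 with its standard sqlite3 module; the function name check_sqlite is made up for illustration. It is best run while Fail2Ban is stopped, and note that sqlite3.connect creates an empty file if the path does not exist yet.

```python
import sqlite3


def check_sqlite(path):
    """Return the rows reported by PRAGMA integrity_check for a database file."""
    conn = sqlite3.connect(path)
    try:
        # A healthy database yields a single row containing "ok".
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    finally:
        conn.close()


if __name__ == "__main__":
    # Path taken from the dbfile setting quoted earlier in this thread.
    print(check_sqlite("/var/lib/fail2ban/fail2ban.sqlite3"))
```

If this raises an exception or returns anything other than a single "ok", the problem is more likely in the file or the disk underneath it than in the jails.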

By the way, did you actually use :memory: as value? If not, that could explain why you end up with a file in the root folder.
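
The difference between the two spellings can be shown with plain sqlite, outside of Fail2Ban. A minimal sketch, assuming Python 3 and its standard sqlite3 module (the helper name creates_file is made up for illustration): the name "memory" is treated as an ordinary relative file name, while ":memory:" is sqlite's special name for an in-memory database.

```python
import os
import sqlite3
import tempfile


def creates_file(db_name):
    """Open db_name from an empty temp directory and report whether
    a file with that name appeared on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        old_cwd = os.getcwd()
        os.chdir(tmp)
        try:
            conn = sqlite3.connect(db_name)
            conn.execute("CREATE TABLE t (x)")
            conn.commit()
            conn.close()
            return os.path.exists(os.path.join(tmp, db_name))
        finally:
            # Leave the temp directory before it is cleaned up.
            os.chdir(old_cwd)


if __name__ == "__main__":
    print("memory   ->", creates_file("memory"))
    print(":memory: ->", creates_file(":memory:"))
```

With "memory" a file is written into the current working directory of the process, which would match a file called 'memory' showing up in /root.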

Also, are you aware that you can prevent memory related problems by adding the line "ulimit -s 256" (or 512) to the bottom of the /etc/default/fail2ban file?

Regards....

G J Piper · Jan 28, 2016

I had an issue like this, but what I found was Plesk's default log file management was letting files get as large as 100+mb in file size before rotating them. I had a hunch Fail2Bad's scanning was getting bogged down, so I added a setting to /etc/logrotate.conf and haven't had a peep from Fail2Bad since. (and it still works)

Added to /etc/logrotate.conf near the beginning:
maxsize 5M

trialotto · Jan 28, 2016

@G J Piper,

Thanks for the addition, it is valuable information for everyone, with respect to both Fail2Ban and general issues related to logrotation and/or log files.

However, there is one major drawback in your approach: by default, Fail2Ban can or will not scan zipped files (that can be resolved though).

The above implies that some IPs can pass the Fail2Ban checks, for instance if the number of "access attempts" is below the maxretry value within the period in which logrotation is absent.

The irony is that logrotation is desirable in the log dimension, whilst logrotation is not desirable in the area of Fail2Ban: logrotation has a (very) negative effect on Fail2Ban, since it removes lines concerning bad IPs from the logs scanned by Fail2Ban and, as a result, the identifiable number of retries present in the logs will decrease.

A similar problem and/or area of attention exists when considering the impact of logrotation on Fail2Ban: if logrotation is set to daily, one should not have a Fail2Ban findtime value that exceeds 86400 seconds (i.e. one day), otherwise Fail2Ban will (effectively) examine a partial log and the identifiable number of retries in the logs will (again) decrease.

In short, simply changing logrotation settings is not a full, but only a partial solution to the issue of resource overusage by Fail2Ban.

Note that hackers nowadays will try to aim at Fail2Ban secured servers, since the presence of Fail2Ban often indicates the absence of a proper firewall and bypassing of Fail2Ban is easy.

Hope the above helps!

Regards....

G J Piper · Jan 28, 2016

I understand what you are saying. You could easily run it at about 10mb if needed... even a 5mb file size is over 35,000 lines in a log, and for my maillog that is over 12 hours worth. For me it works great.
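
The rough arithmetic behind those numbers, as a small sketch. It assumes 1 MB = 1024 * 1024 bytes and uses the lower bounds of 35,000 lines and 12 hours; the function name log_stats is made up for illustration.

```python
def log_stats(size_mb, lines, hours):
    """Return (average bytes per line, lines written per hour)."""
    size_bytes = size_mb * 1024 * 1024
    return size_bytes / lines, lines / hours


avg_bytes, lines_per_hour = log_stats(5, 35_000, 12)
print(f"{avg_bytes:.0f} bytes per line, {lines_per_hour:.0f} lines per hour")
```

That works out to about 150 bytes per line and a little over 2,900 lines per hour.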

I guess I figure if the bots can break my passwords after only 20 attempts in a few hours even, then I've got different problems. If my logs rotate even every couple hours, they still can't break my passwords with a brute-force attempt that slowly implemented.

trialotto · Jan 28, 2016

@G J Piper

Ehm, that is what I am talking about: "even a 5mb file size is over 35,000 lines in a log, and for my maillog that is over 12 hours worth".

This line suggests that you have some normal login attempts AND some attempts originating from bad IPs, with the latter being in an endless cycle of banning and unbanning by Fail2Ban.

The aforementioned cycle of banning and unbanning increases the length of your log, since temporarily banned bad IPs are sooner or later allowed to try again.

Your solution: introduce logrotation and decrease log size, with an associated advantage of (somewhat) better performance by Fail2Ban.

The solution that I personally would implement: a more strict banning policy by Fail2Ban, with an associated advantage of smaller log files (and no need to adjust logrotation settings).

In essence, both solutions can do (more or less) the same, with one tiny but relevant difference: your primary goal is logrotation improvements, my primary goal is Fail2Ban improvement.

Note that your solution has a higher chance of "false negatives", i.e. bad IPs are incorrectly marked as good (for at least some time).

Regards.....

G J Piper · Jan 28, 2016

I guess I'm way more interested in avoiding marking actual (password-forgetful) customers as banned, so I give bad bots 20 attempts before they get on the banned list. This usually happens in around a 20 minute period of time or less. Since my log rotation only happens every 4 hours or so this will never be a problem on my server. I also use the "recidive" jail which stops the repeat offenders well. My goal was originally Fail2Ban efficiency improvements, but when I saw a 200MB log file in there I knew I had to rotate my logs better anyway so it just fit.

trialotto · Jan 28, 2016

G J Piper said:
I guess I'm way more interested in avoiding marking actual (password-forgetful) customers as banned, so I give bad bots 20 attempts before they get on the banned list. This usually happens in around a 20 minute period of time or less. Since my log rotation only happens every 4 hours or so this will never be a problem on my server. I also use the "recidive" jail which stops the repeat offenders well. My goal was originally Fail2Ban efficiency improvements, but when I saw a 200MB log file in there I knew I had to rotate my logs better anyway so it just fit.

Note that the huge log file is an indication of (EITHER) a huge lot of password-forgetting customers (OR) many hack attacks of the distributed, brute forcing kind, with those hack attacks not being identified as such by Fail2Ban.

The above is not surprising, when taking into consideration the default Fail2Ban settings and structure, as provided with Plesk.

Actually, let's illustrate: a findtime value of 600 seconds, which is the default value, will allow your password-forgetting customers to (re)try up to the maxretry value in every period of 10 minutes. In a time window of 20 minutes of "forgetting passwords", your password-forgetting customers actually get twice the number of maxretries. In short, increasing the maxretry value will allow password-forgetting customers AND bad scripts to try over and over again, hence also (considerably) increasing the number of log lines. In the case of password-forgetting customers, this increase is rather limited (i.e. sooner or later they remember). However, the bad scripts can attempt up to twice the number of maxretries in a period of 10 minutes, without getting banned by Fail2Ban, implying that the (considerable) increase in log lines is primarily due to bad scripts and approximately equal to twice the number of maxretries.

In short, the illustration is a good example of how Fail2Ban fails to perform well, as a result of an unnecessarily large log file.
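
The interplay of findtime and maxretry can be played with in a few lines of Python. This is only a simplified model, assuming a plain sliding window over failure timestamps; Fail2Ban's real bookkeeping may differ in detail, and the function name first_ban_time is made up for illustration.

```python
from collections import deque


def first_ban_time(attempts, findtime, maxretry):
    """Return the timestamp (in seconds) at which maxretry failures fall
    within one findtime window, or None if that never happens."""
    window = deque()
    for t in sorted(attempts):
        window.append(t)
        # Forget failures that are findtime seconds old or older.
        while t - window[0] >= findtime:
            window.popleft()
        if len(window) >= maxretry:
            return t
    return None


if __name__ == "__main__":
    # One failed login every 100 seconds versus one every 150 seconds,
    # with findtime = 600 and maxretry = 5.
    print(first_ban_time(range(0, 3600, 100), findtime=600, maxretry=5))
    print(first_ban_time(range(0, 3600, 150), findtime=600, maxretry=5))
```

In this model a script that spaces its attempts far enough apart never reaches maxretry inside a single window, however long it keeps going, while it still adds a log line for every attempt.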

By the way, note that the recidive jail does not change the above at all: the recidive jail only scans fail2ban.log (i.e. banned IPs) and, in this illustration, fail2ban.log will be relatively empty.

The above implies that repeat offenders are not "stopped" at all, they simply "bypass" Fail2Ban as a result of "bad" (what is in a word) Fail2Ban configuration.

The latter statement immediately illustrates why certain attack scripts are aiming at Fail2Ban secured servers.

Finally, note that your relatively short interval for logrotation (4 hours) will aggravate the whole problem:

- most of the bad IPs are absent in the logs, scanned by Fail2Ban, and are passed to the logrotated logs, (and)
- recidive jail, which has a default findtime of 86400 seconds (one day), becomes (essentially) ineffective.

Hope the above helps, I would certainly advise adjusting some of the Fail2Ban settings.

Regards.....

G J Piper · Jan 28, 2016

I'm not having any problem at all with Fail2Ban's log file, nor was there any. It is the server logs that were being scanned that were too large. No setting in Fail2Ban is going to change that if I'm not mistaken.

trialotto · Jan 29, 2016

@G J Piper

The thing you are mistaken about is the following: your settings for logrotation and Fail2Ban will result in Fail2Ban not protecting your server (and I have explained why).

Simply stated: you decrease the amount of information in the logs, Fail2Ban misses a lot of information and your systems are (essentially) a security leak.

I am not here to tell you what to do, you can choose every setting to your own liking, but be aware of the significant drawbacks of your choices.

Regards.....

G J Piper · Jan 29, 2016

I guess I'm not understanding what the setting "Time interval for detection of subsequent attacks" does, then, in the Fail2Ban Settings... Isn't this telling Fail2Ban to look back only a certain amount of time in the logs to count failed attempts?

(By the way... calling my system a "security leak" is a little much isn't it, since millions of servers do just fine without running Fail2Ban.)

trialotto · Jan 29, 2016

@G J Piper

G J Piper said:
Isn't this telling Fail2Ban to look back only a certain amount of time in the logs to count failed attempts?

Correct, in a sense.

Consider the recidive jail, scanning a period of 86400 seconds, i.e. one day.

Consider your logrotation settings, with logrotation every 4 hours: log lines are (essentially) removed every 4 hours.

Fail2Ban runs a "task" for the recidive jail and, at any given point in time, the logs scanned by Fail2Ban contain (at most) 4 hours' worth of log lines.

In short, in the case of the recidive jail, Fail2Ban wants to check for 24 hours of log lines, but can only check (at most) 4 hours worth of log lines (there is nothing more to check).

The above is rather simplified (i.e. it is actually somewhat more complex), but you get the picture: Fail2Ban does not AND cannot work optimally.

Regards....

G J Piper · Jan 29, 2016

You must be misunderstanding... my logs don't all rotate every 4 hours. They are set to rotate whenever they get to 5mb in size. (see my original post)

It just so happens that my most active log (access_log) rotates every 4 hours on average because that is when it reaches 5mb, but Fail2Ban doesn't need hours of that log anyway.

Using the 5mb log rotation limit, my logs rotate in varying time-frames. The Fail2Ban log rotates about every 2 days at that size limit, my maillog rotates about every 12 hours at 5mb, so I'm perfectly good to go and have no downside that I can see.

trialotto · Jan 29, 2016

@G J Piper

Really, never mind.

If you are happy with your setup and are actually under the impression that you are secure, well, that is fine.
