Upgrading from 12.0.18 -> 12.5.30 problems, plus fail2ban and ongoing DDoS!

Oliver Bazely · Feb 16, 2016

Hi,
I have a VPS with a handful of sites on it. Nothing that critical. Plesk has been running fine for several years. I recently started getting CPU / Apache memory errors. I found that one of my sites has been under DDoS attack for about a week (only 1 request per second, but enough to take that site offline).

I wanted to install fail2ban and mod_security to shut out the offending IPs. I tried to do this via Plesk 12.0.18, using the upgrade / update tool and selecting the packages there. mod_security installed fine, but fail2ban was very slow and never really worked. I restarted the server and started getting error messages on the Plesk dashboard. I tried upgrading to 12.5.30 to see if that would get fail2ban working, but I still had problems.

Since then, Plesk has completely stopped working. I can't even log in any more. I tried some things via the terminal, including yum update and the bootstrapper.sh repair (for both 12.0.18 and 12.5.30). I now get errors for both.

I'd really like to get back into the Plesk control panel and work out how to get fail2ban working. The DDoS is still going on, and the site keeps being knocked offline. I have the IPs to be blocked.

Some info:

Code:

[root@XXXXXXXX bootstrapper]# plesk version
Product version: 12.0.18
     Build date: 2015/10/14 14:00
   Build target: CentOS 6
       Revision: 333059
   Architecture: 64-bit
Wrapper version: 1.1
[root@XXXXXXXXXXX bootstrapper]# mysql -u admin -p`cat /etc/psa/.psa.shadow` -e 'select * from psa.misc where param="version"'
+---------+-----------+
| param   | val       |
+---------+-----------+
| version | 012005030 |
+---------+-----------+
[root@XXXXXXXXX bootstrapper]# /usr/local/psa/bootstrapper/pp12.0.18-bootstrapper/bootstrapper.sh repair
Started bootstrapper repair procedure. This may take a while.
Certain actions may be skipped if not applicable.


**** Product repair started.

===> Checking for previous installation ... found.
Started bootstrapper repair procedure. This may take a while.
Certain actions may be skipped if not applicable.

Trying to start service mysqld... mysqld (pid  1706) is running...
done
Trying to establish test connection... connected
done
Trying to start service mysqld... mysqld (pid  1706) is running...
done
Trying to establish test connection... connected
done
Trying to find psa database... version is 012005030
DATABASE ERROR!!!
Previous product version is 12.0.18, but previous database
version is 012005030. In most of cases it is result of
previous upgrade try failure. Please, restore previous version
from backup, and try again or contact technical support.

ERROR while trying to check database version
Check the error reason(see log file: /var/log/plesk/install/plesk_12.0.18_repair.log), fix and try again

*****  problem report *****
ERROR while trying to check database version
Check the error reason(see log file: /var/log/plesk/install/plesk_12.0.18_repair.log), fix and try again
[root@XXXXXXXX bootstrapper]#
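
A minimal sketch for reading the database version value shown in the output above. It assumes that the value packs major, minor and patch numbers as three zero-padded three-digit fields. This is an inference from 012005030 appearing alongside the 12.5.30 upgrade, not a documented format. The helper names `strip_zeros` and `decode_psa_version` are made up for illustration.

```shell
#!/bin/sh
# ASSUMPTION: psa.misc "version" = MMMmmmppp (three zero-padded 3-digit fields).
# decode_psa_version is a hypothetical helper, not a Plesk tool.

# Strip leading zeros so "030" becomes "30" and "000" becomes "0".
strip_zeros() {
    printf '%s\n' "$1" | sed 's/^0*//; s/^$/0/'
}

decode_psa_version() {
    raw=$1
    # Accept exactly nine digits; anything else is reported, not guessed at.
    case $raw in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "unexpected version format: $raw" >&2; return 1 ;;
    esac
    major=$(printf '%s\n' "$raw" | cut -c1-3)
    minor=$(printf '%s\n' "$raw" | cut -c4-6)
    patch=$(printf '%s\n' "$raw" | cut -c7-9)
    printf '%s.%s.%s\n' \
        "$(strip_zeros "$major")" "$(strip_zeros "$minor")" "$(strip_zeros "$patch")"
}
```

Under that assumption, `decode_psa_version 012005030` prints 12.5.30, which is the mismatch the repair script complains about: the installed product reports 12.0.18 while the database has already been migrated.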

Oliver Bazely · Feb 16, 2016

This is the end of the 12.5.30 bootstrapper.sh output:

Code:

Bootstrapper repair finished.
Errors occurred while performing the following actions: fix credentials for PhpMyAdmin database, upgrade Plesk business logic, cumulative Plesk upgrade and repair final stage, configure SSL ciphers and protocols.
Check '/var/log/plesk/install/plesk_12.5.30_repair.log' and '/var/log/plesk/install/plesk_12.5.30_repair_problems.log' for details.
If you can't resolve the issue on your own, please address Parallels support.

Oliver Bazely · Feb 16, 2016

And when I navigate to the Plesk interface in a browser, I see:

Code:

The file /usr/local/psa/admin/htdocs/index.php is part of Plesk distribution. It cannot be run outside of Plesk environment.

Any help would be gratefully received. Extra bonus points for stopping this damn DDoS!!

IgorG · Feb 16, 2016

Oliver Bazely said:
Previous product version is 12.0.18, but previous database

Try to fix it with the help of this KB article: https://kb.odin.com/en/124787
Ask the Plesk Support Team for assistance if it does not help.

Oliver Bazely · Feb 17, 2016

Thanks for replying, Igor. That article contains some useful pointers, but the bootstrapper repair script doesn't work for either version, and the database is ahead of the client, not behind. How can I find out why the 12.5.30 upgrade script is failing? It seems more sensible to get the client upgraded to the same version as the database. I don't have a backup of the Plesk database at version 12.0.18.

Dave_G · Feb 17, 2016

Hi Oliver,

We had something similar a few days ago and had to restore the database completely. Plesk takes a backup of all MySQL databases every night, which can be found in /var/lib/psa/dumps; hopefully this will get everything working again.
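
Before restoring anything from that directory, it may help to see which version each dump was taken at, since a dump written after the failed upgrade would already carry the newer database version. A rough sketch, assuming the dumps are gzip-compressed SQL files and that the version row appears in the usual mysqldump form `('version','<digits>')`. The name `list_dump_versions` is a hypothetical helper, not a Plesk utility.

```shell
#!/bin/sh
# ASSUMPTIONS: dumps are *.gz SQL files in the given directory, and the
# psa.misc version row is written as ('version','<digits>') by mysqldump.
# list_dump_versions is a hypothetical helper, not a Plesk utility.

list_dump_versions() {
    dir=${1:-/var/lib/psa/dumps}
    # ls -t lists newest first; read line by line rather than word-splitting.
    ls -t "$dir"/*.gz 2>/dev/null | while IFS= read -r f; do
        # Take the first version row found in the decompressed dump, if any.
        v=$(gzip -dc "$f" 2>/dev/null | grep -o "'version','[0-9]*'" | head -n 1)
        printf '%s\t%s\n' "$f" "${v:-no version row found}"
    done
}
```

Run as root with no argument, it lists the dumps in /var/lib/psa/dumps, newest first. It only reads the files; copying the current database aside before any restore would still be a sensible precaution.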

Dave.

Oliver Bazely · Feb 18, 2016

Thanks Dave. I saw in the logs that there were some db dumps floating around. I have a ticket open with my hosting provider, so hopefully they will give me some additional pointers. Otherwise, I'll try to roll back the db myself. This doesn't seem like it will get fail2ban working though, and the DDoS is still going on. Very frustrating. Thanks again. Oliver

Dave_G · Feb 18, 2016

You could try looking at the fail2ban conf file to see if it is set to use sqlite (some months ago we noticed fail2ban was dragging the server down and found it was writing everything to sqlite). We then went through all of our servers and changed the conf file from:
dbfile = /var/lib/fail2ban/fail2ban.sqlite3
to
dbfile = None

After restarting fail2ban, the server load dropped to almost nothing and fail2ban was working fine.
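
To see which value is currently set before editing, something like the following could be used. It is only a sketch: it assumes the setting lives in /etc/fail2ban/fail2ban.conf (or a fail2ban.local override next to it), and `current_dbfile` is a name invented here.

```shell
#!/bin/sh
# ASSUMPTION: the dbfile option sits in /etc/fail2ban/fail2ban.conf or in a
# fail2ban.local override; current_dbfile is a hypothetical helper name.

current_dbfile() {
    conf=${1:-/etc/fail2ban/fail2ban.conf}
    # Match uncommented "dbfile = ..." lines only, then keep the value part.
    grep -E '^[[:space:]]*dbfile[[:space:]]*=' "$conf" | sed 's/^[^=]*=[[:space:]]*//'
}
```

Calling `current_dbfile /etc/fail2ban/fail2ban.local` checks the override file instead. Putting the change in fail2ban.local under a [Definition] section, rather than editing fail2ban.conf, keeps it from being overwritten by package updates.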

trialotto · Feb 18, 2016

@Oliver Bazely

In practice, a DDoS attack can often best be tackled by doing a number of things, like:

- switching to another IP immediately (this is often enough to stop the attack, since that is targeted at the old IP),
- blocking a lot of IPs (i.e. IP ranges) in the firewall (i.e. a direct block, without Fail2Ban),

and so on.
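
The direct firewall block in the second point can be sketched as follows. This is a hedged example: it assumes a plain text file with one IPv4 address or CIDR range per line (the file name bad_ips.txt and the function `gen_block_rules` are placeholders), and it only prints the iptables commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# ASSUMPTION: the input file holds one IPv4 address or CIDR range per line.
# gen_block_rules is a hypothetical helper; it prints commands, it runs nothing.

gen_block_rules() {
    list=$1
    # Keep only lines that are exactly a.b.c.d or a.b.c.d/nn; skip everything else.
    grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}(/[0-9]{1,2})?$' "$list" |
    sort -u |
    while IFS= read -r ip; do
        # -I INPUT inserts at the top of the chain so the drop is evaluated first.
        printf 'iptables -I INPUT -s %s -j DROP\n' "$ip"
    done
}
```

After checking the output of `gen_block_rules bad_ips.txt`, it can be piped to `sh` as root. On CentOS 6, rules added this way are lost on reboot unless they are saved, for example with `service iptables save`.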

Your issue with Fail2Ban is closely related to all other problems encountered. For instance, Fail2Ban using a lot of resources can cause your system to perform (very) badly.

The challenge now is (on the one hand) to reduce the damage done and (on the other hand) to prevent similar situations from occurring in the future.

Both parts of the challenge can be addressed by having a look at Fail2Ban configuration:

a) as a temporary solution, you can use the solution provided by @Dave_G (see post 8), but this is not a permanent solution.

In essence, dbfile = None is not really an option, since you often have to reboot when under attack: with dbfile = None, banned IPs do not persist across reboots, allowing your attackers to retry in full after you have restarted the server.

Another option that is not really an option is the dbfile=":memory:" setting: this can bring your entire machine down when under attack. After all, memory can get depleted.

The points below aim to form the "more permanent" solution, using Fail2Ban functionality.

b) change all jails, in the sense that

- the number of failed login attempts (i.e. the maxretry value) is decreased: a value of 2 to 3 is often good
- the ban period is increased: a value as high as 7 days is not uncommon during an attack, while 24 hours is common under normal circumstances

and, as a result, the IP is blocked faster and longer, reducing the number of log lines related to the IP (and hence the work of Fail2Ban and the size of the Fail2Ban dbase).
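
As a rough illustration of point b), the two values could be generated like this. It is a sketch under assumptions: maxretry and bantime are the standard jail options, bantime is taken to be in seconds, and the helper name `jail_defaults` is invented here. The output is meant to be merged by hand into a local override such as /etc/fail2ban/jail.local.

```shell
#!/bin/sh
# ASSUMPTION: bantime is expressed in seconds in the jail configuration.
# jail_defaults is a hypothetical helper that only prints a config fragment.

jail_defaults() {
    maxretry=$1   # failed attempts allowed before a ban
    ban_days=$2   # ban length in days, converted to seconds below
    printf '[DEFAULT]\nmaxretry = %s\nbantime = %s\n' \
        "$maxretry" "$((ban_days * 86400))"
}
```

For example, `jail_defaults 3 1` gives the 24-hour ban mentioned for normal circumstances, and `jail_defaults 2 7` the stricter 7-day variant for use during an attack.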

c) investigate your fail2ban logs and adjust jails if needed.

Attack scripts often exploit the way Fail2Ban is designed or weaknesses in its settings.

For instance, mail servers are often attacked at specific intervals, chosen so that the attempts never lead to a ban by Fail2Ban.

Anyway, you can detect these "smart scripts" (what is in a name?) by comparing the numbers of the words "found" and "ban": most "smart scripts" will lead to a lot of log lines with "found" in it, but will often not lead to a log line containing a "ban", meaning that Fail2Ban fails to ban the IP.
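
The comparison of "found" and "ban" lines could be sketched like this. It assumes log lines that contain the tokens "Found <ip>" and "Ban <ip>", as fail2ban 0.9 writes them, and the default log path /var/log/fail2ban.log. `found_vs_ban` is a made-up helper name.

```shell
#!/bin/sh
# ASSUMPTION: log lines contain "... Found <ip>" and "... Ban <ip>" tokens.
# found_vs_ban is a hypothetical helper name.

found_vs_ban() {
    log=${1:-/var/log/fail2ban.log}
    awk '
        {
            # Look for the exact tokens; "Unban" does not match "Ban".
            for (i = 1; i < NF; i++) {
                if ($i == "Found") found[$(i + 1)]++
                else if ($i == "Ban") ban[$(i + 1)]++
            }
        }
        END {
            for (ip in found)
                printf "%s found=%d ban=%d\n", ip, found[ip], ban[ip]
        }
    ' "$log" | sort -t= -k2,2nr   # highest "found" count first
}
```

IPs near the top of the output with a large found count and ban=0 are the ones slipping under the current maxretry / findtime thresholds.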

d) add some custom actions: one can create actions that allow "bad" IPs (i.e. repeat offenders) to be blocked for a very long period (for instance, a year).

Any custom action of the above kind will in essence be (almost) equivalent to an IP ban via a firewall (i.e. a permanent IP ban).

This way you still have the advantage of Fail2Ban identifying the bad IPs and automating the addition to the firewall rules (i.e. iptables).
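
One way to get this effect without writing an action by hand is fail2ban's stock recidive jail, which watches fail2ban's own log and bans IPs that keep getting banned. Note that this is a swapped-in technique, not the custom action described above. The sketch assumes the recidive filter shipped with fail2ban is installed; the values and the helper name `recidive_override` are illustrative only.

```shell
#!/bin/sh
# ASSUMPTION: the stock "recidive" filter is present; values are examples.
# recidive_override is a hypothetical helper that only prints a config fragment.

recidive_override() {
    ban_days=${1:-365}   # how long repeat offenders stay banned
    maxretry=${2:-5}     # bans within findtime before the long ban applies
    # findtime = 86400 means repeat bans are counted over one day.
    printf '[recidive]\nenabled = true\nfindtime = 86400\nmaxretry = %s\nbantime = %s\n' \
        "$maxretry" "$((ban_days * 86400))"
}
```

The printed fragment would go into a local jail override such as /etc/fail2ban/jail.local, followed by a fail2ban restart.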

e) decrease the "scan interval" for Fail2Ban: this is an advanced setting, for which some manual steps have to be taken. For the sake of clarity, I will not discuss that now.

In short, simply by changing some Fail2Ban settings, one can improve security quite a lot.

In general, one should not rely on Fail2Ban "as is", one should always keep track of what Fail2Ban does and, if necessary, adjust settings, jails, filters, actions and so on.

Moreover, Fail2Ban is more or less a last line of defense: a proper firewall should come first and one can always consider more elaborate solutions, like proxies and "ban clusters".

Hope the above helps a little bit!

Regards....

Oliver Bazely · Feb 18, 2016

Wow - thanks for the great reply. That all makes a lot of sense, and is a good overview of how to deal with a DDoS.

I would have switched IP if I had another IP to hand. As I mentioned, this is just my sandbox server, that I used for a few local community / experimental sites.
On our main system we have a hardware firewall with proper IDS / IPS, as well as Amazon Route53 load balancing.

I actually set up Cloudflare for this site and enabled the 'under attack' mode, until I realised that it does nothing to prevent IP-based attacks.

I'll continue to report back as I wrestle with this one.

Thanks again.

trialotto · Feb 18, 2016

@Oliver Bazely

Note that a hardware firewall is a "first line of defense" and that a software based firewall can be added: it can do no harm.

With respect to your remark about Cloudflare, I must say that this is a bad choice: Cloudflare itself is often under attack, with delayed requests as a consequence.

Personally, I feel that using Route53 is a strange (and costly) choice, considering that Plesk comes with (free) Nginx, which can function as a very good load balancer, certainly when you have multiple Plesk instances (i.e. this saves you from having to purchase another server to be a node in an Nginx cluster or a set of Nginx servers).

However, it is not about your personal choices, it is about advice concerning "what to do best".

In general, you should keep the DNS servers separate from the web servers.

Moreover, the use of proxy servers, like Nginx, does allow you to block bad traffic before it is passed to your web servers.

Each of the web servers can have a firewall, the functionality of which can be augmented by tools like Fail2Ban.

In most cases, "public networks" like Cloudflare are not needed and can even become counterproductive.

A simple setup of a Plesk + Apache + Nginx stack on multiple servers can function, with some basic custom configuration of Nginx, as a load-balanced, high-availability server structure, with some degree of fail-over.

The big advantage of that simple setup is that a "no worries" approach can be used: just use default stacks, apply some minimal customization and clone to other servers.

In order to allow for scaling, automatic scaling, private networks, internal networks, non-fixed (i.e. virtual) IPs, one should consider the cloud (instead of implementing cloud stacks yourself) and also use the above simple setup, this time on multiple virtual machines (and these can be very easily cloned or scaled).

In short, everything is possible, but simple is often best: it makes it rather easy to prevent or mitigate attacks.

Regards....

Oliver Bazely · Feb 18, 2016

Again... good advice.

The use of Route53 was not necessarily related to preventing DDoS. I was more pointing to my current scenario being one of limited resources, compared to my enterprise / day-job setup. We use Route53 as it has the ability to serve static content from an S3 bucket (ie a branded holding page - aka fail whale) if anything goes wrong with the server or intermediate network (eg data centre switches)

It is interesting to hear that CloudFlare is itself a target. I'll take this into consideration before deploying it long term.

trialotto · Feb 18, 2016

@Oliver Bazely

I would strongly recommend having a look at the functionality of Nginx.

Nginx is a lightweight webserver AND/OR proxy (a reverse proxy, although it can also function as a forward proxy).

Serving files with Nginx is not limited to static content (including error pages, or your "branded holding page"). Nginx can also run and serve scripts (perl by default and, with some modules, all kinds of other programming languages, with lua scripting being the most interesting in the context of Nginx).

Moreover, Nginx can cache requests (with some minor tweaking, naturally) or serve requests from memory (dynamic cache, as opposed to static caching).

In essence, Nginx is powerful in all dimensions.

The irony is that Cloudflare (and almost all other CDNs or similar things) is in essence a "bunch of Nginx servers with a specific configuration".

The above also implies that you would be better off by creating your own "Nginx set" with custom configuration, that suits your own specifications and needs, as opposed to using the more general Cloudflare Nginx based servers, with "a one size fits all" approach to configuration (with some penalty on flexible implementation for your own applications).

For me, Nginx is just a very simple, effective AND cost-effective solution to many challenges (not problems).

Regards....

danami · Feb 18, 2016

@Oliver Bazely What kind of VM are you running in: KVM or Virtuozzo/OpenVZ?

Oliver Bazely · Feb 18, 2016

@danami I'm not sure, as it was just a quick off-the-shelf VM that I bought a few years ago. Can I tell from within the VM what type of VM it is, or would this only be available via the hypervisor?

I've heard back from my hosting provider. They suggest rolling back the database. They want to charge me £75 for them to open an Odin ticket.

danami · Feb 18, 2016

To see if you are in Virtuozzo (if the file exists then you are in Virtuozzo): cat /proc/user_beancounters
To see if you are in KVM (if you get output then you are in KVM): dmesg | grep kvm
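
The two checks can be combined into one small function. This is a sketch, not a definitive detector: `detect_virt` is an invented name, the path argument exists only to make the check testable, and the dmesg test can miss KVM on a long-running machine whose kernel ring buffer has already rotated.

```shell
#!/bin/sh
# ASSUMPTION: /proc/user_beancounters exists only on Virtuozzo/OpenVZ, and
# KVM guests mention "kvm" in dmesg. detect_virt is a hypothetical helper.

detect_virt() {
    beancounters=${1:-/proc/user_beancounters}
    if [ -e "$beancounters" ]; then
        echo "virtuozzo/openvz"
    elif dmesg 2>/dev/null | grep -qi kvm; then
        echo "kvm"
    else
        echo "unknown"
    fi
}
```

Running `detect_virt` with no argument performs both checks against the live system.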

Oliver Bazely · Feb 18, 2016

Looks like I am in KVM.

Code:

[root@XXXXXX ~]# cat /proc/user_beancounters
cat: /proc/user_beancounters: No such file or directory
[root@XXXXXXX ~]# dmesg | grep kvm
kvm-clock: Using msrs 4b564d01 and 4b564d00
kvm-clock: cpu 0, msr 0:1c35941, boot clock
kvm-clock: cpu 0, msr 0:2215941, primary cpu clock
kvm-stealtime: cpu 0, msr 220d900
Switching to clocksource kvm-clock
[root@XXXXXX ~]#

danami · Feb 18, 2016

KVM does support IPsets, so you are in luck. After you get your database issues sorted, take a look at my signature (shameless plug) to deal with your DDoS and Fail2ban issues.

Oliver Bazely · Feb 18, 2016

ok - thanks. Sounds promising. I'll follow this up tomorrow. I can't stare at this screen any more today!

trialotto · Feb 18, 2016

@danami,

Indeed, no need to argue: a shameless plugin.

With all due respect: some shame is appropriate, the issues will not be resolved by using ipsets.

Regards.....
