Forwarded to devs Event 'cp_user_login_failed'

DataPacket · Mar 17, 2025

TITLE

Event 'cp_user_login_failed'

PRODUCT, VERSION, OPERATING SYSTEM, ARCHITECTURE

Plesk Obsidian 18.0.68 Update #1 - Ubuntu 22.04.5 LTS

PROBLEM DESCRIPTION

When Plesk is targeted by a brute-force attack, the "cp_user_login_failed" events are processed by the task manager. With 10,000+ failed login attempts, this significantly amplifies the attack's impact, causing the task-manager process to spike to 100% CPU usage and severely delaying the processing of these events.

STEPS TO REPRODUCE

A brute-force attack on Plesk with 10,000+ login attempts can cause plesk-task-manager to experience extremely high CPU usage and delays, significantly impacting server performance.

ACTUAL RESULT

A brute-force attack on Plesk with 10,000+ login attempts can cause plesk-task-manager to hit 100% CPU usage, leading to a severe backlog of tasks that could take "days" to process.

EXPECTED RESULT

No additional CPU usage, ensuring stable performance.

ANY ADDITIONAL INFORMATION

Not really a bug, but needs optimization.

YOUR EXPECTATIONS FROM PLESK SERVICE TEAM

Help with sorting out

Sebahat.hadzhi · Mar 17, 2025

Thank you for the report, @DataPacket. We already have a task registered under ID PPPM-13123 for reworking this behavior. You can monitor the change log for the announcement of the rework. In the meantime, what we can suggest is:

If you have not already done so, please consider enabling Fail2Ban (with a high ban time configured).
Disable root user login for SSH by adding/editing the following line in /etc/ssh/sshd_config:

PermitRootLogin no
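
To confirm what an existing sshd_config actually sets, a minimal sketch along these lines may help. The function name `first_permit_root_login` is hypothetical, and the sketch assumes a simple file: it stops at the first `Match` line and does not follow `Include` directives, so treat the result as a hint rather than the effective sshd setting.

```python
def first_permit_root_login(config_text):
    """Return the first global PermitRootLogin value, or None if unset.

    Simplified on purpose: ignores Include directives and anything
    inside Match blocks (assumption: a plain, single-file config).
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Keyword and argument may be separated by whitespace or one '='.
        parts = line.replace("=", " ", 1).split(None, 1)
        keyword = parts[0].lower()  # sshd keywords are case-insensitive
        if keyword == "match":
            break  # settings after a Match line are conditional
        if keyword == "permitrootlogin" and len(parts) == 2:
            return parts[1].strip()
    return None


# Example usage (path taken from the post above):
# with open("/etc/ssh/sshd_config") as fh:
#     print(first_permit_root_login(fh.read()))
```

The first match is returned because sshd uses the first value it obtains for each keyword.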

Carl Swart · Mar 19, 2025

We have been plagued by this issue over the last week as well.

/var/lib/plesk/task-manager/db/db.sqlite3 is excessively large:

-rw------- 1 psaadm psaadm 562327552 Mar 19 11:18 db.sqlite3

/var/lib/plesk/runtime/ fills up with thousands of directories.

We have implemented the above suggestions but performance is still hampered.

We also get errors in the panel when trying to access the Task Manager.

# /usr/local/psa/bin/task-manager -l
Communication with task manager has failed: Error while reading response, stream timed out

Trying from the CLI gives the above error.
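
The two symptoms described above (an oversized db.sqlite3 and a runtime directory full of subdirectories) can be checked quickly with something like the following. This is only a sketch: the paths are the ones quoted in this post, and the helper names `db_size_bytes` and `count_subdirs` are made up for illustration.

```python
import os

# Paths as quoted in the post above; adjust if your installation differs.
TASK_DB = "/var/lib/plesk/task-manager/db/db.sqlite3"
RUNTIME_DIR = "/var/lib/plesk/runtime"


def db_size_bytes(path):
    """Return the file size in bytes, or None if the file does not exist."""
    try:
        return os.path.getsize(path)
    except FileNotFoundError:
        return None


def count_subdirs(path):
    """Count immediate subdirectories of path (0 if path is missing)."""
    try:
        with os.scandir(path) as entries:
            return sum(1 for e in entries if e.is_dir(follow_symlinks=False))
    except FileNotFoundError:
        return 0


# Example usage (run as root, since the files belong to psaadm):
# print("task DB size (bytes):", db_size_bytes(TASK_DB))
# print("runtime subdirectories:", count_subdirs(RUNTIME_DIR))
```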

Sebahat.hadzhi · Mar 20, 2025

Hello, @Carl Swart. If you have already enabled Fail2Ban and restricted root SSH login, what else I can suggest is to restrict administrative access to your IP/network only and to enable ModSecurity, preferably with the OWASP ruleset (which is pretty strict, so if it causes any issues, please make sure to switch to another one).

Carl Swart · Mar 20, 2025

I have been blessed with a second server on which this is happening. Both servers are still on CentOS 7 with Tuxcare enabled, if this makes any difference.

Carl Swart · Mar 20, 2025

Deeper investigation reveals the following:

sqlite3 /var/lib/plesk/task-manager/db/db.sqlite3

sqlite> select count(*) from tasks;
1252151

sqlite> select count(*) from tasks where status='new';
1249933

sqlite> select count(*) from tasks where description="Event 'cp_user_login_failed' for object with ID '0'" and status='new';

1249932

My conclusion is that virtually all of the waiting tasks, which add a lot of extra load to the server, are of one kind: 'cp_user_login_failed' for object with ID '0'
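
The same counts can be gathered in one pass by grouping on status and description. The sketch below assumes only what the queries above show (a `tasks` table with `status` and `description` columns); the function name `summarize_tasks` is hypothetical. It opens the database read-only so the check cannot change anything.

```python
import pathlib
import sqlite3


def summarize_tasks(db_path, limit=10):
    """Return (status, description, count) rows, largest groups first."""
    # Open read-only so the summary cannot modify the task database.
    uri = pathlib.Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(
            "SELECT status, description, COUNT(*) AS n "
            "FROM tasks GROUP BY status, description "
            "ORDER BY n DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()


# Example usage (path taken from the post above):
# for status, description, n in summarize_tasks(
#         "/var/lib/plesk/task-manager/db/db.sqlite3"):
#     print(n, status, description)
```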

Can I simply delete all those 'new' tasks with the description "Event 'cp_user_login_failed' for object with ID '0'"?

OR

Can I just change the status to failed?

I suspect that keeping all those failed records will lead to all sorts of performance / timeout issues in the Panel Task Manager menu.

Carl Swart · Mar 20, 2025

I took the gamble and deleted the entries from the task manager DB upon which the performance of the server normalised.

I am not saying this is a solution for all, so do your own research, but perhaps it helps somebody.

I used the following SQL statement:

delete from tasks where description="Event 'cp_user_login_failed' for object with ID '0'" and status='new';
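
For anyone who would rather script this than type it into the sqlite3 shell, here is a rough equivalent of the statement above. It is a sketch, not an officially supported procedure: the function name `purge_new_login_failed` and the optional backup step are my own additions, and the plain file copy assumes plesk-task-manager is stopped while it runs. The description is passed as a bound parameter, which avoids the nested-quote problem.

```python
import shutil
import sqlite3

# Description string exactly as quoted in the post above.
LOGIN_FAILED_DESC = "Event 'cp_user_login_failed' for object with ID '0'"


def purge_new_login_failed(db_path, backup_path=None):
    """Delete 'new' cp_user_login_failed tasks; return the number removed."""
    if backup_path is not None:
        # Plain file copy; assumes the task manager service is stopped.
        shutil.copy2(db_path, backup_path)
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on an exception
            cur = conn.execute(
                "DELETE FROM tasks WHERE description = ? AND status = 'new'",
                (LOGIN_FAILED_DESC,),
            )
            return cur.rowcount
    finally:
        conn.close()


# Example usage (paths are illustrative):
# removed = purge_new_login_failed(
#     "/var/lib/plesk/task-manager/db/db.sqlite3",
#     backup_path="/root/db.sqlite3.bak",
# )
# print("removed", removed, "tasks")
```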

Sebahat.hadzhi · Mar 20, 2025

@Carl Swart , if you have the option to get in touch with Plesk support and provide them with server access, please do. If not, please make sure you have a backup (just to be on the safe side) and try the following:

Create the following file /usr/local/psa/admin/conf/task-manager.yml with content:

timeouts:
  gc:
    successful: 1h
    failed: 1h
    incomplete: 1h

Restart the plesk-task-manager service by executing systemctl restart plesk-task-manager.
Remove the config file.
Restart the plesk-task-manager service again.

Please note that this can still result in removing useful tasks.
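
The four steps above could be scripted roughly as follows. This is a sketch under assumptions: the function name `run_temporary_gc` is made up, the YAML nesting is a guess at how the file was meant to be indented, and the `run` parameter exists only so the sequence can be dry-run without touching systemd. The config file is removed even if the first restart fails, so the short timeouts do not stay in place by accident.

```python
import os
import subprocess

CONFIG_PATH = "/usr/local/psa/admin/conf/task-manager.yml"
# Nesting below is an assumption about the intended indentation.
GC_CONFIG = (
    "timeouts:\n"
    "  gc:\n"
    "    successful: 1h\n"
    "    failed: 1h\n"
    "    incomplete: 1h\n"
)
RESTART_CMD = ["systemctl", "restart", "plesk-task-manager"]


def run_temporary_gc(config_path=CONFIG_PATH, run=subprocess.run):
    """Write the GC config, restart, remove the config, restart again."""
    if os.path.exists(config_path):
        raise FileExistsError(f"{config_path} already exists; not overwriting")
    with open(config_path, "w") as fh:
        fh.write(GC_CONFIG)
    try:
        run(RESTART_CMD, check=True)
    finally:
        os.remove(config_path)  # always remove the temporary config
    run(RESTART_CMD, check=True)


# Example usage (as root, after taking a backup):
# run_temporary_gc()
```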

CruzMark · Mar 20, 2025

Carl Swart said:
I took the gamble and deleted the entries from the task manager DB upon which the performance of the server normalised.

I am not saying this is a solution for all, so do your own research, but perhaps it helps somebody.

I used the following SQL statement:

delete from tasks where description="Event 'cp_user_login_failed' for object with ID '0'" and status='new';

Thanks for this! This was exactly what I needed to get our servers back on track. I was dealing with 6 of them, all with the same issue.
