Input Tell us about your use of the Watchdog component

Kaspar@Plesk · Jul 17, 2024

Hello everyone, we would like to learn more from our Watchdog component users.

Watchdog is a versatile component with many features, and it is still used by many Plesk users. If you use the Watchdog component on your server(s), we would be grateful if you could share (in a post below) which OS you're using and why you use Watchdog. Specifically, we'd like to know which Watchdog features you use and love and which features you rarely use.

All of your feedback is highly appreciated.

Maarten · Jul 17, 2024

Did you hear this from Alexander? Because I've been in a conversation with him on this very subject for the past few days.

Bitpalast · Jul 17, 2024

The main reason we don't use it, and run our own solution instead, is that Watchdog is far too slow at detecting critical server situations. It takes several minutes to detect service outages.

AYamshanov · Jul 18, 2024

Maarten, yes, all these things are connected.

Peter, could you please share a little more information? Is it an in-house solution? How does it work, and does it perform better than other solutions like `monit`?

Maarten · Jul 18, 2024

Currently, the Watchdog extension uses a fixed timeout of 5 minutes to check whether services are still running.
For example, the Nginx check:

Code:

# Nginx
check process nginx
    with pidfile /var/run/nginx.pid
    start = "<nginx_start>"
    stop = "<nginx_stop>"
    if failed host <default_external_ip> port <nginx_port> send "GET / HTTP/1.1\r\nHost: <default_external_ip>\r\n\r\n" expect "HTTP/1\.[01x] ([1-4][0-9]{2}|502) .*\r\n" with timeout <nginx_connection_timeout> seconds then restart
    if <nginx_timeout_restarts> restarts within <nginx_timeout_cycles> cycles then timeout
    every <nginx_cycles> cycles
    mode <nginx_mode>

It would help if you could define your own timeout per service in a Watchdog GUI. That way, critical services can be restarted much faster.
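To illustrate why a per-service setting would matter, here is a rough Python sketch of the worst-case detection delay. It assumes the usual monit model, where a service is tested once every N poll cycles. The function name and the example values (a 60-second poll interval with 5 cycles, versus a 10-second interval with 1 cycle) are hypothetical and are not taken from the Watchdog configuration.

```python
def worst_case_detection_seconds(poll_interval_s: int,
                                 every_cycles: int,
                                 connection_timeout_s: int = 0) -> int:
    """Rough upper bound on the time between a failure and its detection.

    Assumption: the service is tested once every `every_cycles` poll
    cycles, and a failing test may take up to `connection_timeout_s`
    before it is reported.
    """
    return poll_interval_s * every_cycles + connection_timeout_s


# Hypothetical values that add up to the 5-minute behaviour described above.
print(worst_case_detection_seconds(60, 5))

# Hypothetical faster setting for a critical service.
print(worst_case_detection_seconds(10, 1, connection_timeout_s=5))
```

With these assumed numbers, a service that dies right after a check can stay down for about five minutes in the first case and about 15 seconds in the second.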

Maarten · Jul 18, 2024

I wrote a guide on adding your own service to the Watchdog configuration. It's more of a hack, because you have to apply the same change every time the Watchdog configuration is overwritten, for instance when a new version is released.

Instruction - How to add a new service to the watchdog extension

In this article, I'll explain how you can add a new service to the Watchdog extension. As an example, I've used the amavis daemon. Note: I've tested this on an AlmaLinux 8.6 server. Make sure you have a backup of the PSA database...

talk.plesk.com

It would be a good addition to a future Watchdog component if we could add our own services that should be monitored.

Bitpalast · Jul 18, 2024

AYamshanov said:
Peter, could you please share a little more information? Is it an in-house solution? How does it work, and does it perform better than other solutions like `monit`?

It's an in-house solution: a rather extensive PHP script, run on the command line, that checks everything and anything, e.g. whether the mail service is running, whether the Dovecot SNI files are in place, even whether there are suspicious cron jobs. It also checks the load of Apache processes, fixes typical issues such as leftover symbolic links to non-existent physical Apache conf files after a user removed a domain, and can restart PHP-FPM version-independently if the CPU load becomes too high or there are too many Apache processes.

Lots of other stuff too, e.g. edac-util output, CPU temperature (sometimes a fan fails, which service monitoring utilities won't detect), database status, number of database processes per user, CPU load per user etc. It also checks that users stick to configurations that are valid under their contract (some users tend to try to extend e.g. PHP-FPM RAM usage; another recent thread in "Reports" had this as a topic). It also opens test websites for each PHP-FPM version to check whether they really respond, and it opens Roundcube webmail websites and tests whether the login page is actually displayed (as there are cases when Roundcube becomes inaccessible).

And: it can fix many issues by itself. It basically does everything that is required to keep a server running smoothly. It pre-emptively runs straces on processes if a problem could be coming up soon, so that once a problem exists we already have the system status from "before" it occurred. The script:
- checks every 10 seconds
- checks way more than simple service uptime monitoring does, including file contents, potentially malicious cron entries etc.
- can fix common problems (auto-doc function), can block user accounts in case of suspicious activities
- sends notifications as required

Simple service monitoring just isn't enough. You must also check that all important system configuration files are correct and in place, that the server response time is within an acceptable range, that the hardware is healthy, and that users don't circumvent restrictions.
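As a rough illustration of this kind of loop, here is a minimal sketch. It is written in Python rather than PHP and is not the script described above. It runs a set of named checks on a fixed interval (10 seconds by default) and reports failures. All function names, and the example URL and path in the comments, are hypothetical.

```python
import os
import time
import urllib.request
from typing import Callable, Dict, List, Optional


def file_present(path: str) -> bool:
    """True if a required configuration file exists and is not empty."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


def http_responds(url: str, timeout_s: float = 5.0) -> bool:
    """True if the URL can be fetched successfully within the timeout."""
    try:
        # urlopen raises HTTPError (an OSError subclass) for 4xx/5xx replies.
        with urllib.request.urlopen(url, timeout=timeout_s):
            return True
    except (OSError, ValueError):
        return False


def run_checks(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every check once and return the names of those that failed."""
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False  # a crashing check counts as a failure
        if not ok:
            failed.append(name)
    return failed


def monitor(checks: Dict[str, Callable[[], bool]],
            interval_s: float = 10.0,
            cycles: Optional[int] = None,
            notify: Callable[[str], None] = print,
            sleep: Callable[[float], None] = time.sleep) -> None:
    """Repeat all checks every `interval_s` seconds; loop forever if cycles is None."""
    done = 0
    while cycles is None or done < cycles:
        failed = run_checks(checks)
        if failed:
            notify("FAILED: " + ", ".join(failed))
        done += 1
        sleep(interval_s)


# Example wiring (hypothetical URL and path, adjust to your own server):
# monitor({
#     "webmail login page": lambda: http_responds("https://webmail.example.com/"),
#     "nginx config": lambda: file_present("/etc/nginx/nginx.conf"),
# })
```

Automatic repair, per-user limits and hardware checks are left out here. In a sketch like this they would be further entries in the `checks` dictionary, or handlers triggered from `notify`.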

Hangover2 · Aug 27, 2024

Similar to @Bitpalast, we are also using our own solution, but it is based on Ansible. This helps us to:

- Monitor the configuration and services.
- Check for known anomalies.
- Repair the configuration if needed and restart the services.
- Fix common bugs and problems that are known by Plesk but have not been patched yet.

So, in our case, just monitoring is not enough either. Plesk has a lot of good support articles that address various problems. Step by step, we are converting them into Ansible playbooks. Checking from the 'outside world' also has the advantage that server crashes or service downtimes can be detected immediately, even when Watchdog itself is already 'dead'. And as Debian 11/12 users, we are not supported anyway.
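The 'outside world' idea can be sketched as a simple TCP probe run from a different machine. This is a minimal Python illustration and not the Ansible setup described above. The function names, the host name and the port list in the comment are assumptions.

```python
import socket
from typing import Dict


def port_open(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def probe(host: str, ports: Dict[str, int], timeout_s: float = 3.0) -> Dict[str, bool]:
    """Probe every named port on the host and report which ones answer."""
    return {name: port_open(host, port, timeout_s) for name, port in ports.items()}


# Example call from a monitoring host (hypothetical host name and port list):
# print(probe("server.example.com", {"http": 80, "https": 443, "smtp": 25}))
```

Because the probe runs on a separate machine, it still reports a problem when the monitored server, including any local watchdog, has gone down completely.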
