
Input Tell us about your use of the Watchdog component

Kaspar@Plesk

Community Manager up till 07/2024
Staff member
Hello everyone, we'd like to learn more from our Watchdog component users.

Watchdog is a versatile component with many features, and it is still used by many on Plesk. If you use the Watchdog component on your server(s), we would be grateful if you could share (in a post below) which OS you're using and why you use Watchdog, specifically which Watchdog features you use and love and which features you rarely use.

All of your feedback is highly appreciated :)
 
The main reason we don't use it, and run our own solution instead, is that Watchdog is far too slow in detecting critical server situations. It takes several minutes to detect service outages.
 
Maarten, yes, all these things are connected :)

Peter, could you please share a little more information? Is it an in-house solution? How does it work, and does it give better results than other solutions like `monit`?
 
Currently, the Watchdog extension uses a fixed interval (timeout) of 5 minutes to check whether services are still running.
Take, for example, the Nginx check:
Code:
# Nginx
check process nginx
    with pidfile /var/run/nginx.pid
    start = "<nginx_start>"
    stop = "<nginx_stop>"
    if failed host <default_external_ip> port <nginx_port> send "GET / HTTP/1.1\r\nHost: <default_external_ip>\r\n\r\n" expect "HTTP/1\.[01x] ([1-4][0-9]{2}|502) .*\r\n" with timeout <nginx_connection_timeout> seconds then restart
    if <nginx_timeout_restarts> restarts within <nginx_timeout_cycles> cycles then timeout
    every <nginx_cycles> cycles
    mode <nginx_mode>

It would help if you could define your own timeout per service in a Watchdog GUI. That way, critical services can be restarted much faster.
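To illustrate what a per-service timeout could mean in practice, here is a minimal Python sketch. It is not how Watchdog or monit work internally; the class, the function name and the interval values are made up for the example. Each service gets its own check interval, and a scheduler only returns the services whose interval has elapsed.

```python
from dataclasses import dataclass


@dataclass
class ServiceCheck:
    """Hypothetical per-service setting; not part of Watchdog."""
    name: str
    interval_s: int                      # how often to check, in seconds
    last_checked: float = float("-inf")  # "-inf" means: never checked yet


def due_checks(checks, now):
    """Return the names of services whose interval has elapsed at time `now`.

    Services that are returned are marked as checked at `now`.
    """
    due = []
    for check in checks:
        if now - check.last_checked >= check.interval_s:
            check.last_checked = now
            due.append(check.name)
    return due
```

With an interval of 10 seconds for a critical service such as Nginx and 300 seconds for a less critical one, the critical service is re-checked 30 times as often without adding load for the others.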
 
I wrote a guide to add your own service to the Watchdog configuration. It's more of a hack, because you have to apply the same change every time the Watchdog configuration is overwritten, for instance when a new version is released.

It would be a good addition to a future Watchdog component if we could add our own services that should be monitored.
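Until something like that exists, the re-apply step can at least be made idempotent. Below is a minimal Python sketch under my own assumptions: the marker text and the function name are invented, and the path of the configuration file is left to the caller because it depends on your setup. The function appends the custom block only if the marker is missing, so it can run from cron after every update without duplicating the entry.

```python
from pathlib import Path

# Hypothetical markers used to recognise a block that was added earlier.
BEGIN_MARKER = "# BEGIN custom-service"
END_MARKER = "# END custom-service"


def ensure_block(config_path, block):
    """Append `block` to the file unless the marker is already present.

    Returns True if the file was changed, False if nothing had to be done.
    """
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    if BEGIN_MARKER in text:
        return False
    # Make sure the marker starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    path.write_text(
        f"{text}{separator}{BEGIN_MARKER}\n{block.rstrip()}\n{END_MARKER}\n"
    )
    return True
```

The monitoring daemon still has to reload its configuration after the file changes; that step is not covered here.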
 
Peter, could you please share a little more information? Is it an in-house solution? How does it work, and does it give better results than other solutions like `monit`?
It's an in-house solution: a rather extensive PHP script, run on the command line, that checks everything and anything, e.g. whether the mail service is running, whether the Dovecot SNI files are in place, even whether there are suspicious cron jobs. It also checks the load of Apache processes, fixes typical issues such as leftover symbolic links to non-existent Apache conf files after a user has removed a domain, and can restart PHP-FPM, independent of version, if the CPU load becomes too high or there are too many Apache processes.

It covers lots of other things as well: edac-util output, CPU temperature (sometimes a fan fails, which service monitoring utilities won't detect), database status, number of database processes per user, CPU load per user, etc. It also checks that users stick to configurations that are valid under their contract (some users try to extend e.g. PHP-FPM RAM usage; another recent thread in "Reports" had this as a topic). It also opens test websites for each PHP-FPM version to check whether they really respond, and it opens Roundcube webmail websites and tests whether the login page is actually displayed (there are cases when Roundcube becomes inaccessible).

And: it can fix many issues by itself. It basically does everything that is required to keep a server running smoothly. It pre-emptively runs straces on processes if a problem could be coming up soon, so that once a problem exists we already have the system status from "before" it occurred. The script
- checks every 10 seconds
- checks way more than simple service uptime monitoring does, including file contents, potentially malicious cron entries etc.
- can fix common problems (auto-doc function), can block user accounts in case of suspicious activities
- sends notifications as required

Simple service monitoring just isn't enough. You must also check that all important system configuration files are correct and in place, that the server response time is in an acceptable range, that the hardware is healthy, and that users don't circumvent restrictions.
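As an illustration of one of these checks, the leftover symbolic links to Apache conf files that no longer exist, here is a minimal sketch. It is written in Python rather than taken from the PHP script, the function name is made up, and the directory to scan is left to the caller.

```python
import os


def dangling_symlinks(directory):
    """Return the sorted paths of symlinks in `directory` whose target is gone."""
    broken = []
    for entry in os.scandir(directory):
        # os.path.exists() follows the link, so it is False for a broken one.
        if entry.is_symlink() and not os.path.exists(entry.path):
            broken.append(entry.path)
    return sorted(broken)
```

Whether such links are then removed automatically or only reported is a separate decision; the sketch only finds them.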
 
Similar to @Bitpalast, we are also using our own solution, but it is based on Ansible. This helps us to:

- Monitor the configuration and services.
- Check for known anomalies.
- Repair the configuration if needed and restart the services.
- Fix common bugs and problems that are known by Plesk but have not been patched yet.

So, in our case, just monitoring is not enough either. Plesk has a lot of good support articles that address various problems. Step by step, we are converting them into Ansible playbooks. Checking from the 'outside world' also has the advantage that server crashes or service downtimes can be detected immediately, even when Watchdog itself is already 'dead'. And as Debian 11/12 users, we are not supported either.
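The 'outside world' part can be as simple as a TCP connection test run from a second machine. The sketch below is a Python illustration and not our Ansible setup; the function name and the default timeout are assumptions.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run from another host every few seconds, a check like this also notices when the whole server is down, which a monitor running on the same server cannot report.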
 