Watchdog: The Web Server (Apache) service on host is down

AbramS · Feb 6, 2015

In early January I discussed a bug related to the creation of new SSL certificates in Plesk 12: http://talk.plesk.com/threads/new-ssl-certificate-produces-nginx-apache-configuration-errors.331041/

Since around the same time (January 9th), I have been receiving daily messages from Plesk Watchdog stating that Apache has gone down and come back up.

One of the latest examples:

Code:

The Web Server (Apache) service on host ... is down.
The problem was discovered on Feb  6, 2015 02:10 AM.

The Web Server (Apache) service on host ... has been started on Feb  6, 2015 02:15 AM.

I've been trying to get to the source of this problem for some time, but I haven't been able to pinpoint it.

I started by digging into the global error log in /var/log/httpd/.

I find three types of notices in this file:

1. The occasional file-not-found or security notice, as can be expected.

2. A never-ending list of the following three PHP warnings:

Code:

PHP Warning:  Module 'apc' already loaded in Unknown on line 0
PHP Warning:  Module 'memcache' already loaded in Unknown on line 0
PHP Warning:  Module 'memcached' already loaded in Unknown on line 0
PHP Warning:  Module 'apc' already loaded in Unknown on line 0
PHP Warning:  Module 'memcache' already loaded in Unknown on line 0
PHP Warning:  Module 'memcached' already loaded in Unknown on line 0

The only explanation I can think of is that I'm running PHP as a FastCGI application and load those three extensions through the additional directives of each domain. The thing is, though: if I remove the extensions from the additional directives, they no longer load. I've tested this by removing those lines from the additional directives and checking a phpinfo page. So it doesn't look like the FastCGI process is loading the extensions twice. Does anyone have a clue why this happens, and how to fix it?

3. The shutdown/startup notices for the Apache server, which seem to give little insight into the origin of the problem:

Code:

[Fri Feb 06 02:10:31 2015] [notice] caught SIGTERM, shutting down
[Fri Feb 06 02:10:35 2015] [notice] SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0
[Fri Feb 06 02:10:35 2015] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Fri Feb 06 02:10:36 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:36 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:36 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:36 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:36 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:36 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:36 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:36 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:36 2015] [warn] RSA server certificate CommonName (CN) `...' does NOT match server name!?
[Fri Feb 06 02:10:36 2015] [warn] RSA server certificate CommonName (CN) `...' does NOT match server name!?
[Fri Feb 06 02:10:36 2015] [warn] RSA server certificate CommonName (CN) `...' does NOT match server name!?
[Fri Feb 06 02:10:36 2015] [warn] RSA server certificate CommonName (CN) `...' does NOT match server name!?
[Fri Feb 06 02:10:36 2015] [warn] RSA server certificate CommonName (CN) `...' does NOT match server name!?
[Fri Feb 06 02:10:36 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:36 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:36 2015] [warn] Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)
[Fri Feb 06 02:10:36 2015] [notice] ModSecurity for Apache/2.8.0 (http://www.modsecurity.org/) configured.
[Fri Feb 06 02:10:36 2015] [notice] ModSecurity: APR compiled version="1.3.9"; loaded version="1.3.9"
[Fri Feb 06 02:10:36 2015] [notice] ModSecurity: PCRE compiled version="7.8 "; loaded version="7.8 2008-09-05"
[Fri Feb 06 02:10:36 2015] [notice] ModSecurity: LIBXML compiled version="2.7.6"
[Fri Feb 06 02:10:36 2015] [notice] Original server signature: Apache
[Fri Feb 06 02:10:36 2015] [notice] Status engine is currently disabled, enable it by set SecStatusEngine to On.
[Fri Feb 06 02:10:36 2015] [notice] Digest: generating secret for digest authentication ...
[Fri Feb 06 02:10:36 2015] [notice] Digest: done
[Fri Feb 06 02:10:37 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:37 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:37 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:37 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:37 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:37 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:37 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:37 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:37 2015] [warn] RSA server certificate CommonName (CN) `...' does NOT match server name!?
[Fri Feb 06 02:10:37 2015] [warn] RSA server certificate CommonName (CN) `...' does NOT match server name!?
[Fri Feb 06 02:10:37 2015] [warn] RSA server certificate CommonName (CN) `...' does NOT match server name!?
[Fri Feb 06 02:10:37 2015] [warn] RSA server certificate CommonName (CN) `...' does NOT match server name!?
[Fri Feb 06 02:10:37 2015] [warn] RSA server certificate CommonName (CN) `...' does NOT match server name!?
[Fri Feb 06 02:10:37 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:37 2015] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Fri Feb 06 02:10:37 2015] [warn] Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)
[Fri Feb 06 02:10:37 2015] [notice] mod_python: Creating 4 session mutexes based on 256 max processes and 0 max threads.
[Fri Feb 06 02:10:37 2015] [notice] mod_python: using mutex_directory /tmp
[Fri Feb 06 02:10:37 2015] [notice] Apache/2.2.15 (Unix) DAV/2 mod_ssl/2.2.15 OpenSSL/1.0.1e-fips Apache mod_fcgid/2.3.9 mod_python/3.3.1 Python/2.6.6 mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations

Can anyone tell from this log why Apache is stopping and starting? If not, where should I continue my search? There are no further warnings or notices in error.log.
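The repeated CommonName warnings bury the two lines that actually bracket the restart. A small script can pull them out; the sketch below is a hedged Python example that assumes the Apache 2.2 error-log line format shown above (`[Fri Feb 06 02:10:31 2015] [notice] ...`), and the function name `restart_windows` is hypothetical. It pairs each "caught SIGTERM" with the next "resuming normal operations" and reports the gap. It only measures how long a restart took; it does not reveal what sent the SIGTERM.

```python
import re
from datetime import datetime

# Apache 2.2 error-log line, e.g.: [Fri Feb 06 02:10:31 2015] [notice] caught SIGTERM, shutting down
LINE_RE = re.compile(r"^\[(\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4})\] \[(\w+)\] (.*)$")


def restart_windows(lines):
    """Yield (stopped_at, resumed_at, seconds) for each SIGTERM -> 'resuming normal operations' pair."""
    stopped_at = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not in the expected format
        ts = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
        msg = m.group(3)
        if "caught SIGTERM" in msg:
            stopped_at = ts
        elif "resuming normal operations" in msg and stopped_at is not None:
            yield stopped_at, ts, (ts - stopped_at).total_seconds()
            stopped_at = None


if __name__ == "__main__":
    sample = [
        "[Fri Feb 06 02:10:31 2015] [notice] caught SIGTERM, shutting down",
        "[Fri Feb 06 02:10:37 2015] [notice] Apache/2.2.15 (Unix) configured -- resuming normal operations",
    ]
    for stopped, resumed, secs in restart_windows(sample):
        print(f"{stopped} -> {resumed}: {secs:.0f} s")
```

To run it against the real log, pass an open file object instead of `sample`; `/var/log/httpd/error_log` is the usual CentOS default name and an assumption here.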

Thanks in advance!
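Regarding the "Module 'apc' already loaded" warnings in point 2: PHP emits this warning when the same extension is loaded a second time, which typically means an `extension=` line for it appears in more than one ini source read by that PHP process. One way to check is to scan all of them. The sketch below (Python, hypothetical helper `find_duplicate_extensions`) reports every extension declared more than once. Which files your FastCGI PHP actually reads (main php.ini, a conf.d directory, the per-domain additional directives) is an assumption you would need to confirm, for example via the "Loaded Configuration File" and "Additional .ini files parsed" lines in phpinfo output.

```python
import re
import sys
from collections import defaultdict
from pathlib import Path

# Matches e.g. extension=apc.so, extension = "memcache.so", extension=memcached
# Commented-out lines ("; extension=...") and zend_extension= lines do not match.
EXT_RE = re.compile(r"""^\s*extension\s*=\s*["']?([^"';\s]+)""", re.IGNORECASE)


def normalize(name: str) -> str:
    """Reduce a value such as '/usr/lib64/php/modules/apc.so' to 'apc'."""
    base = Path(name).name
    for suffix in (".so", ".dll"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
    return base


def find_duplicate_extensions(ini_texts: dict[str, str]) -> dict[str, list[str]]:
    """Map each extension loaded more than once to the 'source:line' entries that load it."""
    seen = defaultdict(list)
    for source, text in ini_texts.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            m = EXT_RE.match(line)
            if m:
                seen[normalize(m.group(1))].append(f"{source}:{lineno}")
    return {ext: where for ext, where in seen.items() if len(where) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_dupes.py /etc/php.ini /etc/php.d/*.ini vhost_php.ini
    texts = {p: Path(p).read_text(errors="replace") for p in sys.argv[1:]}
    for ext, where in sorted(find_duplicate_extensions(texts).items()):
        print(f"{ext}: loaded in {', '.join(where)}")
```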

Jürgen_Traum · Feb 9, 2015

Same problem here. Since January I've been getting the same mail every second day.

I have a standard installation, no changes.
I see the same times as well. What runs at that time?

The Web Server (Apache) service on host xxx is down.
The problem was discovered on Feb 10, 2015 02:11 AM.
The Web Server (Apache) service on host xxx has been started on Feb 10, 2015 02:16 AM.

My Apache is checked every 5 minutes by an external program.
This shows no downtime.

ankn99 · Feb 10, 2015

Hi,

Take a look at what time your backups are running!
I'm getting the same errors, but only at the times my system backup and regular backups are running.
If I'm not mistaken, Watchdog doesn't get its test answers from the web server fast enough because of the high system load.
By the way, since the backup now compresses with pigz instead of gzip (Parallels changed this to make backups finish faster: gzip uses only one core, pigz uses all of them), it eats almost every resource the system has.
I wish there were an option somewhere to lower its priority so it doesn't consume so many resources.

Anyway, because of the system load, Watchdog thinks the HTTP daemons are gone and restarts them!
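The explanation above amounts to a timeout: a check connects, sends a request and waits a fixed time for a reply; if the server is too busy to answer in time, the service is declared dead even though it is still running. A minimal sketch of such a check in Python follows, assuming a plain HTTP `HEAD` request; `http_probe` is a hypothetical name, and this is an illustration of the idea, not Watchdog's actual implementation.

```python
import socket


def http_probe(host: str, port: int, timeout: float) -> bool:
    """Return True if the server sends back any data within `timeout` seconds."""
    request = b"HEAD / HTTP/1.0\r\nHost: localhost\r\n\r\n"  # minimal request (assumption)
    try:
        # create_connection applies the timeout to connect() and also sets it on the
        # socket, so the recv() waiting for the reply is bounded by it as well.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            return len(sock.recv(1)) > 0  # any byte back counts as "alive"
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False
```

For example, `http_probe("127.0.0.1", 7080, 5.0)` would probe the Apache port that appears in the monit log later in this thread. A longer timeout makes the check more tolerant of a slow but healthy server, at the cost of noticing a real outage later.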

AbramS · Feb 10, 2015

Jürgen_Traum said:
My Apache is checked every 5 minutes by an external program.
This shows no downtime.

Hi Jürgen, I've had similar results. I'm using the free edition of UpTime Robot, which checks every 5 minutes, and NewRelic, which collects and analyses logs and statuses through its own software. Neither of these has reported downtime at any of those moments. When I started noticing more of these messages, I changed the uptime monitor my ISP provides to check every minute. Since then it has reported httpd downtime once. In my case the logs explain why UpTime Robot, NewRelic and even my ISP see nothing: from SIGTERM to "resuming normal operations" takes only 6 seconds. I'm surprised my ISP's monitor caught even one of these instances.
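A rough estimate shows why pollers rarely see such a short restart. Assuming the restart begins at a random moment relative to the monitor's schedule, and that a check fails only if it lands inside the outage, the chance that a monitor polling every $T$ seconds notices an outage lasting $d < T$ seconds is about $d/T$:

```latex
P(\text{detected}) \approx \frac{d}{T}, \qquad
\frac{6\ \text{s}}{300\ \text{s}} = 0.02, \qquad
\frac{6\ \text{s}}{60\ \text{s}} = 0.10
```

So under these assumptions a 5-minute monitor would catch roughly 1 in 50 such restarts and a 1-minute monitor roughly 1 in 10.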

ankn99 said:
Take a look at what time your backups are running!
I'm getting the same errors, but only at the times my system backup and regular backups are running.
If I'm not mistaken, Watchdog doesn't get its test answers from the web server fast enough because of the high system load.
Anyway, because of the system load, Watchdog thinks the HTTP daemons are gone and restarts them!

Ankn99,

You might be onto part of the problem here. I did a couple of tests and came to the following conclusions:

1. On the one hand, it doesn't seem to be the core of the problem in my case. I compared 26 Watchdog warnings for Apache with my backup times. In only two cases did the timing overlap: 3:34 AM (1) & 3:35 AM (2), and 11:03 (1) & 11:03 (2). Of those two cases, only one (11:03) was related to a scheduled backup, as I only run a very small number of scheduled backups.

2. On the other hand, there is clearly something wrong in the interaction between the Backup Manager tasks and Watchdog. I ran a couple of different manual backups to see whether this would make Watchdog generate Apache downtime notifications. On every occasion (backups from under 100 MB up to 1 GB), I immediately received a Watchdog mail: The Web Server (Apache) service on host ... is down.

3. As for resource usage: according to NewRelic, during the larger backup the server had an average CPU usage of 50.9% (+9.29% system, +7.47% wait) and a load average of 2.31. During the smaller backup these figures were 24.4% (+8.07% system, +3.47% wait) and 0.767, respectively. So CPU usage and load do increase considerably, but they don't get dangerously high on my setup; there seems to be plenty of headroom left.
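Matching Watchdog warnings against backup runs by hand, as in point 1, gets error-prone once there are dozens of them. A hedged Python sketch, assuming you have the warning times (from the Watchdog mails) and the start and end time of each backup as `datetime` values; the function name `warnings_during_backups` and the one-minute slack are assumptions.

```python
from datetime import datetime, timedelta


def warnings_during_backups(warnings, backups, slack=timedelta(minutes=1)):
    """Return (warning, (start, end)) pairs where a warning falls inside a backup run, +/- slack."""
    hits = []
    for w in sorted(warnings):
        for start, end in backups:
            if start - slack <= w <= end + slack:
                hits.append((w, (start, end)))
                break  # count each warning at most once
    return hits


def overlap_ratio(warnings, backups, slack=timedelta(minutes=1)):
    """Fraction of warnings that coincide with a backup run."""
    if not warnings:
        return 0.0
    return len(warnings_during_backups(warnings, backups, slack)) / len(warnings)
```

A low ratio over many warnings would support the conclusion in point 1 that backups are not the only trigger.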

ankn99 said:
By the way, since the backup now compresses with pigz instead of gzip (Parallels changed this to make backups finish faster: gzip uses only one core, pigz uses all of them), it eats almost every resource the system has.
I wish there were an option somewhere to lower its priority so it doesn't consume so many resources.

In Tools & Settings > Backup Settings you can find the following two options:
1. Run scheduled backup processes with low priority
2. Run all backup processes with low priority

I have both enabled.
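How Plesk implements these low-priority options isn't described in this thread. For jobs you start yourself, such as a manual compression run, the usual POSIX approach is to raise the child process's niceness, which is what `nice -n 10 <command>` does in a shell. A sketch of the same in Python follows; `run_low_priority` and the increment of 10 are assumptions. Niceness only affects CPU scheduling, not disk I/O.

```python
import os
import subprocess
import sys


def run_low_priority(cmd, increment=10):
    """Run `cmd` with its niceness raised by `increment` (POSIX only); returns CompletedProcess."""
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(increment),  # runs in the child just before exec
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Illustrative run: the child reports its own niceness.
    result = run_low_priority([sys.executable, "-c", "import os; print(os.nice(0))"])
    print("child niceness:", result.stdout.strip())
```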

Plesk team: I'm really hoping for an insightful response based on these results and my initial questions. Hope to hear from you guys soon!

ankn99 · Feb 10, 2015

@AbramS: thanks for your tips, I have both of them turned on too.

These Watchdog messages also show up when your server is under attack with huge and/or numerous HTTP requests,
and when you're working with big SQL tables (1 GB and more, 3 GB in my case).

About the attacks: I noticed a better system response since I switched to NGINX as the web server handling PHP.

AbramS · Feb 10, 2015

ankn99 said:
These Watchdog messages also show up when your server is under attack with huge and/or numerous HTTP requests,
and when you're working with big SQL tables (1 GB and more, 3 GB in my case).

About the attacks: I noticed a better system response since I switched to NGINX as the web server handling PHP.

I agree, but I'm 100% sure that isn't the case in my situation. The last time I had to deal with an enormous number of HTTP requests, I could see it happening immediately in NewRelic (CPU, memory, processes, network activity). NewRelic also warns me when something like that is going on.

The problems I'm describing in this thread are unrelated to resource availability, as I've been actively monitoring it. Furthermore, I have applied a firm caching policy to all of the websites on my server. Because of this, most of the content is served directly by nginx and I can keep database queries to a minimum.

AbramS · Feb 12, 2015

Plesk team (Igor): any useful insights in regards to this? Would be extremely appreciated.

AbramS · Mar 20, 2015

It's been a while, and I recently had to update another server from 11.5 to 12. This server now shows the same behaviour: nginx and httpd tend to restart for no apparent reason. Additionally, both servers have an issue where httpd and nginx seem to start in the wrong order. So after a server restart, one has to manually run service httpd stop, service nginx stop, service httpd start, service nginx start to get rid of the default Apache landing page.
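The manual workaround above can be kept in one place. A minimal Python sketch, assuming the `service` command and the `httpd`/`nginx` service names from the post; `restart_in_order` is a hypothetical helper that only returns the commands unless `dry_run=False` is passed, and running it for real requires root.

```python
import subprocess

# Order taken from the workaround described above: stop both, then start Apache before nginx.
RESTART_SEQUENCE = [
    ("httpd", "stop"),
    ("nginx", "stop"),
    ("httpd", "start"),
    ("nginx", "start"),
]


def restart_in_order(dry_run=True):
    """Return the `service <name> <action>` commands; execute them in order if dry_run is False."""
    commands = [["service", name, action] for name, action in RESTART_SEQUENCE]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stops at the first failing step
    return commands


if __name__ == "__main__":
    for cmd in restart_in_order(dry_run=True):
        print(" ".join(cmd))
```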

AbramS · May 13, 2015

I'm giving this another bump, as I'm still having these issues without seeing anything in the logs or having resource issues. Hope someone from the team can get back to me on this.

RuslanT · May 13, 2015

Check /var/log/plesk/modules/monit.log: it will show whether Apache is being restarted by Watchdog.

AbramS · May 13, 2015

Hi RuslanT,

Thank you for the hint. I had a look, and Watchdog is definitely logging these nginx and Apache restarts there. What should I do: disable Watchdog, or look somewhere else? I've attached today's chunk of the log. Thanks in advance for your help!

P.S. for now I've increased the connection timeout value for Apache and nginx in Watchdog from 5 to 15 seconds.

Code:

[CEST May 13 03:31:07] GENERIC: error receiving data -- Resource temporarily unavailable
[CEST May 13 03:31:07] 'apache' failed protocol test [generic] at INET[127.0.0.1:7080].
[CEST May 13 03:31:07] 'apache' trying to restart
[CEST May 13 03:31:07] 'apache' stop: /usr/local/psa/admin/bin/websrvmng
[CEST May 13 03:31:10] 'apache' start: /usr/local/psa/admin/bin/websrvmng
[CEST May 13 03:31:15] GENERIC: error receiving data -- Resource temporarily unavailable
[CEST May 13 03:31:15] 'nginx' failed protocol test [generic] at INET[149.210.134.111:80].
[CEST May 13 03:31:15] 'nginx' trying to restart
[CEST May 13 03:31:15] 'nginx' stop: /usr/local/psa/admin/bin/nginx_control
[CEST May 13 03:31:16] 'nginx' start: /usr/local/psa/admin/bin/nginx_control
[CEST May 13 03:36:23] 'apache' connection passed
[CEST May 13 03:36:23] 'nginx' connection passed
[CEST May 13 03:41:28] 'psa_spamassassin' process PID changed to 7691
[CEST May 13 03:46:34] 'psa_spamassassin' PID has not changed
[CEST May 13 14:38:15] GENERIC: error receiving data -- Resource temporarily unavailable
[CEST May 13 14:38:15] 'apache' failed protocol test [generic] at INET[127.0.0.1:7080].
[CEST May 13 14:38:15] 'apache' trying to restart
[CEST May 13 14:38:15] 'apache' stop: /usr/local/psa/admin/bin/websrvmng
[CEST May 13 14:38:17] 'apache' start: /usr/local/psa/admin/bin/websrvmng
[CEST May 13 14:38:22] GENERIC: error receiving data -- Resource temporarily unavailable
[CEST May 13 14:38:22] 'nginx' failed protocol test [generic] at INET[149.210.134.111:80].
[CEST May 13 14:38:22] 'nginx' trying to restart
[CEST May 13 14:38:22] 'nginx' stop: /usr/local/psa/admin/bin/nginx_control
[CEST May 13 14:38:23] 'nginx' start: /usr/local/psa/admin/bin/nginx_control
[CEST May 13 14:43:29] 'apache' connection passed
[CEST May 13 14:43:29] 'nginx' connection passed
[CEST May 13 14:48:37] GENERIC: error receiving data -- Resource temporarily unavailable
[CEST May 13 14:48:37] 'apache' failed protocol test [generic] at INET[127.0.0.1:7080].
[CEST May 13 14:48:37] 'apache' trying to restart
[CEST May 13 14:48:37] 'apache' stop: /usr/local/psa/admin/bin/websrvmng
[CEST May 13 14:48:38] 'apache' start: /usr/local/psa/admin/bin/websrvmng
[CEST May 13 14:53:51] 'apache' connection passed
[CEST May 13 15:50:03] GENERIC: error receiving data -- Resource temporarily unavailable
[CEST May 13 15:50:03] 'apache' failed protocol test [generic] at INET[127.0.0.1:7080].
[CEST May 13 15:50:03] 'apache' trying to restart
[CEST May 13 15:50:03] 'apache' stop: /usr/local/psa/admin/bin/websrvmng
[CEST May 13 15:50:05] 'apache' start: /usr/local/psa/admin/bin/websrvmng
[CEST May 13 15:55:15] 'apache' connection passed
[CEST May 13 16:36:09] GENERIC: error receiving data -- Resource temporarily unavailable
[CEST May 13 16:36:09] 'apache' failed protocol test [generic] at INET[127.0.0.1:7080].
[CEST May 13 16:36:09] 'apache' trying to restart
[CEST May 13 16:36:09] 'apache' stop: /usr/local/psa/admin/bin/websrvmng
[CEST May 13 16:36:11] 'apache' start: /usr/local/psa/admin/bin/websrvmng
[CEST May 13 16:41:18] 'apache' connection passed
[CEST May 13 21:16:23] GENERIC: error receiving data -- Resource temporarily unavailable
[CEST May 13 21:16:23] 'apache' failed protocol test [generic] at INET[127.0.0.1:7080].
[CEST May 13 21:16:23] 'apache' trying to restart
[CEST May 13 21:16:23] 'apache' stop: /usr/local/psa/admin/bin/websrvmng
[CEST May 13 21:16:24] 'apache' start: /usr/local/psa/admin/bin/websrvmng
[CEST May 13 21:16:29] GENERIC: error receiving data -- Resource temporarily unavailable
[CEST May 13 21:16:29] 'nginx' failed protocol test [generic] at INET[149.210.134.111:80].
[CEST May 13 21:16:29] 'nginx' trying to restart
[CEST May 13 21:16:29] 'nginx' stop: /usr/local/psa/admin/bin/nginx_control
[CEST May 13 21:16:30] 'nginx' start: /usr/local/psa/admin/bin/nginx_control
[CEST May 13 21:21:36] 'apache' connection passed
[CEST May 13 21:21:36] 'nginx' connection passed
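To see at a glance how often each service failed its check, the log can be summarized. A hedged Python sketch, assuming the monit line format shown above; `failures_by_service` is a hypothetical name. Applied to this excerpt, it finds 6 failed protocol tests for apache and 3 for nginx. Every failure is also preceded by "error receiving data -- Resource temporarily unavailable", which is the standard message for EAGAIN; that suggests the check timed out waiting for a reply rather than having its connection refused.

```python
import re
from collections import defaultdict

# e.g. [CEST May 13 03:31:07] 'apache' failed protocol test [generic] at INET[127.0.0.1:7080].
FAIL_RE = re.compile(r"^\[(\w+ \w{3} +\d+ \d{2}:\d{2}:\d{2})\] '([^']+)' failed protocol test")


def failures_by_service(lines):
    """Map service name -> timestamps (as written in the log) of its failed protocol tests."""
    failures = defaultdict(list)
    for line in lines:
        m = FAIL_RE.match(line.strip())
        if m:
            failures[m.group(2)].append(m.group(1))
    return dict(failures)
```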

ScuL81 · Jul 4, 2015

I had this problem too, all of a sudden after a server reboot.

I found this:
http://kb.odin.com/en/124443

I went in and changed the timeout to 10 seconds; fingers crossed.

greos · Sep 6, 2015

I'm still having this issue every time I enable Watchdog.
After a reboot I get the Apache test page instead of my website.
Changing the Connection Timeout in Watchdog (to 15 sec) didn't help.

Any solution?

UFHH01 · Sep 6, 2015

Hi greos,

Please don't confuse Watchdog with the web server service, or with any other service on your server. Watchdog is only a tool that monitors services and, if a service is not working, tries to restart it.

If you would like help with your Apache server, please consider using the "right" thread, or open your own thread for your issue if you think it isn't discussed elsewhere.
