I'm using Zabbix to monitor a lot of things on my servers.
One of those things is that I let Zabbix run nginx -T every half hour.
Nginx will only adopt a new configuration after that configuration has been tested.
This way Nginx will not suddenly stop working because of a configuration error.
By letting Zabbix test the configuration every half hour, I am made aware of a configuration error before it becomes a problem. This time I managed to solve the problem with no downtime at all, as nginx never ran with the broken config.
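For reference, such a check can be very small. This is a minimal sketch, not my exact setup: it assumes the Zabbix agent may run the test command with sufficient privileges, and the function name check_config and the 1/0 output convention are only illustrative choices.

```shell
#!/bin/sh
# check_config CMD [ARGS...]
# Runs the given test command with its output discarded and prints
# 1 if it exits with status 0, otherwise 0.
# The function name and the 1/0 convention are illustrative assumptions.
check_config() {
    if "$@" >/dev/null 2>&1; then
        echo 1
    else
        echo 0
    fi
}
```

Called as check_config nginx -t, this prints 1 while the configuration passes the test and 0 as soon as it fails, which is a value a monitoring trigger can alert on.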
This morning I noticed an alert in the Zabbix dashboard concerning the nginx configuration on only one of my servers. I logged into that server to find out what was wrong:
# nginx -T
nginx: [emerg] a duplicate listen 127.0.0.1:61709 in /etc/nginx/conf.d/ww010_zabbix.conf:2
nginx: configuration file /etc/nginx/nginx.conf test failed
This configuration file has been there for years and was not altered. It enables a telemetry feature of Nginx (stub_status) which is used by Zabbix. It has/had this content:
# cat /etc/nginx/conf.d/ww010_zabbix.conf
server {
    listen localhost:61709;
    location / {
        stub_status on;
        access_log off;
        allow 127.0.0.1;
        deny all;
    }
}
I then checked whether the nginx binary had been altered:
# ls -l /usr/sbin/nginx
-rwxr-xr-x 1 root root 959K Aug 21 14:43 /usr/sbin/nginx
Aha...
It seems that I was given a new nginx last night, one that was compiled a week ago.
I then suspected that the new nginx already provided telemetry on localhost by default, so I disabled the config by adding the suffix ".disabled" to the config file, restarted nginx and ran netstat -lntp | grep nginx.
But no, nothing of the kind:
# netstat -lntp | grep nginx
tcp 0 0 76.204.211.29:80 0.0.0.0:* LISTEN 11079/nginx
tcp 0 0 76.204.211.29:443 0.0.0.0:* LISTEN 11079/nginx
I then renamed the config file back to a name ending in .conf and changed the port. But no, that made no difference.
Only after I changed "localhost" to "127.0.0.1" did I get back the original situation, with nginx providing telemetry info again:
# netstat -lntp | grep nginx
tcp 0 0 127.0.0.1:61709 0.0.0.0:* LISTEN 12066/nginx
tcp 0 0 76.204.211.29:80 0.0.0.0:* LISTEN 12066/nginx
tcp 0 0 76.204.211.29:443 0.0.0.0:* LISTEN 12066/nginx
Without constantly monitoring "nginx -t" I could never have pinpointed the culprit with such certainty. No damage was done, because nginx only accepts "proper" configurations. Only because I monitor it do I know that this happened last night. Normally this would have surfaced much later, after I had changed something else and stopped/started nginx manually or rebooted the server (most of my servers have an uptime of several years unless they are newer).
So why does it suddenly only want 127.0.0.1, and why does it give this awkward "duplicate listen" error when I use localhost?
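One thing that could be checked is what "localhost" actually resolves to on this machine. The following is only a diagnostic sketch under an unconfirmed assumption: that the error might appear when the name resolves to the same address more than once. The helper name dup_addrs is made up for illustration.

```shell
#!/bin/sh
# dup_addrs: reads one address per line on stdin and prints every
# address that occurs more than once. The helper name is illustrative.
dup_addrs() {
    sort | uniq -d
}

# Possible use on a glibc system (assumes getent and awk are available):
#   getent ahosts localhost | awk '$2 == "STREAM" { print $1 }' | dup_addrs
# Only the STREAM rows are kept, because getent ahosts lists each
# address once per socket type.
```

If the pipeline in the comment prints 127.0.0.1, the resolver returns that address more than once for localhost, and /etc/hosts would be the first place to look. If it prints nothing, this assumption does not explain the error.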