Description
I've previously commented on this livestatus issue but probably should have opened a new one here instead. Sorry.
The problem is that even on a fresh install with no custom configuration other than the TCP livestatus socket, two processes are listening on that socket after a systemctl reload naemon:
vagrant@bookworm:~$ sudo netstat -tupan | grep -e Recv -e naemon
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:6557 0.0.0.0:* LISTEN 4067/naemon
tcp 3 0 0.0.0.0:6557 0.0.0.0:* LISTEN 4072/naemon
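One of them does not respond (is it waiting to be reaped? top does not report it as a zombie, though). The Recv-Q of 3 on the second listener (PID 4072) suggests that connections are queuing up there without being accepted.
As a result, Thruk behaves erratically at times, for example reporting that the backend is down.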
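The duplicate-listener condition shown in the netstat output above can be detected automatically. The following is a minimal sketch, not part of naemon or Thruk: the function names `listeners_by_address` and `duplicate_listeners` are hypothetical, and the parser assumes the whitespace-separated column layout of `netstat -tupan` as it appears in this report.

```python
from collections import defaultdict
from typing import Dict, List


def listeners_by_address(netstat_output: str) -> Dict[str, List[int]]:
    """Map each listening local address to the PIDs bound to it.

    Assumes the column layout shown in this report:
    proto, recv-q, send-q, local address, foreign address, state, pid/program.
    """
    result = defaultdict(list)
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the header and any socket that is not in LISTEN state.
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        pid_text = fields[6].split("/", 1)[0]
        if pid_text.isdigit():
            result[fields[3]].append(int(pid_text))
    return dict(result)


def duplicate_listeners(netstat_output: str) -> Dict[str, List[int]]:
    """Return only the addresses that more than one PID is listening on."""
    return {
        addr: pids
        for addr, pids in listeners_by_address(netstat_output).items()
        if len(pids) > 1
    }


# Sample taken from the netstat output in this report.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:6557 0.0.0.0:* LISTEN 4067/naemon
tcp 3 0 0.0.0.0:6557 0.0.0.0:* LISTEN 4072/naemon
"""

print(duplicate_listeners(SAMPLE))  # {'0.0.0.0:6557': [4067, 4072]}
```

On a live system the input would come from running netstat with root privileges, since the PID column is otherwise empty for processes owned by other users.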
This is the config:
vagrant@bookworm:~$ cat /etc/naemon/module-conf.d/livestatus.cfg
# Naemon config
broker_module=/usr/lib/naemon/naemon-livestatus/livestatus.so inet_addr=0.0.0.0:6557 debug=1
event_broker_options=-1
This is easy to reproduce in Vagrant:
$ vagrant init debian/bookworm64
$ vagrant up
$ vagrant ssh
Next, copy these commands into a script and execute it.
vagrant@bookworm:~$ wget -O reproduce https://github.com/naemon/naemon-livestatus/files/13950328/reproduce.txt
vagrant@bookworm:~$ chmod +x reproduce
vagrant@bookworm:~$ ./reproduce
This should result in something like:
<installation>
----------
Restarting
tcp 0 0 0.0.0.0:6557 0.0.0.0:* LISTEN 5408/naemon
naemon,5408 --daemon /etc/naemon/naemon.cfg
├─naemon,5409 --worker /var/lib/naemon/naemon.qh
├─naemon,5410 --worker /var/lib/naemon/naemon.qh
├─naemon,5411 --worker /var/lib/naemon/naemon.qh
├─naemon,5412 --worker /var/lib/naemon/naemon.qh
└─naemon,5413 --daemon /etc/naemon/naemon.cfg
systemd,5367 --user
└─(sd-pam),5368
----------------------
Reloading until broken
Success.
tcp 0 0 0.0.0.0:6557 0.0.0.0:* LISTEN 5408/naemon
tcp 0 0 0.0.0.0:6557 0.0.0.0:* LISTEN 5413/naemon
naemon,5408 --daemon /etc/naemon/naemon.cfg
├─naemon,5413 --daemon /etc/naemon/naemon.cfg
├─naemon,5434 --worker /var/lib/naemon/naemon.qh
├─naemon,5435 --worker /var/lib/naemon/naemon.qh
├─naemon,5436 --worker /var/lib/naemon/naemon.qh
└─naemon,5437 --worker /var/lib/naemon/naemon.qh
systemd,5367 --user
└─(sd-pam),5368
---------------------------------
Running "GET status" every second, response size 0 is not good:
2024-01-16T13:08:05+00:00 976
2024-01-16T13:08:06+00:00 976
2024-01-16T13:08:07+00:00 0
2024-01-16T13:08:10+00:00 977
2024-01-16T13:08:11+00:00 0
^C
Notice that process 5413 already exists when naemon is first started, but it only starts listening on the socket after the reload.
My current workaround is to restart instead of reload after each config change, but with a rather large config this takes a lot longer than reloading. The alternative would be to go back to xinetd.
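The "GET status" polling from the output above can be approximated with a small client. This is a sketch under stated assumptions, not the contents of the reproduce script: the function names `livestatus_response_size` and `poll` are hypothetical, the host and port come from the inet_addr setting in the config shown earlier, and the query is assumed to be terminated by a blank line, with the server closing the connection after its response.

```python
import socket
import time
from datetime import datetime, timezone


def livestatus_response_size(host: str = "127.0.0.1", port: int = 6557,
                             query: str = "GET status",
                             timeout: float = 5.0) -> int:
    """Send one livestatus query and return the response size in bytes.

    A result of 0 means nothing came back before the timeout, which is
    what a listener that never accepts the connection would produce.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # A blank line ends the query; closing the write side signals
        # that no further request will follow.
        sock.sendall(query.encode("ascii") + b"\n\n")
        sock.shutdown(socket.SHUT_WR)
        size = 0
        while True:
            try:
                chunk = sock.recv(4096)
            except OSError:
                # Covers timeouts and connection resets alike.
                break
            if not chunk:
                break
            size += len(chunk)
        return size


def poll(count: int, interval: float = 1.0, **kwargs) -> None:
    """Print a timestamp and the response size for `count` queries."""
    for _ in range(count):
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        print(stamp, livestatus_response_size(**kwargs))
        time.sleep(interval)
```

Calling `poll(10)` on the affected machine after a reload should print a mix of non-zero and zero sizes if some connections land on the unresponsive listener.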