Description
Sometimes a new collector gets deployed and it doesn't work, or, more commonly, it works only on a small subset of hosts and doesn't properly exit(13) on the hosts where it's not supposed to run. It would be nice to have a dead-simple karma point system:
- When the collector is first discovered and first started, it gets X karma points.
- Each time the collector crashes, it loses C karma points.
- Every N seconds that elapse, the collector gains G karma points, up to an upper bound of Gmax points.
- Whenever a collector crashes, we check its karma. If it's negative, we mark it as dead and don't restart it anymore.
The idea is that if a collector crashes too often, we want to give up on it, instead of spamming the logs. But if a collector has been up for a while, and all of a sudden it starts crashing a few times in a row, it's worth trying some more before giving up on it.
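
To make the proposal concrete, here is a minimal sketch of the karma bookkeeping. The `Karma` class, its method names, and the parameter values in the usage note are all hypothetical and are not part of any existing collector-manager code. It also assumes that the crash penalty is applied before the karma check, and that X does not need to be capped at Gmax.

```python
import time


class Karma:
    """Karma bookkeeping for one collector (hypothetical helper)."""

    def __init__(self, initial, crash_cost, gain, interval, max_points, now=None):
        self.points = initial          # X: points granted on first start
        self.crash_cost = crash_cost   # C: points lost per crash
        self.gain = gain               # G: points gained per interval
        self.interval = interval       # N: seconds per interval
        self.max_points = max_points   # Gmax: upper bound for gained points
        self.dead = False
        self._last_gain = time.monotonic() if now is None else now

    def tick(self, now=None):
        """Credit G points for every full N-second interval that has elapsed."""
        now = time.monotonic() if now is None else now
        intervals = int((now - self._last_gain) // self.interval)
        if intervals > 0:
            if self.points < self.max_points:
                gained = self.points + intervals * self.gain
                self.points = min(self.max_points, gained)
            # Advance by whole intervals only, so partial intervals carry over.
            self._last_gain += intervals * self.interval

    def on_crash(self, now=None):
        """Deduct C points; return True if the collector may be restarted."""
        self.tick(now)
        self.points -= self.crash_cost
        if self.points < 0:
            self.dead = True  # give up: do not restart this collector again
        return not self.dead
```

With X=10, C=5, G=1, N=60 and Gmax=20 (illustrative values only), a freshly deployed collector is given up on at its third quick crash, while one that has been up for an hour has reached 20 points and is only given up on at its fifth crash in a row.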