Elastic Agent launch daemon on MacOS doesn't exit gracefully within allotted timeout, gets killed. #593

@aleksmaus

Elastic Agent 8.3/8.4 doesn't exit cleanly on MacOS.
In both cases it receives SIGKILL 5 seconds after SIGTERM.

The consequences observed are:

  1. 8.4 (main branch) leaves the metricbeat and filebeat processes running as orphans.
  2. The 8.3 branch doesn't leave metricbeat and filebeat running, but it was leaving osquerybeat running, so it still looks like an issue with the shutdown timing.

launchd log for the failed shutdown:

2022-06-22 08:48:34.236902 (system) <Notice>: booting out service: caller = launchctl[13950]<-sudo[13949]<-zsh[3235]<-login[3233]<-iTermServer-3.4[3232]<-iTerm2[3231]<-launchd[1], service = co.elastic.elastic-agent, value = 0x0
2022-06-22 08:48:34.236990 (system/co.elastic.elastic-agent [13910]) <Notice>: signaled service: Terminated: 15
2022-06-22 08:48:34.236993 (system/co.elastic.elastic-agent [13910]) <Notice>: service state: SIGTERMed
2022-06-22 08:48:34.236995 (system/co.elastic.elastic-agent [13910]) <Notice>: scheduling cleanup in 5 sec after sending Terminated: 15
2022-06-22 08:48:34.237019 (system) <Notice>: Bootout by launchctl[13950] for /Library/LaunchDaemons/co.elastic.elastic-agent.plist succeeded (0: )
2022-06-22 08:48:39.240243 (system/co.elastic.elastic-agent [13910]) <Warning>: Service did not exit 5 seconds after SIGTERM. Sending SIGKILL.
2022-06-22 08:48:39.240419 (system/co.elastic.elastic-agent [13910]) <Notice>: signaled service for SIGTERM timeout: Killed: 9
2022-06-22 08:48:39.240432 (system/co.elastic.elastic-agent [13910]) <Notice>: service state: SIGKILLed
2022-06-22 08:48:39.244891 (system/co.elastic.elastic-agent [13910]) <Notice>: exited due to SIGKILL | sent by launchd[1]
2022-06-22 08:48:39.244902 (system/co.elastic.elastic-agent [13910]) <Notice>: service state: exited
2022-06-22 08:48:39.244905 (system/co.elastic.elastic-agent [13910]) <Notice>: internal event: EXITED, code = 0

For confirmed bugs, please report:

  • Version: 8.4, 8.3
  • Operating System: MacOS (Monterey)
  • Steps to Reproduce:
      1. Install dev build off of the main branch on MacOS (Monterey)
      2. Observe it is started normally
      3. Invoke sudo launchctl unload /Library/LaunchDaemons/co.elastic.elastic-agent.plist
      4. Observe beats processes left behind on 8.4. Observe SIGKILL in the launchd log.

The short-term solution would be to explicitly increase the exit timeout for the service in its launchd plist.
Setting

    <key>ExitTimeOut</key>
    <integer>30</integer> 

solves the problem.
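
To confirm that the setting is really present in a plist, the value can be read back programmatically. The following is only a minimal Go sketch, assuming the plist is stored as XML rather than in binary form. The function name `exitTimeOut` and the sample document are hypothetical and are not taken from the Elastic Agent code base; only the service label and the timeout value come from this report.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strconv"
	"strings"
)

// samplePlist is a hypothetical, trimmed-down launchd plist used for illustration.
const samplePlist = `<plist version="1.0">
<dict>
    <key>Label</key>
    <string>co.elastic.elastic-agent</string>
    <key>ExitTimeOut</key>
    <integer>30</integer>
</dict>
</plist>`

// exitTimeOut scans an XML plist and returns the integer that follows
// <key>ExitTimeOut</key>. The second result is false if the key is absent,
// its value is not an integer, or the document cannot be parsed.
func exitTimeOut(plist string) (int, bool) {
	dec := xml.NewDecoder(strings.NewReader(plist))
	elem := ""     // name of the element whose text is being read
	found := false // true while the most recent key was ExitTimeOut
	for {
		tok, err := dec.Token()
		if err != nil { // end of input or malformed XML
			return 0, false
		}
		switch t := tok.(type) {
		case xml.StartElement:
			elem = t.Name.Local
		case xml.EndElement:
			elem = ""
		case xml.CharData:
			text := strings.TrimSpace(string(t))
			switch {
			case elem == "key":
				found = text == "ExitTimeOut"
			case elem == "integer" && found:
				n, err := strconv.Atoi(text)
				return n, err == nil
			}
		}
	}
}

func main() {
	n, ok := exitTimeOut(samplePlist)
	fmt.Println(n, ok) // prints "30 true"
}
```

When the key is missing the function returns `false`, which on this machine corresponds to the 5 second cleanup delay seen in the first log.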

For example, on my machine the agent took about 11 seconds to shut down after receiving SIGTERM:

2022-06-22 08:54:21.984167 (system) <Notice>: booting out service: caller = launchctl[14036]<-sudo[14035]<-zsh[3235]<-login[3233]<-iTermServer-3.4[3232]<-iTerm2[3231]<-launchd[1], service = co.elastic.elastic-agent, value = 0x0
2022-06-22 08:54:21.984253 (system/co.elastic.elastic-agent [14002]) <Notice>: signaled service: Terminated: 15
2022-06-22 08:54:21.984255 (system/co.elastic.elastic-agent [14002]) <Notice>: service state: SIGTERMed
2022-06-22 08:54:21.984258 (system/co.elastic.elastic-agent [14002]) <Notice>: scheduling cleanup in 30 sec after sending Terminated: 15
2022-06-22 08:54:21.984286 (system) <Notice>: Bootout by launchctl[14036] for /Library/LaunchDaemons/co.elastic.elastic-agent.plist succeeded (0: )
....
2022-06-22 08:54:32.497418 (system/co.elastic.elastic-agent [14002]) <Notice>: exited due to exit(0)
2022-06-22 08:54:32.497458 (system/co.elastic.elastic-agent [14002]) <Notice>: service state: exited
2022-06-22 08:54:32.497463 (system/co.elastic.elastic-agent [14002]) <Notice>: internal event: EXITED, code = 0

Labels: bug, v8.4.0
