@douglascamata
Member

Description

This PR adds a healthcheck endpoint to the Supervisor. It runs in its own dedicated HTTP server. For now the server uses port 23233 (inspired by the Collector's default healthcheck port, 13133). The Supervisor's healthcheck port is not configurable at the moment, but it could be in the future.
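
As an illustration of the idea described above, here is a minimal sketch of a healthcheck handler mounted on its own dedicated `http.Server`. It is not the code from this PR: the `/health` path, the `localhost` interface, the 503 response for an unhealthy state, and the names `healthHandler`, `newHealthServer`, `probe`, and `errNotReady` are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// errNotReady is a hypothetical error used to simulate an unhealthy Supervisor.
var errNotReady = errors.New("not ready")

// healthHandler (hypothetical) answers 200 when healthy() returns nil
// and 503 otherwise.
func healthHandler(healthy func() error) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		if err := healthy(); err != nil {
			http.Error(w, err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
}

// newHealthServer builds a server that serves only the healthcheck,
// separate from any other HTTP server the process may run.
// The "/health" path and the localhost interface are assumptions.
func newHealthServer(port int, healthy func() error) *http.Server {
	mux := http.NewServeMux()
	mux.Handle("/health", healthHandler(healthy))
	return &http.Server{
		Addr:    net.JoinHostPort("localhost", strconv.Itoa(port)),
		Handler: mux,
	}
}

// probe exercises the handler in memory, without opening a socket.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
	return rec.Code
}

func main() {
	ok := newHealthServer(23233, func() error { return nil })
	bad := newHealthServer(23233, func() error { return errNotReady })
	fmt.Println(ok.Addr, probe(ok.Handler), probe(bad.Handler))
}
```

Keeping the healthcheck on its own mux and server means it can be started and shut down independently of anything else the process serves.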

Link to tracking issue

Fixes #40529.

Testing

Unit test added.

@douglascamata douglascamata marked this pull request as ready for review July 28, 2025 12:23
@douglascamata douglascamata requested review from a team, atoulme and evan-bradley as code owners July 28, 2025 12:23
@github-actions github-actions bot requested a review from tigrannajaryan July 28, 2025 12:23
By default, it's zero and in this case we don't start a healthcheck
server.
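
Assuming "it" in the commit message above refers to the healthcheck port setting, the "zero means disabled" behaviour could look roughly like the sketch below. The function name `maybeStartHealthServer` and the `localhost` interface are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
)

// maybeStartHealthServer (hypothetical) starts a healthcheck server only
// when port is non-zero. With port == 0 it returns (nil, nil), meaning
// "disabled, and that is not an error".
func maybeStartHealthServer(port int, h http.Handler) (*http.Server, error) {
	if port == 0 {
		return nil, nil
	}
	// Binding the listener before returning surfaces port conflicts
	// to the caller instead of hiding them inside the goroutine.
	ln, err := net.Listen("tcp", fmt.Sprintf("localhost:%d", port))
	if err != nil {
		return nil, err
	}
	srv := &http.Server{Handler: h}
	go func() { _ = srv.Serve(ln) }() // returns once srv is shut down or closed
	return srv, nil
}

func main() {
	srv, err := maybeStartHealthServer(0, http.NotFoundHandler())
	fmt.Println(srv == nil, err == nil)
}
```

A caller would check for a nil server before attempting to shut it down.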
Contributor

@atoulme atoulme left a comment

this can land as is imo - I have a nit about the network interface but I'm ok if we address it separately based on more needs being shored up

@atoulme atoulme self-requested a review August 5, 2025 23:46
@douglascamata
Member Author

douglascamata commented Aug 6, 2025

@atoulme @evan-bradley I finished taking care of the pending comments and made a small update to the corresponding changelog entry. Please have another look when you have some time. Thank you for your review. 🙇

@douglascamata
Member Author

There are some flaky test failures on sqlserverreceiver. Would love if someone could rerun this: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/16780407835/job/47517243830?pr=41565

Contributor

@atoulme atoulme left a comment

LGTM - @evan-bradley please take one more look and feel free to merge, thanks!

Contributor

@evan-bradley evan-bradley left a comment

Overall looks good, just a few suggestions. Two additional requests if you don't mind:

  1. Can we add this config option to one of the YAML files in testdata/supervisor and verify the config is parsed correctly? This would probably be an E2E test.
  2. Can you document this option in the readme?

@douglascamata
Member Author

@evan-bradley all the feedback taken care of, sir!

@douglascamata
Member Author

Oh wait, got some weird test failures.

@douglascamata
Member Author

@evan-bradley now the tests should all pass.

@evan-bradley
Contributor

It looks like there's a leaking goroutine in one of the E2E tests; I assume it's the new one unless something changed on main. I took a quick look and nothing jumped out at me.

The interactions of the callbacks with the `connectedChan` risked
introducing a goroutine leak in the tests if
`waitForSupervisorConnection` was not called before the Supervisor was
shut down.

Closing the `connectedChan` in the OpAMP server shutdown function
allowed me to detect this.
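
To illustrate the kind of leak described above: a callback that sends on an unbuffered channel blocks forever if no one ever receives. One common way to avoid that, shown in the sketch below, is to pair the send with a shutdown signal in a `select`. This is not necessarily the fix applied in this PR, and the names `notifyConnected`, `connected`, and `done` are hypothetical.

```go
package main

import "fmt"

// notifyConnected (hypothetical) mimics a connection callback. It tries to
// report on the connected channel but gives up once done is closed, so the
// calling goroutine cannot stay blocked after shutdown.
// It returns true if the notification was delivered.
func notifyConnected(connected chan<- bool, done <-chan struct{}) bool {
	select {
	case connected <- true:
		return true
	case <-done:
		return false
	}
}

func main() {
	connected := make(chan bool) // unbuffered: a send needs a receiver
	done := make(chan struct{})
	exited := make(chan bool)

	// The callback runs in its own goroutine, as a server callback would.
	go func() { exited <- notifyConnected(connected, done) }()

	// Simulate shutting down without anyone ever receiving from connected.
	close(done)

	// The goroutine exits instead of leaking.
	fmt.Println("delivered:", <-exited) // prints "delivered: false"
}
```

With a plain `connected <- true` and no receiver, the goroutine in `main` would never exit, which is the situation a goroutine-leak checker in tests reports.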
Contributor

@evan-bradley evan-bradley left a comment

Thanks for being so responsive and diligent throughout the review process. This one turned out to have a few more considerations than I was expecting at the outset.

@evan-bradley evan-bradley merged commit 50d94b9 into open-telemetry:main Aug 7, 2025
185 checks passed
Development

Successfully merging this pull request may close these issues.

[cmd/opampsupervisor] Add a healthcheck endpoint
