Skip to content

[cmd/opampsupervisor] Emit spans for startup and message handling #38797

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

bacherfl
Copy link
Contributor

Description

Currently WIP, need buy-in from the codeowners for the related issue, but this should serve as a PoC of the suggested enhancement

This PR adds the emission of spans for the following events:

  • Start of the supervisor
  • Handling Message from the OpAMP server
  • Handling messages from the collector

Link to tracking issue

Fixes #38724

Testing

Added assertions to the existing e2e tests to verify the expected spans

Signed-off-by: Florian Bacher <[email protected]>
# Conflicts:
#	cmd/opampsupervisor/supervisor/supervisor.go
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
@bacherfl bacherfl marked this pull request as ready for review March 26, 2025 11:27
@bacherfl bacherfl requested a review from a team as a code owner March 26, 2025 11:27
@@ -534,7 +545,7 @@ func (s *Supervisor) getBootstrapInfo() (err error) {
case err = <-done:
if errors.Is(err, errNonMatchingInstanceUID) {
// try to report the issue to the OpAMP server
if startOpAMPErr := s.startOpAMPClient(); startOpAMPErr == nil {
if startOpAMPErr := s.startOpAMPClient(context.Background()); startOpAMPErr == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you not pass the ctx of the outside span from line 403?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, thank you - seems like i missed that one - I'll try it out to be sure, but I think the context from further above should be the correct one to use here

@@ -1200,14 +1249,18 @@ func (s *Supervisor) composeMergedConfig(config *protobufs.AgentRemoteConfig) (c
return configChanged, nil
}

func (s *Supervisor) handleRestartCommand() error {
func (s *Supervisor) handleRestartCommand(ctx context.Context) error {
_, span := s.getTracer().Start(ctx, "HandleRestartCommand")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_, span := s.getTracer().Start(ctx, "HandleRestartCommand")
_, span := s.getTracer().Start(ctx, "handleRestartCommand")

not sure if you meant to use the exact function name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here i was not sure if i should consistently start with upper case for the span names, or use the exact function name - however the latter option might make more sense, so I will double check and make sure to use the exact function names

Copy link
Contributor

@evan-bradley evan-bradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look at this, @bacherfl, and I'm sorry for the really long delay with a review.

I like the idea of having the Supervisor be a good example of an observable application, but I think we need to be conscious of what questions we're hoping to help operators answer when monitoring or debugging their Supervisor instances. If possible, I'd like to outline an overall approach for Supervisor observability through traces before we continue with this PR. I'll leave some starting thoughts in the issue.

if err = s.getBootstrapInfo(); err != nil {
return fmt.Errorf("could not get bootstrap info from the Collector: %w", err)
if err = s.getBootstrapInfo(ctx); err != nil {
span.SetStatus(codes.Error, fmt.Sprintf("Could not get bootstrap info from the Collector: %s", err.Error()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still return an error here.

@@ -309,24 +310,27 @@ func initTelemetrySettings(logger *zap.Logger, cfg config.Telemetry) (telemetryS
}

func (s *Supervisor) Start() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to play devil's advocate, do we always want a span for Supervisor startup? Most of these issues should be uncommon, and the call stack is fairly shallow (main -> (*Supervisor).Start), so I think we either end up with an empty span, or one that more or less directly mirrors the log the Supervisor will print.

I don't have enough expertise with the Supervisor in production environments to really make a call as to whether this will be useful, so if anyone else has an opinion that would be welcome, but I want to make sure the telemetry we're emitting is useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[cmd/opampsupervisor] Emit spans via trace exporter
4 participants