-
Notifications
You must be signed in to change notification settings - Fork 2.8k
[cmd/opampsupervisor] Emit spans for startup and message handling #38797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
# Conflicts: # cmd/opampsupervisor/supervisor/supervisor.go
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
@@ -534,7 +545,7 @@ func (s *Supervisor) getBootstrapInfo() (err error) { | |||
case err = <-done: | |||
if errors.Is(err, errNonMatchingInstanceUID) { | |||
// try to report the issue to the OpAMP server | |||
if startOpAMPErr := s.startOpAMPClient(); startOpAMPErr == nil { | |||
if startOpAMPErr := s.startOpAMPClient(context.Background()); startOpAMPErr == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you not pass the ctx of the outside span from line 403?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, thank you - seems like i missed that one - I'll try it out to be sure, but I think the context from further above should be the correct one to use here
@@ -1200,14 +1249,18 @@ func (s *Supervisor) composeMergedConfig(config *protobufs.AgentRemoteConfig) (c | |||
return configChanged, nil | |||
} | |||
|
|||
func (s *Supervisor) handleRestartCommand() error { | |||
func (s *Supervisor) handleRestartCommand(ctx context.Context) error { | |||
_, span := s.getTracer().Start(ctx, "HandleRestartCommand") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_, span := s.getTracer().Start(ctx, "HandleRestartCommand") | |
_, span := s.getTracer().Start(ctx, "handleRestartCommand") |
not sure if you meant to use the exact function name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here i was not sure if i should consistently start with upper case for the span names, or use the exact function name - however the latter option might make more sense, so I will double check and make sure to use the exact function names
Signed-off-by: Florian Bacher <[email protected]>
# Conflicts: # cmd/opampsupervisor/supervisor/supervisor.go
Signed-off-by: Florian Bacher <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look at this, @bacherfl, and I'm sorry for the really long delay with a review.
I like the idea of having the Supervisor be a good example of an observable application, but I think we need to be conscious of what questions we're hoping to help operators answer when monitoring or debugging their Supervisor instances. If possible, I'd like to outline an overall approach for Supervisor observability through traces before we continue with this PR. I'll leave some starting thoughts in the issue.
if err = s.getBootstrapInfo(); err != nil { | ||
return fmt.Errorf("could not get bootstrap info from the Collector: %w", err) | ||
if err = s.getBootstrapInfo(ctx); err != nil { | ||
span.SetStatus(codes.Error, fmt.Sprintf("Could not get bootstrap info from the Collector: %s", err.Error())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should still return an error here.
@@ -309,24 +310,27 @@ func initTelemetrySettings(logger *zap.Logger, cfg config.Telemetry) (telemetryS | |||
} | |||
|
|||
func (s *Supervisor) Start() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to play devil's advocate, do we always want a span for Supervisor startup? Most of these issues should be uncommon, and the call stack is fairly shallow (main -> (*Supervisor).Start
), so I think we either end up with an empty span, or one that more or less directly mirrors the log the Supervisor will print.
I don't have enough expertise with the Supervisor in production environments to really make a call as to whether this will be useful, so if anyone else has an opinion that would be welcome, but I want to make sure the telemetry we're emitting is useful.
# Conflicts: # cmd/opampsupervisor/e2e_test.go # cmd/opampsupervisor/supervisor/supervisor.go
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Description
Currently WIP, need buy-in from the codeowners for the related issue, but this should serve as a PoC of the suggested enhancement
This PR adds the emission of spans for the following events:
Link to tracking issue
Fixes #38724
Testing
Added assertions to the existing e2e tests to verify the expected spans