[consumererror] Add OTLP-centric error type #13042
Conversation
Codecov Report — Attention: patch coverage is 90.69%, which is below the target coverage of 95.00%. You can increase the patch coverage or adjust the target coverage.

```
@@            Coverage Diff             @@
##             main   #13042      +/-   ##
==========================================
- Coverage   91.48%   91.25%   -0.23%
==========================================
  Files         506      510       +4
  Lines       28557    28830     +273
==========================================
+ Hits        26125    26310     +185
- Misses       1917     2002      +85
- Partials      515      518       +3
==========================================
```
I'll look at improving the code coverage tomorrow. In the meantime, this should be in a pretty good state.
The remaining functions missing test coverage are the status code conversion functions. I don't think tests would be very helpful there, since they are direct mappings. The only thing I can think of that would meaningfully improve coverage is to store the mappings in a map rather than a switch statement, but that feels like a slightly worse implementation.
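A minimal stdlib-only sketch of the trade-off described above. The specific HTTP-to-gRPC code pairs here are illustrative assumptions, not the PR's actual tables: the switch form is what the PR uses, while the map form is trivially coverable by a single table-driven test.

```go
package main

import "fmt"

// grpcFromHTTPSwitch maps a hypothetical subset of HTTP status codes to
// gRPC codes using a switch, mirroring the implementation style in the PR.
func grpcFromHTTPSwitch(httpCode int) int {
	switch httpCode {
	case 429:
		return 8 // RESOURCE_EXHAUSTED
	case 503:
		return 14 // UNAVAILABLE
	default:
		return 13 // INTERNAL
	}
}

// grpcFromHTTPMap holds the same illustrative mapping as data. One test
// iterating the keys covers every entry, at the cost of a lookup and a
// slightly less direct implementation.
var grpcFromHTTPMap = map[int]int{
	429: 8,
	503: 14,
}

func grpcFromHTTP(httpCode int) int {
	if c, ok := grpcFromHTTPMap[httpCode]; ok {
		return c
	}
	return 13 // INTERNAL fallback
}

func main() {
	// Both forms agree on every input.
	fmt.Println(grpcFromHTTPSwitch(429) == grpcFromHTTP(429)) // true
}
```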
so excited to see this revived
There seemed to be consensus on the last iteration of this implementation. I think what we need now is to test this in real life, so I am approving this so we can move forward.
Since this was especially controversial last time, I suggest we wait either until we have more approvals (I suggest 4) or until some time has passed (I would suggest Friday next week). cc @open-telemetry/collector-approvers
LGTM, liked the idea
Really good to see this moving forward. This looks the way I would expect it to look after reviewing earlier feedback from @bogdandrutu.
```go
// data around the error that occurred.
//
// Error should be obtained from a given `error` object using `errors.As`.
type Error struct {
```
There is another property that is interesting: "retry-after". I'm not suggesting we fix it now, but I am curious how we plan to support it. Only via gRPC Status?
I think we will likely add another error type to go along with this one; if not, we will add an option to this error type that includes a duration for how long the caller should wait before retrying a request.
I see a problem with gRPC status: it carries more information in the "details" part (including the retry-after), so it will be error-prone to create this Error from a status and then multiple other errors for the other parts of the status.
Thanks for the details. I wasn't aware the gRPC status could contain that, but I found the RetryInfo type, which I believe is what you are talking about.
I believe we'll likely solve this through something like the following, which has error creation go through a single point:

- Have the constructors take options that allow specifying the retry delay. The constructors will either return an `Error` struct that contains a `RetryInfo` (or similar) struct that can be pulled out with `errors.As`, or the retry info can be placed directly on the `Error` object itself.
- Have the gRPC constructor extract the retry info from the `*status.Status` struct and use that info to populate the retry info.

What do you think?
I like the direction, and glad that you know about this now and can propose a solution :)
There is a big problem we identified in the past, which is that the default behavior of the errors in the collector pipelines is that they are retryable. It seems that this PR changes that, which I 1000% support, but we need to make sure we document this change and analyze the impact of that.
Agreed. That will come in a follow-up once we start to use this; I will make sure we proceed discerningly here.
```go
// See https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md for more details.
//
// If a gRPC code cannot be derived from these three sources then INTERNAL is returned.
func (e *Error) OTLPGRPCStatus() *status.Status {
```
I would actually change the signature to something like `func ToGRPCStatus(e error) *status.Status`. This way it:

- Can handle the `errors.As` part.
- Can handle extra details like the retry-after you proposed with the constructor.
```go
// NewOTLPGRPCError records a gRPC status code that was received from a server
// during data submission.
func NewOTLPGRPCError(origErr error, status *status.Status) error {
```
Is there an "original" error in this case?
Description
Continuation of #11085.
Link to tracking issue
Fixes #7047