[exporter/elasticsearch] report connection health via componentstatus #39562
…y-collector-contrib into elasticsearch-exporter-componentstatus
Thanks! Using LogRoundTrip for health reporting is a bit hacky, but I reckon it is the easiest and the best way to achieve the goal because response parsing is done in go-docappender and there are no hooks.
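The idea of deriving health from each HTTP response, without touching response parsing, can be sketched with the standard library alone. This is not the PR's code: it swaps the client's LogRoundTrip hook for a plain `http.RoundTripper` wrapper, and `statusFunc`, `reportingTransport`, `cannedTransport`, `probe`, the transport-error branch, and the URL are all hypothetical names and choices made for illustration.

```go
package main

import (
	"fmt"
	"net/http"
)

// statusFunc is a hypothetical stand-in for the collector's status
// reporting call: nil means healthy, non-nil means degraded.
type statusFunc func(err error)

// reportingTransport wraps another RoundTripper and reports health from
// every response it sees. It never reads or replaces the response body.
type reportingTransport struct {
	next   http.RoundTripper
	report statusFunc
}

func (t *reportingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := t.next.RoundTrip(req)
	switch {
	case err != nil:
		// Assumption: a transport-level failure also counts as degraded.
		t.report(fmt.Errorf("Elasticsearch request failed: %w", err))
	case resp.StatusCode >= 300:
		// Same threshold as the reviewed diff.
		t.report(fmt.Errorf("Elasticsearch request failed: %v", resp.Status))
	default:
		t.report(nil)
	}
	return resp, err
}

// cannedTransport is a test double that answers every request with a
// fixed status code, so the sketch runs without a network.
type cannedTransport int

func (c cannedTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	code := int(c)
	return &http.Response{
		StatusCode: code,
		Status:     fmt.Sprintf("%d %s", code, http.StatusText(code)),
		Body:       http.NoBody,
		Request:    req,
	}, nil
}

// probe sends one request through the wrapper and returns what it reported.
func probe(code int) error {
	var reported error
	rt := &reportingTransport{
		next:   cannedTransport(code),
		report: func(err error) { reported = err },
	}
	req, err := http.NewRequest(http.MethodPost, "http://elasticsearch.invalid/_bulk", nil)
	if err != nil {
		return err
	}
	resp, err := rt.RoundTrip(req)
	if err != nil {
		return err
	}
	resp.Body.Close()
	return reported
}

func main() {
	fmt.Println("200:", probe(200))
	fmt.Println("503:", probe(503))
}
```

Because the wrapper only inspects the status line, it stays independent of whatever component later parses the bulk response body.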
} else if resp.StatusCode >= 300 {
	// Error results
	err := fmt.Errorf("Elasticsearch request failed: %v", resp.Status)
	componentstatus.ReportStatus(
		cl.componentHost, componentstatus.NewRecoverableErrorEvent(err))
}
Are all of these recoverable? I would think a 401 would be non-recoverable without human intervention.
409s are not necessarily errors at all when documents are truly duplicates. Is it worth special-casing those? There should probably be an indicator that they are happening, but if someone is explicitly doing de-duplication via an _id in the document, this will mark the exporter as degraded unnecessarily.
"Non-recoverable" in the collector is very unforgiving, to the extent that almost nothing we could encounter truly satisfies its definition: the component can never again return to a healthy state for the life of the process, no matter what. So its real meaning isn't "requires human intervention", it's "requires intervention and a restart of the process". For example, I've seen some setups where ingest credentials are synced to the ingest workers before they're actually activated upstream (I don't know why, but it shouldn't break us). Similarly, a 401 can be a side effect of broken or unreliable proxy settings that can be resolved without restarting the client process.
Good point about 409s, though: those should be telemetry numbers rather than a component error state. I will add a special case for that.
Thanks, just special casing 409s makes sense to me then.
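The outcome of this thread can be summarized as a three-way decision per response. The sketch below is one reading of the discussion, not the merged code: the `verdict` type, its constants, and `classify` are hypothetical names, and it assumes that a 409 produces no status event at all while every other code of 300 or above stays recoverable (including 401).

```go
package main

import (
	"fmt"
	"net/http"
)

// verdict is a hypothetical label for what the exporter tells the
// collector after one Elasticsearch response.
type verdict int

const (
	reportOK          verdict = iota // healthy response
	reportRecoverable                // degraded, may heal without a restart
	reportNothing                    // leave component status untouched
)

func (v verdict) String() string {
	switch v {
	case reportOK:
		return "ok"
	case reportRecoverable:
		return "recoverable error"
	default:
		return "no status change"
	}
}

// classify maps an HTTP status code to a verdict.
func classify(statusCode int) verdict {
	switch {
	case statusCode == http.StatusConflict:
		// Duplicate documents: a metric, not a health signal.
		return reportNothing
	case statusCode >= 300:
		// Includes 401: credentials or proxies can be fixed while
		// the process keeps running, so this is not permanent.
		return reportRecoverable
	default:
		return reportOK
	}
}

func main() {
	for _, code := range []int{200, 401, 409, 503} {
		fmt.Printf("%d -> %s\n", code, classify(code))
	}
}
```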
Change looks good. It's probably worth a test case for 409s specifically, since they are now meant not to trigger recovery from an error state.
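The property being asked for is about sequences of responses rather than single codes: a 409 must neither degrade a healthy exporter nor heal a degraded one. A minimal sketch of such a check follows, assuming a simple boolean health state; `tracker`, `observe`, and `degradedAfter` are hypothetical and do not mirror the test that was added to the PR.

```go
package main

import "fmt"

// tracker remembers whether the last reported status was an error.
// It is a hypothetical model of the collector's view of the exporter.
type tracker struct{ degraded bool }

// observe updates the state for one HTTP status code.
func (t *tracker) observe(statusCode int) {
	switch {
	case statusCode == 409:
		// Conflict: report nothing, so the previous state is kept.
	case statusCode >= 300:
		t.degraded = true
	default:
		t.degraded = false
	}
}

// degradedAfter replays a sequence of status codes from a healthy start.
func degradedAfter(codes ...int) bool {
	var t tracker
	for _, c := range codes {
		t.observe(c)
	}
	return t.degraded
}

func main() {
	cases := []struct {
		name  string
		codes []int
		want  bool
	}{
		{"409 alone stays healthy", []int{409}, false},
		{"409 does not heal a 500", []int{500, 409}, true},
		{"200 heals a 500", []int{500, 200}, false},
		{"409 after recovery", []int{500, 200, 409}, false},
	}
	for _, tc := range cases {
		got := degradedAfter(tc.codes...)
		fmt.Printf("%-26s degraded=%v pass=%v\n", tc.name, got, got == tc.want)
	}
}
```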
Thanks for the test!
…y-collector-contrib into elasticsearch-exporter-componentstatus
Please address CI
…y-collector-contrib into elasticsearch-exporter-componentstatus
Maybe I'm not understanding how to sync module versions -- "go get" on …
…y-collector-contrib into elasticsearch-exporter-componentstatus
…open-telemetry#39562)

#### Description

Report component status of the Elasticsearch exporter based on the response code when making ingestion requests.

#### Testing

Added a unit test confirming that status is reported on each HTTP response. Also tested manually with the collector and confirmed that the error conditions appear when querying collector health status.

---------

Co-authored-by: Andrzej Stencel <[email protected]>