Revisit how collector internal metrics are distributed across telemetry levels #7890
Comments
From collector wg/sig:
I'm strongly in favor of better utilizing verbosity levels
Agree. I was testing the difference between these levels and just found the same, that the detailed level logs has one more metric than others (namely
According to the proposed guidelines, I think we can move
These metrics were not emitted before the transition to OTel instrumentation and can be pretty noisy even with enabled
Enabling them on the
@dmitryax I agree that component metrics should be emitted with For the GRPC/HTTP server/client metrics, are they broken down per component? If so, they feel like Side note about
Not every component exposes them, only receivers and exporters with HTTP/GRPC clients/servers, but it can be more granular than per component. Client metrics are per I would agree to move HTTP/GRPC client/server metrics to
There is nothing like that available now. Users can further reduce the set with filter processor or with
Out of scope for this issue for sure, but I agree.
**Description:**
This change distributes the reported internal metrics across available levels and updates the level set by default:

1. The default level is changed from `basic` to `normal`, which can be overridden with the `service::telemetry::metrics::level` configuration.
2. The following batch processor metrics are updated to be reported starting from `normal` level instead of `basic` level:
   - `processor_batch_batch_send_size`
   - `processor_batch_metadata_cardinality`
   - `processor_batch_timeout_trigger_send`
   - `processor_batch_size_trigger_send`
3. The following GRPC/HTTP server and client metrics are updated to be reported starting from `detailed` level:
   - `http.client.*` metrics
   - `http.server.*` metrics
   - `rpc.server.*` metrics
   - `rpc.client.*` metrics

**Link to tracking Issue:** #7890
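For reference, the override mentioned in item 1 can be sketched as a collector configuration fragment. This is a minimal sketch, not taken from the PR: the value `detailed` is chosen only as an example, and the rest of the `service` section (pipelines and so on) is omitted.

```yaml
# Minimal sketch: sets the internal metrics level explicitly instead of
# relying on the default. `detailed` is only an example value here.
service:
  telemetry:
    metrics:
      level: detailed
```

With the change described above, leaving this key unset would yield `normal`, so the HTTP/GRPC client and server metrics would be reported only when the level is raised to `detailed`.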
What is the status of this issue? What is left to decide?
Discussed on 2024-12-16 meeting, Pablo will look into adding this on the component guidelines and @dmitryax will create a follow up issue
Looking into the current state of things, it looks like there are still some pending issues. Here are the changes I suggest:
I discussed this offline with Jade, I believe I mean the "component stability" document we have in the docs folder in this repository. Regarding the list above, my only comment was that we should think about the migration, possibly with a feature gate for all telemetry related changes
…l guidelines (#12525)

#### Description

This PR:

- requires "level: normal" before outputting batch processor metrics (in addition to one specific metric which was already restricted to "level: detailed")
- clarifies wording in the telemetry level guidelines and documentation, and adds said guidelines to the requirements for stable components.

Some rationale for these changes can be found in the tracking issue and this comment (#7890 (comment)).

#### Link to tracking issue

Resolves #7890

#### To be discussed

Should we add a feature gate for this, in case a user relies on "level: basic" outputting batch processor metrics? This feels like a niche use case, so considering the "alpha" stability level of these metrics, I don't think it's really necessary.

Considering batch processor metrics had already been switched to "normal" once (#9767), but were turned back to basic at some later point (not sure when), we might also want to add tests to avoid further regressions (especially as the handling of telemetry levels is bound to change further with #11754).

Co-authored-by: Dmitrii Anoshin
There are several verbosity levels that can be used to configure how many metrics the collector exposes:
The problem is that they are barely used. Most of the metrics are exposed at the `basic` level. There is only one metric in batch processor exposed at the `detailed` level. The `normal` level is not being used. The suggestion is to revisit all the metrics and further distribute them across the levels.
The default level is `basic` (the lowest), which is the most common and provides most of the metrics. We can move a significant portion of the metrics to the `normal` level, which can become a new default. So default behavior doesn't change for the end user. While `basic` level can be kept to the bare minimum reserved for collector core:
OTel components metrics will use only `normal` or `detailed` levels.
Long-term, we can consider providing a user an option to override this level per component.