[exporter/prometheusremotewrite] WAL metrics #39556

Open
ArthurSens opened this issue Apr 22, 2025 · 7 comments

Comments

@ArthurSens
Member

Component(s)

exporter/prometheusremotewrite

Is your feature request related to a problem? Please describe.

It's a bit difficult to observe how well the Prometheus Remote Write exporter's WAL is performing while running in production.

Describe the solution you'd like

It would be awesome if we had a few metrics to measure its efficiency:

  • Data throughput
  • Write/Read success ratio
  • Write/Read latency
  • Pipeline lag

Describe alternatives you've considered

No response

Additional context

No response

@ArthurSens added the `enhancement` (New feature or request) and `needs triage` (New item requiring triage) labels on Apr 22, 2025
Contributor


Pinging code owners for exporter/prometheusremotewrite: @Aneurysm9 @rapphil @dashpole @ArthurSens. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.

@NickAnge
Contributor

Hey, I would like to work on this issue.

@ArthurSens
Member Author

Awesome, thanks @NickAnge! Feel free to ping us if you need help :)

@NickAnge
Contributor

NickAnge commented May 4, 2025

Hey @ArthurSens. It's still at an early stage, but I wanted to ask your opinion about the code structure. I followed an approach similar to generated_telemetry.go: I created an interface to wrap the WAL telemetry functions, but I feel that I am changing the WAL structure a lot (for example, by adding the exporter settings (here)).

The metric generation seems straightforward. I have added the write metrics (total and failures) as an example.

@ArthurSens
Member Author

> Hey @ArthurSens. It's still at an early stage, but I wanted to ask your opinion about the code structure. I followed an approach similar to generated_telemetry.go: I created an interface to wrap the WAL telemetry functions, but I feel that I am changing the WAL structure a lot (for example, by adding the exporter settings (here)).

I had the same thing in mind. I don't see any problem in changing the signature of a private function :)

> The metric generation seems straightforward. I have added the write metrics (total and failures) as an example.

We can do one PR per metric if you prefer. It will also make the review easier, to be honest!

@NickAnge
Contributor

NickAnge commented May 5, 2025

Thanks. I will split it per type of operation so it's easier to review. Hopefully I'll get to it by the end of the week.

songy23 pushed a commit that referenced this issue May 20, 2025
#39843)

#### Description

This PR introduces write and write-failure metrics, from which we can
derive the success ratio or the failure ratio.
- `otelcol_exporter_prometheusremotewrite_wal_writes`: Total WAL write
requests
- `otelcol_exporter_prometheusremotewrite_wal_writes_failures`: Total
WAL write failures


I decided to introduce the code in the handle export function of the
exporter, just before the call to `wal.persistToWAL`.
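
As an illustration of how the ratio could be derived from the two counters, here is a minimal sketch. The function `successRatio` is a hypothetical helper, not part of the exporter, and it assumes both inputs are cumulative totals read at the same moment.

```go
package main

import "fmt"

// successRatio derives the WAL write success ratio from two cumulative
// counters: total write requests and total write failures. The boolean is
// false when no writes have been recorded yet, because the ratio is
// undefined in that case.
func successRatio(writes, failures int64) (float64, bool) {
	if writes <= 0 {
		return 0, false
	}
	return float64(writes-failures) / float64(writes), true
}

func main() {
	// Hypothetical counter values, standing in for
	// otelcol_exporter_prometheusremotewrite_wal_writes and
	// otelcol_exporter_prometheusremotewrite_wal_writes_failures.
	ratio, ok := successRatio(200, 5)
	fmt.Println(ratio, ok)
}
```

The failure ratio is the complement, `1 - ratio`, whenever the boolean is true.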

#### Link to tracking issue
Part of  #39556 

#### Testing
![Screenshot 2025-05-09 at 15 34 45](https://github.com/user-attachments/assets/4489b13a-a538-40ef-9ff7-de6d9f23290a)


---------

Co-authored-by: Arthur Silva Sens <[email protected]>
dragonlord93 pushed a commit to dragonlord93/opentelemetry-collector-contrib that referenced this issue May 23, 2025
open-telemetry#39843)

atoulme pushed a commit that referenced this issue May 29, 2025
…#40272)

#### Description

This PR introduces the following WAL metrics:

- `otelcol_exporter_prometheusremotewrite_wal_reads`: Number of WAL
reads
- `otelcol_exporter_prometheusremotewrite_wal_reads_failures`: Number of
WAL reads that failed

#### Link to tracking issue
Part of #39556 

#### Testing

Added a unit test, `TestWALRead_Telemetry`. During WAL startup the exporter
tries to read from the WAL, but the read fails because there is nothing to
read yet. For that reason, both metrics record the failed read.


---------

Co-authored-by: Arthur Silva Sens <[email protected]>