[chore] [pkg/stanza] test: add benchmark for File input #38054
Merged
djaglowski
merged 5 commits into
open-telemetry:main
from
andrzej-stencel:add-benchmark-for-file-input
Feb 20, 2025
532cbb0
to
234b9b3
djaglowski
reviewed
Feb 19, 2025
To reuse it in File input benchmark.
djaglowski
approved these changes
Feb 20, 2025
andrzej-stencel
added a commit
to andrzej-stencel/opentelemetry-collector-contrib
that referenced
this pull request
Feb 25, 2025
Related to open-telemetry#38054.

The File Log receiver benchmark has a scope that is larger than both the [File consumer benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/fileconsumer/benchmark_test.go) and the [File input benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/input/file/benchmark_test.go).

Compared to the File input benchmark, the File Log receiver benchmark includes:

- translating Stanza entries to pdata logs ([converter.ConvertEntries](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/adapter/converter.go#L20)).
- batching of logs in [LogEmitter](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/helper/emitter.go#L103).

This new benchmark should be useful when comparing performance with and without batching in LogEmitter, see open-telemetry#35456.
andrzej-stencel
added a commit
that referenced
this pull request
Mar 3, 2025
#### Description

Modifies the File consumer to emit logs in batches as opposed to sending each log individually through the Stanza pipeline and on to the Log Emitter.

Here are the changes introduced:

- 6b4c9fe Changed the `Reader::ReadToEnd` method in File consumer to collect the tokens scanned from the file into batches. At this point, the Reader still emits each token individually, as the `emit.Callback` function only accepts a single token.
- c206995 Changed `emit.Callback` function signature to accept a slice of tokens as opposed to a single token, and changed the Reader to emit a batch of tokens in one request. At this point, the batches are still split into individual tokens inside the `emit` function, because the Stanza operators can only process one entry at a time.
- aedda3a Added `ProcessBatch` method to Stanza operators and used it in the `emit` function. At this point, the batch of tokens is translated to a batch of entries and passed to Log Emitter as a whole batch. The batch is still split in the Log Emitter, which calls `consumeFunc` for each entry in a loop.
- 13d6054 Changed the LogEmitter to add the whole batch to its buffer, as opposed to adding entries one by one.

**Slice of entries `[]entry.Entry` vs. slice of pointers `[]*entry.Entry`**

I considered whether the `ProcessBatch` method in the `Operator` interface should accept a slice of structs `[]entry.Entry` or a slice of pointers `[]*entry.Entry`. I ran some tests (similar to #35454) and they showed a 7-10% performance loss when using a slice of structs vs. a slice of pointers. That's why I decided to use the slice of pointers `[]*entry.Entry`.

#### Link to tracking issue

- Fixes #35455

#### Testing

No changes in tests. The goal is for the functionality to not change and for performance to not decrease. I have added a new benchmark in a separate PR #38054 that should be helpful in assessing the performance impact of this change.

#### Documentation

These are internal changes, no user documentation needs changing.
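
The difference between emitting token by token and emitting a whole batch, as described in the list of changes above, can be sketched roughly as follows. This is only an illustration under assumptions: `Token`, `Entry`, `Operator`, `countingOperator`, `emitEach` and `emitBatch` are hypothetical stand-ins, not the actual Stanza types, and context and error handling are left out.

```go
package main

import "fmt"

// Token and Entry are hypothetical stand-ins for a scanned token and a Stanza entry.
type Token []byte

type Entry struct{ Body string }

// Operator is a hypothetical, reduced operator interface. ProcessBatch takes a
// slice of pointers, mirroring the []*entry.Entry choice described above.
type Operator interface {
	Process(e *Entry)
	ProcessBatch(entries []*Entry)
}

// countingOperator records how many calls and how many entries it received.
type countingOperator struct {
	calls   int
	entries int
}

func (c *countingOperator) Process(e *Entry) {
	c.calls++
	c.entries++
}

func (c *countingOperator) ProcessBatch(entries []*Entry) {
	c.calls++
	c.entries += len(entries)
}

// emitEach sends every token through the operator individually.
func emitEach(op Operator, tokens []Token) {
	for _, t := range tokens {
		op.Process(&Entry{Body: string(t)})
	}
}

// emitBatch translates the tokens to entries first and passes them on in one call.
func emitBatch(op Operator, tokens []Token) {
	entries := make([]*Entry, 0, len(tokens))
	for _, t := range tokens {
		entries = append(entries, &Entry{Body: string(t)})
	}
	op.ProcessBatch(entries)
}

func main() {
	tokens := []Token{Token("a"), Token("b"), Token("c")}
	each, batch := &countingOperator{}, &countingOperator{}
	emitEach(each, tokens)
	emitBatch(batch, tokens)
	fmt.Println(each.calls, each.entries)   // prints "3 3"
	fmt.Println(batch.calls, batch.entries) // prints "1 3"
}
```

Both paths deliver the same three entries; the batched path does so with a single call into the operator.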
djaglowski
pushed a commit
that referenced
this pull request
Mar 3, 2025
…38171)

#### Description

Related to #38054.

The File Log receiver benchmark has a scope that is larger than both the [File consumer benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/fileconsumer/benchmark_test.go) and the [File input benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/input/file/benchmark_test.go).

Compared to the File input benchmark, the scope of File Log receiver benchmark includes:

- translating of Stanza entries to pdata logs ([converter.ConvertEntries](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/adapter/converter.go#L20)).
- batching of logs in [LogEmitter](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/helper/emitter.go#L103).

This new benchmark should help us measure the performance impact of [removing batching from LogEmitter](#35456) after it is [added in File consumer](#35455).

#### Link to tracking issue

- Needed for #35456
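
The two extra steps that the File Log receiver benchmark covers (buffering entries into batches, then translating each batch) can be pictured with a small sketch. Everything here is an assumption made for illustration: `Entry`, `Record`, `convertEntries`, `batchingEmitter` and its size-only flush rule are hypothetical, and the real LogEmitter and converter may behave differently, for example in how and when they flush.

```go
package main

import "fmt"

// Entry is a hypothetical stand-in for a Stanza entry.
type Entry struct{ Body string }

// Record is a hypothetical stand-in for a converted pdata log record.
type Record struct{ Body string }

// convertEntries is a hypothetical placeholder for the entry-to-pdata translation step.
func convertEntries(entries []*Entry) []Record {
	records := make([]Record, 0, len(entries))
	for _, e := range entries {
		records = append(records, Record{Body: e.Body})
	}
	return records
}

// batchingEmitter is a hypothetical, simplified emitter that buffers entries
// and hands them to consume once maxBatch entries have accumulated.
type batchingEmitter struct {
	maxBatch int
	buf      []*Entry
	consume  func([]*Entry)
}

func newBatchingEmitter(maxBatch int, consume func([]*Entry)) *batchingEmitter {
	if maxBatch < 1 {
		maxBatch = 1 // guard against a batch size that would never flush
	}
	return &batchingEmitter{maxBatch: maxBatch, consume: consume}
}

// add buffers one entry and flushes when the buffer is full.
func (e *batchingEmitter) add(entry *Entry) {
	e.buf = append(e.buf, entry)
	if len(e.buf) >= e.maxBatch {
		e.flush()
	}
}

// flush hands the buffered entries to consume and starts a fresh buffer.
func (e *batchingEmitter) flush() {
	if len(e.buf) == 0 {
		return
	}
	batch := e.buf
	e.buf = nil
	e.consume(batch)
}

func main() {
	var sizes []int
	em := newBatchingEmitter(2, func(batch []*Entry) {
		sizes = append(sizes, len(convertEntries(batch)))
	})
	for _, body := range []string{"a", "b", "c"} {
		em.add(&Entry{Body: body})
	}
	em.flush()
	fmt.Println(sizes) // prints "[2 1]"
}
```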
This benchmark has a scope that sits between the Stanza File consumer benchmark and a File Log receiver benchmark (which currently does not exist).
The major difference between the File consumer benchmark and the File input benchmark is that the File input benchmark also measures the memory allocations made in the File input's emit function.
This should make it possible to assess the performance impact of this change: #37734, or of any similar changes in the future.
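
As a rough illustration of how a Go benchmark can report the allocations made by an emit function, here is a minimal sketch. It is not the benchmark added in this PR: `entry`, `emit`, `benchmarkEmit` and `runEmitBenchmark` are hypothetical names, and the emit function is a toy that allocates one entry per token. Only `testing.Benchmark`, `b.ReportAllocs`, `b.ResetTimer` and `BenchmarkResult.AllocsPerOp` come from the standard library.

```go
package main

import (
	"fmt"
	"testing"
)

// entry is a hypothetical stand-in for a Stanza entry.
type entry struct{ body string }

// lastEntry keeps a reference to the most recent entry so it escapes to the heap.
var lastEntry *entry

// emit is a hypothetical emit function: it allocates one entry per token
// and hands it to sink.
func emit(tokens [][]byte, sink func(*entry)) {
	for _, t := range tokens {
		sink(&entry{body: string(t)})
	}
}

// benchmarkEmit measures emit over a fixed set of 100 tokens per iteration.
func benchmarkEmit(b *testing.B) {
	tokens := make([][]byte, 100)
	for i := range tokens {
		tokens[i] = []byte("a sample log line")
	}
	b.ReportAllocs() // include allocation statistics in the result
	b.ResetTimer()   // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		emit(tokens, func(e *entry) { lastEntry = e })
	}
}

// runEmitBenchmark runs the benchmark outside of "go test" and returns the
// iteration count and the allocations per iteration.
func runEmitBenchmark() (int, int64) {
	res := testing.Benchmark(benchmarkEmit)
	return res.N, res.AllocsPerOp()
}

func main() {
	n, allocs := runEmitBenchmark()
	fmt.Printf("iterations=%d allocs/op=%d\n", n, allocs)
}
```

Comparing the allocations per operation before and after a change to the emit path gives a quick signal of its memory impact; in a real `_test.go` file the same function would be named `BenchmarkXxx` and run with `go test -bench`.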