[chore] [pkg/stanza] test: add benchmark for File input #38054

andrzej-stencel · 2025-02-19T12:13:35Z

This benchmark has a scope that sits between the Stanza File consumer benchmark and a File Log receiver benchmark (which currently does not exist).

The main difference between the two is that the File input benchmark also measures the memory allocations made in the File input's emit function.
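As a rough illustration of what measuring allocations in an emit function can look like, here is a minimal Go benchmark sketch. It is not the benchmark added in this PR: the `entry` type, the `emit` function and the sample log line are hypothetical stand-ins, and only `b.ReportAllocs` and `b.ResetTimer` come from the standard `testing` package.

```go
package main

import "testing"

// entry is a hypothetical stand-in for a Stanza log entry.
type entry struct {
	Body string
}

// emit is a hypothetical stand-in for the File input's emit function.
// It copies the raw token into a string and wraps it in an entry,
// which is where allocations come from in this sketch.
func emit(token []byte) *entry {
	return &entry{Body: string(token)}
}

// sink keeps the result reachable so the compiler cannot drop the call.
var sink *entry

// BenchmarkEmit reports time and allocations per emitted token.
func BenchmarkEmit(b *testing.B) {
	token := []byte("2025-02-19T12:13:35Z INFO sample log line")
	b.ReportAllocs() // include B/op and allocs/op in the results
	b.ResetTimer()   // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		sink = emit(token)
	}
}
```

Placed in a `_test.go` file, a benchmark like this would typically be run with `go test -bench=Emit -benchmem`, and two runs (before and after a change) can then be compared on their `allocs/op` figures.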

This should make it possible to assess the performance impact of #37734 and of similar changes in the future.


Related to open-telemetry#38054.

The File Log receiver benchmark has a scope that is larger than both the [File consumer benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/fileconsumer/benchmark_test.go) and the [File input benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/input/file/benchmark_test.go). Compared to the File input benchmark, the File Log receiver benchmark includes:

- translating Stanza entries to pdata logs ([converter.ConvertEntries](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/adapter/converter.go#L20)).
- batching of logs in [LogEmitter](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/helper/emitter.go#L103).

This new benchmark should be useful when comparing performance with and without batching in LogEmitter, see open-telemetry#35456.

#### Description

Modifies the File consumer to emit logs in batches as opposed to sending each log individually through the Stanza pipeline and on to the Log Emitter.

Here are the changes introduced:

- 6b4c9fe Changed the `Reader::ReadToEnd` method in File consumer to collect the tokens scanned from the file into batches. At this point, the Reader still emits each token individually, as the `emit.Callback` function only accepts a single token.
- c206995 Changed `emit.Callback` function signature to accept a slice of tokens as opposed to a single token, and changed the Reader to emit a batch of tokens in one request. At this point, the batches are still split into individual tokens inside the `emit` function, because the Stanza operators can only process one entry at a time.
- aedda3a Added `ProcessBatch` method to Stanza operators and used it in the `emit` function. At this point, the batch of tokens is translated to a batch of entries and passed to Log Emitter as a whole batch. The batch is still split in the Log Emitter, which calls `consumeFunc` for each entry in a loop.
- 13d6054 Changed the LogEmitter to add the whole batch to its buffer, as opposed to adding entries one by one.

**Slice of entries `[]entry.Entry` vs. slice of pointers `[]*entry.Entry`**

I considered whether the `ProcessBatch` method in the `Operator` interface should accept a slice of structs `[]entry.Entry` or a slice of pointers `[]*entry.Entry`. I ran some tests (similar to #35454) and they showed a 7-10% performance loss when using a slice of structs vs. a slice of pointers. That's why I decided to use the slice of pointers `[]*entry.Entry`.

#### Link to tracking issue

- Fixes #35455

#### Testing

No changes in tests. The goal is for the functionality to not change and for performance to not decrease.

I have added a new benchmark in a separate PR #38054 that should be helpful in assessing the performance impact of this change.

#### Documentation

These are internal changes, no user documentation needs changing.
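The move from a per-token callback to a batch callback described above can be sketched in Go as follows. This is only an illustration of the idea, not the code from the linked PR: `Token`, `Entry`, `BatchCallback`, `emitInBatches` and `toEntries` are hypothetical names, and the real `emit.Callback` and `ProcessBatch` signatures may differ.

```go
package main

// Token is a hypothetical stand-in for a token scanned from a file.
type Token struct {
	Body []byte
}

// Entry is a hypothetical stand-in for a Stanza entry.
type Entry struct {
	Body string
}

// BatchCallback receives a whole slice of tokens in one call,
// instead of being invoked once per token.
type BatchCallback func(tokens []Token) error

// emitInBatches splits tokens into batches of at most maxBatch items
// and calls cb once per batch, stopping at the first error.
func emitInBatches(tokens []Token, maxBatch int, cb BatchCallback) error {
	if maxBatch < 1 {
		maxBatch = 1 // guard against a non-positive batch size
	}
	for start := 0; start < len(tokens); start += maxBatch {
		end := start + maxBatch
		if end > len(tokens) {
			end = len(tokens)
		}
		if err := cb(tokens[start:end]); err != nil {
			return err
		}
	}
	return nil
}

// toEntries converts a batch of tokens to a slice of entry pointers,
// the shape the description settles on for batch processing.
func toEntries(tokens []Token) []*Entry {
	entries := make([]*Entry, 0, len(tokens))
	for _, t := range tokens {
		entries = append(entries, &Entry{Body: string(t.Body)})
	}
	return entries
}
```

A callback passed to `emitInBatches` could call `toEntries` on each batch and hand the resulting slice to the next stage in a single call, which is the pattern the commits above introduce step by step.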

…38171)

#### Description

Related to #38054.

The File Log receiver benchmark has a scope that is larger than both the [File consumer benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/fileconsumer/benchmark_test.go) and the [File input benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/input/file/benchmark_test.go). Compared to the File input benchmark, the scope of File Log receiver benchmark includes:

- translating of Stanza entries to pdata logs ([converter.ConvertEntries](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/adapter/converter.go#L20)).
- batching of logs in [LogEmitter](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/helper/emitter.go#L103).

This new benchmark should help us measure the performance impact of [removing batching from LogEmitter](#35456) after it is [added in File consumer](#35455).

#### Link to tracking issue

- Needed for #35456

andrzej-stencel requested review from djaglowski and a team as code owners February 19, 2025 12:13

github-actions bot assigned dmitryax Feb 19, 2025

github-actions bot added the pkg/stanza label Feb 19, 2025

[chore] test: add benchmark for Stanza File input

234b9b3

andrzej-stencel force-pushed the add-benchmark-for-file-input branch from 532cbb0 to 234b9b3 February 19, 2025 12:34

andrzej-stencel mentioned this pull request Feb 19, 2025

Remove unnecessary copy while decoding and constructing string #37734

Merged

djaglowski reviewed Feb 19, 2025


pkg/stanza/operator/input/file/testdata/logs-1-lines.log (outdated, resolved)

andrzej-stencel added 3 commits February 20, 2025 13:08

refactor: move filetest internal package out of fileconsumer package

240a675

To reuse it in File input benchmark.

test: generate log files on the fly

5d4b3f0

fix: lint

b896806

andrzej-stencel requested a review from djaglowski February 20, 2025 12:15

make goporto

3457eaa

andrzej-stencel mentioned this pull request Feb 20, 2025

[pkg/stanza] Introduce batching logs in File consumer #36663

Merged

djaglowski approved these changes Feb 20, 2025


djaglowski merged commit 0f88dfa into open-telemetry:main Feb 20, 2025
162 checks passed

github-actions bot added this to the next release milestone Feb 20, 2025

andrzej-stencel mentioned this pull request Feb 25, 2025

[chore] [receiver/filelog] test: add benchmark for File Log receiver #38171

Merged

andrzej-stencel deleted the add-benchmark-for-file-input branch February 25, 2025 11:59
