[chore] [pkg/stanza] test: add benchmark for File input #38054

Merged

andrzej-stencel
Member

This benchmark has a scope that sits between the Stanza File consumer benchmark and a File Log receiver benchmark (which currently does not exist).

The major difference between the File consumer benchmark and the File input benchmark is that the File input benchmark also measures the memory allocations made in the File input's emit function.

This should make it possible to assess the performance impact of this change: #37734, or of any similar changes in the future.
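For readers unfamiliar with how Go benchmarks report allocations, here is a minimal sketch. It is not the benchmark added in this PR: `emitCopy` is a hypothetical stand-in for an emit function, and the token contents are made up. It only shows the general mechanism (`b.ReportAllocs` and the `testing.Benchmark` helper) that makes per-operation allocations visible.

```go
package main

import (
	"fmt"
	"testing"
)

// emitCopy is a hypothetical stand-in for an emit function: it copies the
// token into a freshly allocated buffer before handing it on.
func emitCopy(token []byte) []byte {
	out := make([]byte, len(token))
	copy(out, token)
	return out
}

// sink keeps the result reachable so the compiler cannot drop the allocation.
var sink []byte

// benchmarkEmit calls emitCopy b.N times and asks the framework to record
// memory allocations in addition to timing.
func benchmarkEmit(b *testing.B) {
	token := []byte("2025-02-20 INFO example log line")
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink = emitCopy(token)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	res := testing.Benchmark(benchmarkEmit)
	fmt.Printf("%d allocs/op, %d B/op\n", res.AllocsPerOp(), res.AllocedBytesPerOp())
}
```

In a real `_test.go` file the same function would be named `BenchmarkXxx` and run with `go test -bench . -benchmem`.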

@djaglowski djaglowski merged commit 0f88dfa into open-telemetry:main Feb 20, 2025
162 checks passed
@github-actions github-actions bot added this to the next release milestone Feb 20, 2025
andrzej-stencel added a commit to andrzej-stencel/opentelemetry-collector-contrib that referenced this pull request Feb 25, 2025
@andrzej-stencel andrzej-stencel deleted the add-benchmark-for-file-input branch February 25, 2025 11:59
andrzej-stencel added a commit that referenced this pull request Mar 3, 2025
#### Description

Modifies the File consumer to emit logs in batches as opposed to sending
each log individually through the Stanza pipeline and on to the Log
Emitter.

Here are the changes introduced:

- 6b4c9fe:
  Changed the `Reader::ReadToEnd` method in File consumer to collect the
  tokens scanned from the file into batches. At this point, the Reader
  still emits each token individually, as the `emit.Callback` function
  only accepts a single token.
- c206995:
  Changed `emit.Callback` function signature to accept a slice of tokens
  as opposed to a single token, and changed the Reader to emit a batch of
  tokens in one request. At this point, the batches are still split into
  individual tokens inside the `emit` function, because the Stanza
  operators can only process one entry at a time.
- aedda3a:
  Added `ProcessBatch` method to Stanza operators and used it in the
  `emit` function. At this point, the batch of tokens is translated to a
  batch of entries and passed to Log Emitter as a whole batch. The batch
  is still split in the Log Emitter, which calls `consumeFunc` for each
  entry in a loop.
- 13d6054:
  Changed the LogEmitter to add the whole batch to its buffer, as opposed
  to adding entries one by one.
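
To illustrate the first two steps, here is a rough sketch of a reader that collects tokens into batches and calls the emit callback once per batch. The names `Token`, `EmitBatch`, `readToEnd` and `batchSizes` are hypothetical and much simpler than the real `Reader` and `emit.Callback`; the sketch only shows the shape of the change, assuming a fixed maximum batch size.

```go
package main

import "fmt"

// Token is a hypothetical stand-in for one token scanned from a file.
type Token []byte

// EmitBatch is a hypothetical callback that receives a whole batch of tokens.
type EmitBatch func(tokens []Token) error

// readToEnd collects tokens into batches of at most maxBatchSize and calls
// emit once per batch instead of once per token.
func readToEnd(lines []string, maxBatchSize int, emit EmitBatch) error {
	batch := make([]Token, 0, maxBatchSize)
	for _, line := range lines {
		batch = append(batch, Token(line))
		if len(batch) == maxBatchSize {
			if err := emit(batch); err != nil {
				return err
			}
			// Start a fresh slice so the callee may keep the emitted batch.
			batch = make([]Token, 0, maxBatchSize)
		}
	}
	// Emit the final, possibly smaller, batch.
	if len(batch) > 0 {
		return emit(batch)
	}
	return nil
}

// batchSizes records the size of every batch that readToEnd emits.
func batchSizes(lines []string, maxBatchSize int) ([]int, error) {
	var sizes []int
	err := readToEnd(lines, maxBatchSize, func(tokens []Token) error {
		sizes = append(sizes, len(tokens))
		return nil
	})
	return sizes, err
}

func main() {
	sizes, err := batchSizes([]string{"a", "b", "c", "d", "e"}, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(sizes) // prints [2 2 1]: three emit calls instead of five
}
```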

**Slice of entries `[]entry.Entry` vs. slice of pointers
`[]*entry.Entry`**

I considered whether the `ProcessBatch` method in the `Operator`
interface should accept a slice of structs `[]entry.Entry` or a slice of
pointers `[]*entry.Entry`. I ran some tests (similar to
#35454)
and they showed a 7-10% performance loss when using a slice of structs
vs. a slice of pointers. That's why I decided to use the slice of
pointers `[]*entry.Entry`.
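
The kind of comparison described above can be sketched roughly as follows. This is not the test I ran: `Entry` here is a hypothetical, simplified struct (the real `entry.Entry` has more fields), and the filter functions are made-up stand-ins for an operator that forwards part of a batch. Results will vary by machine and Go version, so no numbers are shown.

```go
package main

import (
	"fmt"
	"testing"
)

// Entry is a hypothetical, simplified log entry.
type Entry struct {
	Body       string
	Attributes map[string]string
	Severity   int
}

// filterStructs forwards entries by value: each kept entry is copied whole.
func filterStructs(in []Entry, minSeverity int) []Entry {
	out := make([]Entry, 0, len(in))
	for _, e := range in {
		if e.Severity >= minSeverity {
			out = append(out, e)
		}
	}
	return out
}

// filterPointers forwards entries by pointer: only one pointer is copied per entry.
func filterPointers(in []*Entry, minSeverity int) []*Entry {
	out := make([]*Entry, 0, len(in))
	for _, e := range in {
		if e.Severity >= minSeverity {
			out = append(out, e)
		}
	}
	return out
}

// buildEntries returns the same n entries as a slice of structs and as a
// slice of pointers into that slice.
func buildEntries(n int) ([]Entry, []*Entry) {
	structs := make([]Entry, n)
	pointers := make([]*Entry, n)
	for i := range structs {
		structs[i] = Entry{Body: "log line", Severity: i % 2}
		pointers[i] = &structs[i]
	}
	return structs, pointers
}

// Sinks keep results reachable so the compiler cannot discard the work.
var (
	sinkStructs  []Entry
	sinkPointers []*Entry
)

func main() {
	structs, pointers := buildEntries(1000)
	byValue := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sinkStructs = filterStructs(structs, 1)
		}
	})
	byPointer := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sinkPointers = filterPointers(pointers, 1)
		}
	})
	fmt.Printf("[]Entry:  %d ns/op\n", byValue.NsPerOp())
	fmt.Printf("[]*Entry: %d ns/op\n", byPointer.NsPerOp())
}
```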

#### Link to tracking issue

- Fixes #35455

#### Testing

No changes in tests. The goal is for the functionality to not change and
for performance to not decrease.

I have added a new benchmark in a separate PR
#38054
that should be helpful in assessing the performance impact of this
change.

#### Documentation

These are internal changes, no user documentation needs changing.
djaglowski pushed a commit that referenced this pull request Mar 3, 2025
…38171)

#### Description

Related to
#38054.

The File Log receiver benchmark has a scope that is larger than both the
[File consumer
benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/fileconsumer/benchmark_test.go)
and the [File input
benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/input/file/benchmark_test.go).
Compared to the File input benchmark, the scope of the File Log receiver
benchmark also includes:
- translating Stanza entries to pdata logs
([converter.ConvertEntries](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/adapter/converter.go#L20)).
- batching logs in
[LogEmitter](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/helper/emitter.go#L103).

This new benchmark should help us measure the performance impact of
[removing batching from
LogEmitter](#35456)
after it is [added in File
consumer](#35455).
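
For context, the batching that this benchmark covers can be pictured with a small sketch. This is not the real LogEmitter: `bufferingEmitter`, its fields and `runEmitter` are hypothetical, and the sketch assumes a simple "flush when the buffer reaches a maximum size" policy with an explicit `Flush` standing in for a timer.

```go
package main

import (
	"fmt"
	"sync"
)

// Entry is a hypothetical, simplified log entry.
type Entry struct{ Body string }

// bufferingEmitter is a hypothetical sketch of an emitter that buffers
// entries and hands them to consume in batches.
type bufferingEmitter struct {
	mu           sync.Mutex
	buf          []*Entry
	maxBatchSize int
	consume      func([]*Entry)
}

// ProcessBatch adds incoming entries to the buffer and flushes each time
// the buffer reaches maxBatchSize.
func (e *bufferingEmitter) ProcessBatch(entries []*Entry) {
	e.mu.Lock()
	defer e.mu.Unlock()
	for _, en := range entries {
		e.buf = append(e.buf, en)
		if len(e.buf) >= e.maxBatchSize {
			e.flushLocked()
		}
	}
}

// Flush sends whatever is buffered; a real emitter would typically also
// trigger this from a timer.
func (e *bufferingEmitter) Flush() {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.flushLocked()
}

// flushLocked must be called with e.mu held. For simplicity it calls
// consume while still holding the lock.
func (e *bufferingEmitter) flushLocked() {
	if len(e.buf) == 0 {
		return
	}
	batch := e.buf
	e.buf = make([]*Entry, 0, e.maxBatchSize)
	e.consume(batch)
}

// runEmitter feeds n entries through an emitter and returns the sizes of
// the batches that reached the consumer.
func runEmitter(n, maxBatchSize int) []int {
	var sizes []int
	em := &bufferingEmitter{
		maxBatchSize: maxBatchSize,
		consume:      func(b []*Entry) { sizes = append(sizes, len(b)) },
	}
	entries := make([]*Entry, n)
	for i := range entries {
		entries[i] = &Entry{Body: fmt.Sprintf("line %d", i)}
	}
	em.ProcessBatch(entries)
	em.Flush()
	return sizes
}

func main() {
	fmt.Println(runEmitter(5, 3)) // prints [3 2]
}
```

If the File consumer already delivers entries in batches, a second layer of buffering like this one may add cost without much benefit, which is the question the new benchmark is meant to help answer.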

#### Link to tracking issue

- Needed for #35456