Skip to content

[extension/awslogsencoding] Improve performance on text VPC flow logs #39043

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
constanca-m opened this issue Mar 29, 2025 · 3 comments · Fixed by #39090
Closed

[extension/awslogsencoding] Improve performance on text VPC flow logs #39043

constanca-m opened this issue Mar 29, 2025 · 3 comments · Fixed by #39090

Comments

@constanca-m
Copy link
Contributor

Component(s)

extension/encoding/awslogsencoding

Is your feature request related to a problem? Please describe.

Improve performance on how we are handling VPC flow logs.

Describe the solution you'd like

Improve performance by:

For a followup:

Unless there's a bug in the parser, the number of fields should always match, so let's optimise for that. I suggest we remove this check here and count the fields as we're parsing to reduce overhead. Then at the very end we can return an error if the number of fields parsed does not match expectations.

Originally posted by @axw in #38897 (comment)

https://pkg.go.dev/strings#Cut should simplify this, something like:

for _, field := range fields {
	if logLine == "" {
		// handle field count mismatch
	}
	var value string
	value, logLine, _ = strings.Cut(logLine, " ")
}

Let's do that in a followup though.

Originally posted by @axw in #38897 (comment)

Describe alternatives you've considered

No response

Additional context

No response

@constanca-m constanca-m added enhancement New feature or request needs triage New item requiring triage labels Mar 29, 2025
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@axw
Copy link
Contributor

axw commented Mar 30, 2025

Another couple of things to look into:

  • Instead of collecting *srcaddr/*dstaddr addresses in a map, create a struct with the four possible fields
  • VPC flow logs are already aggregated, so further grouping by resource may not be worth the cost

@axw
Copy link
Contributor

axw commented Mar 30, 2025

/label -needs-triage

@github-actions github-actions bot removed the needs triage New item requiring triage label Mar 30, 2025
@songy23 songy23 closed this as completed in fad06e3 Apr 2, 2025
dmathieu pushed a commit to dmathieu/opentelemetry-collector-contrib that referenced this issue Apr 8, 2025
…etry#39090)

<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

- Use `strings.Cut` instead of `strings.Index`.
- Use structs instead of maps for the addresses fields.
- Remove the grouping: all logs from the same file belong to the same
account and region.

<!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. -->
#### Link to tracking issue
Fixes
open-telemetry#39043

<!--Describe what testing was performed and which tests were added.-->
#### Testing

Unit tests and benchmarking.

Benchmark improved:

```
goos: linux
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/extension/encoding/awslogsencodingextension/internal/unmarshaler/vpc-flow-log
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
                                             │   Upstream   │             PR changes              │
                                             │    sec/op    │   sec/op     vs base                │
UnmarshalUnmarshalPlainTextLogs/1_log-16       2.069µ ± 11%   1.853µ ± 0%  -10.46% (p=0.000 n=10)
UnmarshalUnmarshalPlainTextLogs/1000_logs-16   570.6µ ±  2%   460.0µ ± 4%  -19.38% (p=0.000 n=10)
geomean                                        34.36µ         29.19µ       -15.04%

                                             │   Upstream   │             PR changes              │
                                             │     B/op     │     B/op      vs base               │
UnmarshalUnmarshalPlainTextLogs/1_log-16       5.398Ki ± 0%   5.242Ki ± 0%  -2.89% (p=0.000 n=10)
UnmarshalUnmarshalPlainTextLogs/1000_logs-16   537.6Ki ± 0%   490.6Ki ± 0%  -8.74% (p=0.000 n=10)
geomean                                        53.87Ki        50.72Ki       -5.86%

                                             │   Upstream   │             PR changes              │
                                             │  allocs/op   │  allocs/op   vs base                │
UnmarshalUnmarshalPlainTextLogs/1_log-16         27.00 ± 0%    24.00 ± 0%  -11.11% (p=0.000 n=10)
UnmarshalUnmarshalPlainTextLogs/1000_logs-16   10.027k ± 0%   9.025k ± 0%   -9.99% (p=0.000 n=10)
geomean                                          520.3         465.4       -10.55%
```

<!--Describe the documentation added.-->
#### Documentation

I have added comments in the code.

<!--Please delete paragraphs that you did not use before submitting.-->
Fiery-Fenix pushed a commit to Fiery-Fenix/opentelemetry-collector-contrib that referenced this issue Apr 24, 2025
…etry#39090)

<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

- Use `strings.Cut` instead of `strings.Index`.
- Use structs instead of maps for the addresses fields.
- Remove the grouping: all logs from the same file belong to the same
account and region.

<!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. -->
#### Link to tracking issue
Fixes
open-telemetry#39043

<!--Describe what testing was performed and which tests were added.-->
#### Testing

Unit tests and benchmarking.

Benchmark improved:

```
goos: linux
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/extension/encoding/awslogsencodingextension/internal/unmarshaler/vpc-flow-log
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
                                             │   Upstream   │             PR changes              │
                                             │    sec/op    │   sec/op     vs base                │
UnmarshalUnmarshalPlainTextLogs/1_log-16       2.069µ ± 11%   1.853µ ± 0%  -10.46% (p=0.000 n=10)
UnmarshalUnmarshalPlainTextLogs/1000_logs-16   570.6µ ±  2%   460.0µ ± 4%  -19.38% (p=0.000 n=10)
geomean                                        34.36µ         29.19µ       -15.04%

                                             │   Upstream   │             PR changes              │
                                             │     B/op     │     B/op      vs base               │
UnmarshalUnmarshalPlainTextLogs/1_log-16       5.398Ki ± 0%   5.242Ki ± 0%  -2.89% (p=0.000 n=10)
UnmarshalUnmarshalPlainTextLogs/1000_logs-16   537.6Ki ± 0%   490.6Ki ± 0%  -8.74% (p=0.000 n=10)
geomean                                        53.87Ki        50.72Ki       -5.86%

                                             │   Upstream   │             PR changes              │
                                             │  allocs/op   │  allocs/op   vs base                │
UnmarshalUnmarshalPlainTextLogs/1_log-16         27.00 ± 0%    24.00 ± 0%  -11.11% (p=0.000 n=10)
UnmarshalUnmarshalPlainTextLogs/1000_logs-16   10.027k ± 0%   9.025k ± 0%   -9.99% (p=0.000 n=10)
geomean                                          520.3         465.4       -10.55%
```

<!--Describe the documentation added.-->
#### Documentation

I have added comments in the code.

<!--Please delete paragraphs that you did not use before submitting.-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants