[translator/azurelogs] Rethink log structure, and define resource attributes #39186

Open
constanca-m opened this issue Apr 6, 2025 · 9 comments
Labels: bug, pkg/translator/azurelogs


constanca-m commented Apr 6, 2025

Component(s)

pkg/translator/azurelogs

Describe the issue you're reporting

Consider this example log:

{
    "records": [
        {
            "time": "2024-04-24T12:06:12.0000000Z",
            "resourceId": "/firstResource",
            "category": "firstResource",
            "operationName": "firstResource"
        },
        {
            "time": "2024-04-24T12:06:12.0000000Z",
            "resourceId": "/firstResource",
            "category": "firstResource",
            "operationName": "firstResource"
        },
        {
            "time": "2024-04-24T12:06:12.0000000Z",
            "resourceId": "/secondResource",
            "category": "secondResource",
            "operationName": "secondResource"
        }
    ]
}

The expected output would have two resources: one for /firstResource and one for /secondResource. The resourceId values fit best as resource attributes:

A resource represents the entity producing telemetry as resource attributes. [source]

Description: Describes the source of the log, aka resource. Multiple occurrences of events coming from the same event source can happen across time and they all have the same value of Resource. [...] Data formats that represent this data model may be designed in a manner that allows the Resource field to be recorded only once per batch of log records that come from the same source. [source]

The properties of these records, on the other hand, should be log record attributes (they are not shown in the example file above). Category and operation name could also be resource attributes.
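To make the expectation concrete, here is a minimal Go sketch of the grouping described above. It is not the translator's code: `azureRecord`, `resourceLogs`, and `groupByResource` are hypothetical names, and `resourceLogs` is a simplified stand-in for an OTLP resource entry. The attribute keys are the ones that currently appear on the log records.

```go
package main

import "encoding/json"

// azureRecord holds only the fields shown in the example payload.
type azureRecord struct {
	Time          string `json:"time"`
	ResourceID    string `json:"resourceId"`
	Category      string `json:"category"`
	OperationName string `json:"operationName"`
}

type azureBatch struct {
	Records []azureRecord `json:"records"`
}

// resourceLogs is a hypothetical, simplified stand-in for one OTLP
// resource entry: resource attributes plus the records that belong to it.
type resourceLogs struct {
	Attributes map[string]string
	Records    []azureRecord
}

// groupByResource makes one pass over the batch and emits one entry per
// distinct resourceId, in first-seen order. The identifying values are
// stored on the resource instead of being repeated on every record.
func groupByResource(payload []byte) ([]*resourceLogs, error) {
	var batch azureBatch
	if err := json.Unmarshal(payload, &batch); err != nil {
		return nil, err
	}
	index := make(map[string]int) // resourceId -> position in out
	var out []*resourceLogs
	for _, rec := range batch.Records {
		i, ok := index[rec.ResourceID]
		if !ok {
			i = len(out)
			index[rec.ResourceID] = i
			out = append(out, &resourceLogs{
				Attributes: map[string]string{
					"cloud.provider":    "azure",
					"cloud.resource_id": rec.ResourceID,
				},
			})
		}
		out[i].Records = append(out[i].Records, rec)
	}
	return out, nil
}
```

With the example payload, this yields two entries: one for /firstResource holding two records, and one for /secondResource holding one.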

These are the produced logs:

resourceLogs:
  - resource: {}
    scopeLogs:
      - logRecords:
          - attributes:
              - key: cloud.resource_id
                value:
                  stringValue: /firstResource
              - key: cloud.provider
                value:
                  stringValue: azure
              - key: event.name
                value:
                  stringValue: az.resource.log
            body: [...]
            spanId: ""
            timeUnixNano: "1713960372000000000"
            traceId: ""
          - attributes:
              - key: cloud.resource_id
                value:
                  stringValue: /firstResource
              - key: cloud.provider
                value:
                  stringValue: azure
              - key: event.name
                value:
                  stringValue: az.resource.log
            body: [...]
            spanId: ""
            timeUnixNano: "1713960372000000000"
            traceId: ""
        scope:
          name: otelcol/azureresourcelogs
          version: 1.2.3
  - resource: {}
    scopeLogs:
      - logRecords:
          - attributes:
              - key: cloud.resource_id
                value:
                  stringValue: /secondResource
              - key: cloud.provider
                value:
                  stringValue: azure
              - key: event.name
                value:
                  stringValue: az.resource.log
            body: [...]
            spanId: ""
            timeUnixNano: "1713960372000000000"
            traceId: ""
        scope:
          name: otelcol/azureresourcelogs
          version: 1.2.3

Having two resources is correct, but both resources have empty attributes, and the attributes currently set on each log record are really resource attributes.

The properties (placed inside the body) could then be the actual log record attributes. See for example the AWS VPC flow log to get a better idea.

At the moment, inside the body we have a kvlistValue field:

body:
  kvlistValue:
    values:
      - key: operation.name
        value:
          stringValue: Authorization
      - key: enduser.id
        value:
          stringValue: USER_ID
      - key: client.address
        value:
          stringValue: 42.42.42.42
      - key: network.protocol.name
        value:
          stringValue: kudu
      - key: properties
        value:
          kvlistValue:
            values:
              - key: UserDisplayName
                value:
                  stringValue: $fbehtestapp
      - key: category
        value:
          stringValue: AppServiceAuditLogs

This is arguably unconventional; these fields would fit better as log record attributes.
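As a rough illustration of that idea, the Go sketch below moves every entry of a map-valued body into the record's attributes. The `logRecord` type and `promoteBody` function are hypothetical, and letting existing attributes win on key conflicts is an assumption, not a decision made in this issue.

```go
package main

// logRecord is a hypothetical, simplified log record: attributes plus a
// body that may hold a key/value list (modelled here as a map).
type logRecord struct {
	Attributes map[string]any
	Body       any
}

// promoteBody moves every entry of a map-valued body into the record's
// attributes and clears the body. Nested maps (such as "properties") are
// kept as nested values. Existing attributes are not overwritten, which
// is an assumption made for this sketch.
func promoteBody(rec *logRecord) {
	fields, ok := rec.Body.(map[string]any)
	if !ok {
		return // body is not a key/value list; leave it untouched
	}
	if rec.Attributes == nil {
		rec.Attributes = make(map[string]any, len(fields))
	}
	for k, v := range fields {
		if _, exists := rec.Attributes[k]; !exists {
			rec.Attributes[k] = v
		}
	}
	rec.Body = nil
}
```

Applied to the body shown above, `client.address`, `enduser.id`, and the nested `properties` list would all end up in the record's attributes, and the body would be empty.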

@constanca-m constanca-m added the needs triage New item requiring triage label Apr 6, 2025
@constanca-m

Pinging code owners: @atoulme, @cparkins, @MikeGoldsmith (as this didn't get done automatically).

@constanca-m

/label pkg/translator/azurelogs


github-actions bot commented Apr 8, 2025

Pinging code owners for pkg/translator/azurelogs: @atoulme @cparkins @MikeGoldsmith. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.

@constanca-m

I think resources without attributes should be considered a bug at this point. Resources with the same attributes (or, in this case, with no attributes at all) are treated as the same resource, which is clearly not the case here. The problem shows up in our log_maximum unit test: there are two resources, but they can appear in a different order than in the expected file. Because the two resources look identical, the test fails intermittently:

Received unexpected error:
	            	the following errors occurred:
	            	 -  resource "map[]": scope "otelcol/azureresourcelogs": number of log records doesn't match expected: 1, actual: 2
	            	 -  resource "map[]": scope "otelcol/azureresourcelogs": number of log records doesn't match expected: 2, actual: 1
	Test:       	TestUnmarshalLogs_Files/log_maximum

See the logs comparison function for better understanding.
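The ambiguity can be reproduced in a few lines of Go. This is only a sketch under the assumption that the comparison identifies a resource by its attribute map, as the `resource "map[]"` text in the error suggests; `resourceKey` and `ambiguous` are hypothetical helpers, not functions from the test library.

```go
package main

import "fmt"

// resourceKey identifies a resource solely by its attributes. fmt prints
// maps with sorted keys, so equal maps always yield equal keys.
func resourceKey(attrs map[string]string) string {
	return fmt.Sprint(attrs)
}

// ambiguous reports whether two or more resources share the same key and
// therefore cannot be told apart when matching expected against actual.
func ambiguous(resources []map[string]string) bool {
	seen := make(map[string]struct{}, len(resources))
	for _, attrs := range resources {
		k := resourceKey(attrs)
		if _, dup := seen[k]; dup {
			return true
		}
		seen[k] = struct{}{}
	}
	return false
}
```

Two resources with empty attributes both map to the key `map[]`, so `ambiguous` returns true for them. Once each resource carries its own `cloud.resource_id`, the keys differ and the match no longer depends on ordering.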

cc @atoulme @MikeGoldsmith @cparkins

@constanca-m

/label bug

@github-actions github-actions bot added the bug Something isn't working label Apr 12, 2025
@atoulme atoulme removed the needs triage New item requiring triage label Apr 14, 2025

atoulme commented Apr 14, 2025

@cparkins what's your take?

@cparkins

@atoulme I have no reason to disagree. But how logs should be structured is not something I have much of an educated opinion on. I think what is being proposed makes sense.


atoulme commented Apr 14, 2025

Thanks @cparkins. @constanca-m, take it away (and feel free to make yourself a codeowner).

@constanca-m

Thanks @atoulme, I will open a request for codeowner later this week.

I will wait for #39340 to be merged before starting to work on this, so that the number of PRs opened stays low.

atoulme pushed a commit that referenced this issue Apr 17, 2025
#### Description

This PR only improves performance; it does not change any
functionality or output, as the passing unit tests show.

These are the main changes:

- Iterate over the Azure logs only once. Previously we kept a slice
for the keys and a map that stored all logs corresponding to the same
resource ID.
- Remove the map `mappings`. Looking up fields this way is expensive.
As an alternative, we now have a function that checks the field
and adds it to the attributes. This function is used from the
beginning, as we know the category right away.
- Use the fastest config for `jsoniter` and borrow the iterator.
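To illustrate the second point, here is a minimal Go sketch of replacing a map lookup with a function that checks the field directly. `setField` is a hypothetical name, and the category and source field names are illustrative assumptions rather than the translator's actual mapping.

```go
package main

// setField is a hypothetical stand-in for the lookup function described
// above. It resolves the attribute name with a switch instead of a map
// lookup. Category and field names here are illustrative assumptions.
func setField(attrs map[string]any, category, field string, value any) {
	switch category {
	case "AppServiceAuditLogs":
		switch field {
		case "User":
			attrs["enduser.id"] = value
			return
		case "UserAddress":
			attrs["client.address"] = value
			return
		case "Protocol":
			attrs["network.protocol.name"] = value
			return
		}
	}
	// Unknown category or field: keep the original field name.
	attrs[field] = value
}
```

Because the category is known before the fields are read, a function like this can be called for each field as it is decoded, without building an intermediate lookup table.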

Results:
```
goos: linux
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/azurelogs
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
                             │   before    │               this PR               │
                             │   sec/op    │   sec/op     vs base                │
UnmarshalLogs/1000_record-16   2.226m ± 2%   1.590m ± 6%  -28.56% (p=0.000 n=10)
UnmarshalLogs/1_record-16      2.890µ ± 1%   2.155µ ± 1%  -25.45% (p=0.000 n=10)
UnmarshalLogs/100_record-16    217.9µ ± 2%   155.4µ ± 1%  -28.69% (p=0.000 n=10)
geomean                        111.9µ        81.05µ       -27.58%

                             │    before    │               this PR                │
                             │     B/op     │     B/op      vs base                │
UnmarshalLogs/1000_record-16   2.093Mi ± 0%   1.293Mi ± 0%  -38.24% (p=0.000 n=10)
UnmarshalLogs/1_record-16      2.484Ki ± 0%   1.506Ki ± 0%  -39.39% (p=0.000 n=10)
UnmarshalLogs/100_record-16    216.0Ki ± 0%   144.9Ki ± 0%  -32.91% (p=0.000 n=10)
geomean                        104.8Ki        66.11Ki       -36.91%

                             │   before    │               this PR               │
                             │  allocs/op  │  allocs/op   vs base                │
UnmarshalLogs/1000_record-16   38.05k ± 0%   20.03k ± 0%  -47.36% (p=0.000 n=10)
UnmarshalLogs/1_record-16       52.00 ± 0%    31.00 ± 0%  -40.38% (p=0.000 n=10)
UnmarshalLogs/100_record-16    3.835k ± 0%   2.025k ± 0%  -47.20% (p=0.000 n=10)
geomean                        1.965k        1.079k       -45.07%

```

Performance improved on all metrics.

There are still improvements to be made, but I will leave them out of
this PR so that it does not get too big:
- We should not carry an attributes map; instead, we should add the
attributes to the record as soon as possible.

These improvements might also be affected by #39186 if it goes forward.

#### Link to tracking issue
Relates to #39119.

#### Testing

Unit tests and benchmark.
atoulme pushed a commit that referenced this issue Apr 17, 2025
#### Description

This PR adds me (@constanca-m) to the codeowners of the component.

Work done:
- #39176
- #39200

Work in progress:
- #39340

Work planned:
- #39186

The company I work at (Elastic) is relying on this component in one of
our products, so I plan to keep contributing to it. I hope that this
change will:
- Help and improve maintenance
- Prevent breaking changes without warning
- Help collaboration between codeowners from different companies.
atoulme pushed a commit that referenced this issue May 2, 2025
#### Description

The issue for this PR was first reported in #39186.

Currently, the resource logs have no attributes. This means that all
records are supposedly the same, which is not true.

This PR adds attributes to the resources.

#### Link to tracking issue
Relates to #39186.

#### Testing

Unit tests fixed.