Skip to content

[pkg/stanza] Adopt semantic convention for the log file path attribute #37210

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

douglascamata
Copy link
Member

@douglascamata douglascamata commented Jan 14, 2025

Description

This PR adopts the semantic convention for the log file path attribute, which should be attributes["log.file.path"].

It fixes the default value for the recombine operator's source_identifier.

@douglascamata douglascamata requested review from djaglowski and a team as code owners January 14, 2025 11:37
@douglascamata douglascamata changed the title Fix docs for recombine operator [docs][pkg/stanza] Fix docs for recombine operator Jan 14, 2025
@@ -17,7 +17,7 @@ The `recombine` operator combines consecutive logs into single logs based on sim
| `max_unmatched_batch_size` | 100 | The maximum number of consecutive entries that will be combined into a single entry before the match occurs (with `is_first_entry` or `is_last_entry`), e.g. `max_unmatched_batch_size=0` - all entries combined, `max_unmatched_batch_size=1` - all entries uncombined until the match occurs, `max_unmatched_batch_size=100` - entries combined into 100-entry-packages until the match occurs |
| `overwrite_with` | `newest` | Whether to use the fields from the `oldest` or the `newest` entry for all the fields that are not combined. |
| `force_flush_period` | `5s` | Flush timeout after which entries will be flushed aborting the wait for their sub parts to be merged with. |
| `source_identifier` | `$attributes["file.path"]` | The [field](../types/field.md) to separate one source of logs from others when combining them. |
| `source_identifier` | | The [field](../types/field.md) to separate one source of logs from others when combining them. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default is still attributes["file.path"] (though the $ should be removed).

Suggested change
| `source_identifier` | | The [field](../types/field.md) to separate one source of logs from others when combining them. |
| `source_identifier` | `attributes["file.path"]`. | The [field](../types/field.md) to separate one source of logs from others when combining them. |

However, the default appears to be out of date with the semantic convention, which is log.file.path, so we should change the default in another PR.

The "DefaultSourceIdentifier" is the default value used when the field is not present on a log entry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hummm, I see, @djaglowski. I think I got confused because of the warning in https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/operator/transformer/recombine/transformer.go#L119.

So the "entry" in this case means the log entry being processed, not the configuration of that recombine, right?

Isn't it weird that a log entry didn't have a log file path? 🤔

Copy link
Member Author

@douglascamata douglascamata Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh oh, fixing this would be a breaking change, right? Is this change desired right now? I'm open to do it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the "entry" in this case means the log entry being processed, not the configuration of that recombine, right?

Exactly.

Oh oh, fixing this would be a breaking change, right? Is this change desired right now? I'm open to do it.

Yes, unfortunately, but I think it's one we ought to do. The intention was always that logs would be recombined by default according to the files they were read from. Unfortunately it seems we updated the attribute name in one place and not the other so it's likely not working as intended for many users. Arguably this is just a bug fix, but since it's been in place for so long I think we should mark it as breaking.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djaglowski I updated the PR given the outcome of our conversation, including its title. Please have a look whenever you have some time, thank you!

@douglascamata douglascamata changed the title [docs][pkg/stanza] Fix docs for recombine operator [pkg/stanza] Adopt semantic convention for the log file path attribute Jan 16, 2025
@djaglowski
Copy link
Member

Thanks for taking this on @douglascamata

@douglascamata
Copy link
Member Author

@djaglowski my pleasure. Thank you very much for the review! 🙇

I handled your feedback in c90a15c, so please have a look again when you can.

@douglascamata
Copy link
Member Author

My rebase with main caught an issue in the CI introduced by fafdfdd. Waiting for a fix for that to rebase again and fix the build.

@douglascamata
Copy link
Member Author

The build is fixed, @djaglowski

@douglascamata
Copy link
Member Author

Hey @djaglowski, please have another look at this when you have some time. Thanks!

@djaglowski djaglowski merged commit 9b520cc into open-telemetry:main Feb 3, 2025
164 checks passed
@djaglowski
Copy link
Member

Thanks @douglascamata!

@github-actions github-actions bot added this to the next release milestone Feb 3, 2025
chengchuanpeng pushed a commit to chengchuanpeng/opentelemetry-collector-contrib that referenced this pull request Feb 8, 2025
open-telemetry#37210)

#### Description

This PR adopts [the semantic convention for the log file path
attribute](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/logs.md#log-file),
which should be `attributes["log.file.path"]`.

It fixes the default value for the `recombine` operator's
`source_identifier`.
zeck-ops pushed a commit to zeck-ops/opentelemetry-collector-contrib that referenced this pull request Apr 23, 2025
open-telemetry#37210)

#### Description

This PR adopts [the semantic convention for the log file path
attribute](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/logs.md#log-file),
which should be `attributes["log.file.path"]`.

It fixes the default value for the `recombine` operator's
`source_identifier`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants