[AWS S3 Exporter] Add support for predictably ordered S3 keys #40515

Open
dacort opened this issue Jun 5, 2025 · 3 comments

dacort commented Jun 5, 2025

Component(s)

exporter/awss3

Is your feature request related to a problem? Please describe.

Today, the S3 Exporter uses a random uniqueKey function to prevent collisions on file uploads.

Unfortunately, this makes it hard to know in what order files were uploaded. When using the S3 Exporter to stream raw log files (like stderr/stdout), it becomes difficult to maintain the ordering of the log lines once those files have been uploaded. For systems that don't provide structured logging (Spark), it results in a poor user experience.

Describe the solution you'd like

I'd like to be able to configure a different UniqueKeyField as mentioned in the todo that maintains ordering on upload.

Specifically, either an incrementing integer or UUIDv7. The latter seems better from an implementation perspective. The former is slightly more human-friendly, but could result in files getting overwritten if the exporter is restarted.

Describe alternatives you've considered

Additional context

No response

dacort added the "enhancement" and "needs triage" labels on Jun 5, 2025
Contributor

github-actions bot commented Jun 5, 2025

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@VihasMakwana
Contributor

Hello!

I agree that we should provide an option to configure the ordering.

Regarding the approaches you shared, I think we should stick with UUIDv7.

If we try to implement this with a sequential integer, we might lose the last known value between collector runs. For example, during the first run, we might save 100 files and the counter would be at 100. However, if the process is restarted, the counter could reset to 0 (unless we implement a mechanism to persist and restore the last known counter value) while still pointing to the same bucket, so new uploads could overwrite files from the earlier run.
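The restart problem can be sketched in a few lines of Go. This is only an illustration: `keyCounter`, `nextKey`, and the `logs_NNNNNN.json` key shape are hypothetical, and the map stands in for the set of keys already present in the bucket.

```go
package main

import "fmt"

// keyCounter is a hypothetical in-memory sequence used to name uploads.
// Its state is lost whenever the process exits.
type keyCounter struct{ next int }

// nextKey returns the next sequential key for the given prefix.
func (c *keyCounter) nextKey(prefix string) string {
	c.next++
	return fmt.Sprintf("%s_%06d.json", prefix, c.next)
}

func main() {
	written := map[string]bool{} // stand-in for keys already in the bucket

	// First run: 100 files are uploaded.
	first := &keyCounter{}
	for i := 0; i < 100; i++ {
		written[first.nextKey("logs")] = true
	}

	// Simulated restart: a fresh counter starts again from zero.
	second := &keyCounter{}
	key := second.nextKey("logs")
	fmt.Println(key, "already exists:", written[key])
	// prints: logs_000001.json already exists: true
}
```

Avoiding this would require persisting the counter or listing the bucket on startup, which is the extra mechanism a timestamp-prefixed ID does not need.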

VihasMakwana added the "waiting-for-code-owners" label and removed the "needs triage" label on Jun 6, 2025
Author

dacort commented Jun 6, 2025

*nod* Yep, my thoughts as well.

Happy to contribute the PR - I have an existing test of the UUIDv7 approach I can clean up.
