[otap-dataflow] parquet exporter flush open writers after timeout #499

Open
albertlockett opened this issue May 30, 2025 · 0 comments
Labels: enhancement (New feature or request), parquet-exporter (Parquet Exporter related tasks), pipeline (Rust Pipeline Related Tasks), rust (Pull requests that update Rust code)

albertlockett commented May 30, 2025

After what was implemented in #488, the parquet exporter only flushes files if:

  • a) the files surpass some number of rows
  • b) the exporter receives a shutdown message

If neither condition is ever met, the open writer never flushes its file, so data may not become visible until long after it was received. We should probably add a way to flush writers periodically to prevent this.

There are multiple ways to time this, and we can pick the best approach or even combine several. For example, for a given unflushed file, we could flush on a timeout computed from either:

  • Time since first batch
  • Time since last batch

The batch processor currently uses time since last batch: https://github.com/open-telemetry/otel-arrow/pull/347/files
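To make the difference between the two anchors concrete, here is a minimal sketch of how a per-writer flush check could look. It is only an illustration under assumptions: `FlushAnchor`, `WriterAge`, `record_batch`, and `should_flush` are hypothetical names, not types or functions from the exporter or from #488, and it uses only `std::time`.

```rust
use std::time::{Duration, Instant};

/// Which timestamp the flush timeout is measured from.
/// (Hypothetical name, introduced for illustration only.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FlushAnchor {
    /// Time since the first batch was written to the open file.
    FirstBatch,
    /// Time since the most recent batch was written to the open file.
    LastBatch,
}

/// Hypothetical per-writer bookkeeping; not the exporter's actual type.
#[derive(Debug)]
pub struct WriterAge {
    first_batch: Instant,
    last_batch: Instant,
}

impl WriterAge {
    /// Start tracking when the first batch arrives for a new file.
    pub fn new(now: Instant) -> Self {
        Self { first_batch: now, last_batch: now }
    }

    /// Record that another batch was appended to the open file.
    pub fn record_batch(&mut self, now: Instant) {
        self.last_batch = now;
    }

    /// Returns true once `timeout` has elapsed since the chosen anchor.
    pub fn should_flush(&self, now: Instant, anchor: FlushAnchor, timeout: Duration) -> bool {
        let since = match anchor {
            FlushAnchor::FirstBatch => self.first_batch,
            FlushAnchor::LastBatch => self.last_batch,
        };
        // Saturates to zero if `now` is earlier than the anchor.
        now.saturating_duration_since(since) >= timeout
    }
}
```

With a 10 second timeout and batches at t=0s and t=8s, a check at t=10s would flush under `FirstBatch` but not under `LastBatch`, which would wait until t=18s. Time since first batch bounds how long data stays invisible, while time since last batch can be postponed indefinitely by a steady trickle of batches. A combination could check both with separate timeouts.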

This should be implemented after prerequisites:

albertlockett added the labels pipeline, parquet-exporter, enhancement, and rust on May 30, 2025
albertlockett changed the title from "[otap-dataflow] Parquet exporter timer control" to "[otap-dataflow] Parquet exporter flush open writers after timeout" on May 30, 2025
albertlockett changed the title from "[otap-dataflow] Parquet exporter flush open writers after timeout" to "[otap-dataflow] parquet exporter flush open writers after timeout" on May 30, 2025