[otel-arrow-rust] Add method to OtapBatch to recursively decode delta/quasi-delta IDs/parent-ids #512


Closed
albertlockett opened this issue May 30, 2025 · 2 comments

@albertlockett (Member)
In these PRs, we added some helper functions to decode delta-encoded IDs and quasi-delta-encoded parent IDs:
#468
#455

Rather than calling these helper functions ad hoc for each record batch (as we do here:
https://github.com/open-telemetry/otel-arrow/pull/488/files#diff-b6fd2c1b9f84117da68d5f0b445ac8d16f8f813fbc5e69fe15069528dbea6f62
), it would be helpful if we could call a single function on the OtapBatch that takes care of traversing the OtapBatch and removing the encoding from the root payload and all of its child payloads. The function should be idempotent in the sense that, if the IDs have already been decoded, it simply returns the unmodified OtapBatch.
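The decode step described above can be sketched in miniature. This is an illustrative stand-in, not the actual otel-arrow-rust API: `BatchSketch`, its fields, and `decode_ids` are hypothetical names. It shows the two properties requested: materializing delta-encoded IDs into plain values, and being a no-op when called a second time.

```rust
/// Hypothetical stand-in for an OTAP record batch whose ID column is
/// delta-encoded (each stored value is the difference from the previous row).
struct BatchSketch {
    ids: Vec<u16>,
    decoded: bool, // tracks whether the ID column is already plain-encoded
}

/// Materialize delta-encoded IDs in place. Idempotent: if the batch has
/// already been decoded, return without modifying it.
fn decode_ids(batch: &mut BatchSketch) {
    if batch.decoded {
        return; // already plain-encoded: unmodified batch
    }
    let mut acc: u16 = 0;
    for id in batch.ids.iter_mut() {
        acc = acc.wrapping_add(*id); // running sum of deltas = plain value
        *id = acc;
    }
    batch.decoded = true;
}
```

The real method would additionally walk the child payloads of the OtapBatch and apply the matching decode to each, so callers only invoke it once at the root.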

@albertlockett albertlockett added the rust Pull requests that update Rust code label May 30, 2025
@albertlockett (Member, Author)
Once this is implemented, we can also go back and use it in the parquet exporter's ID generation code.

@albertlockett albertlockett self-assigned this Jun 2, 2025
@albertlockett albertlockett moved this to In Progress in OTel-Arrow Jun 2, 2025
@albertlockett (Member, Author)
Starting work on this as it will be useful for #503

jmacd added a commit that referenced this issue Jun 2, 2025
Basic implementation of Parquet Exporter.

What's included in this PR:
- Base parquet exporter
- Generates IDs that are unique within some randomly generated partition
- Basic functionality to identify partition from schema metadata
- Writer module with:
  - Support for hive-style partitioning
  - Avoids flushing parent records before children
---

This is still very much Work In Progress, but I wanted to get the PR up
for early review before it got too long.

Some things still TODO before marking this "Ready":
- ~Handling various types of Control messages.~
- ~Need to review with @lquerel to better understand what is expected
from the exporter implementation~
- After discussion w/ Laurent on 05/29, decided to open TODOs for many
of these and link issues in comments
- ~Various unit tests are not finished / implemented~
- ~Another pass of documentation for adding clarity + correcting any
spelling/grammar errors~
- ~Fixing many clippy / formatting issues~
- ~Better resilience in computing the unique IDs. Currently this assumes
the IDs are not delta encoded. We might wish to error if the IDs are not
in plain encoding, and we might require an upstream processor to remove
the delta encoding.~
(#512 is a related helper that could simplify this code in the future)
---

Plan for followups after this PR:
- Immediate:
  - Add support for other signal types. Currently only logs is supported;
    we need to add support for metrics & traces. There aren't yet any
    tests for traces/metrics, and the code in ID generation & the writer
    needs to be adapted. #503
- Near-term:
  - Better error handling (including retrying failed batches) #504
  - Add support for remote object stores (S3, GCS, Azure) #501
  - Add support for date-bucket partitioning (I had started implementing
    that in this PR, and took it out to simplify review). It might be best
    to just do this partitioning in a processor (and include benchmarks). I
    have some additional thoughts on this ~I'll document in a separate
    issue~ #496 (issue created)
  - Add options to the base config to control the parquet writer (e.g.
    controlling row group size) #502
- Future:
  - We could consider adding a `parquet` feature to the crate to avoid
    bringing in the parquet & object_store dependencies unless this
    feature is enabled

I'll document some of these TODOs for posterity in separate GitHub
issues.
  
---
A few notes on the unique ID generation:

We wanted to have a scheme where we are able to generate unique IDs for
some given batch without any coordination between multiple nodes (nodes
in this case being exporters on other threads, or other processes).

The approach this PR takes is to have the exporter generate IDs that are
unique within some partition, and write a partitioned parquet file using
hive-style partitioning. Management of the partition is the sole
responsibility of this instance of the exporter: it randomly generates a
partition ID, and then increments the IDs of each subsequent batch by the
maximum ID of the previous batch. When the IDs overflow, it starts a new
partition.
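The partition-scoped scheme above can be sketched as follows. The names (`PartitionedIdGen`, `assign`) are illustrative, not the exporter's actual code: each batch arrives with IDs that are unique within the batch, the generator offsets them by the running maximum, and an overflow triggers a fresh partition (here just an increment, standing in for generating a new random partition ID).

```rust
/// Illustrative sketch of coordination-free unique ID generation:
/// IDs are only unique within a partition owned by this generator.
struct PartitionedIdGen {
    partition: u64, // randomly generated in the real exporter
    offset: u16,    // maximum ID handed out so far in this partition
}

impl PartitionedIdGen {
    /// Shift a batch of within-batch-unique IDs into this partition's
    /// ID space. If the shift would overflow u16, start a new partition.
    fn assign(&mut self, batch_ids: &mut [u16]) {
        let max = batch_ids.iter().copied().max().unwrap_or(0);
        if self.offset.checked_add(max).is_none() {
            // IDs would overflow: open a fresh partition and reset.
            self.partition += 1; // stand-in for a new random partition ID
            self.offset = 0;
        }
        for id in batch_ids.iter_mut() {
            *id += self.offset; // safe: id <= max and offset + max fits
        }
        self.offset += max;
    }
}
```

This keeps the generator entirely local to one exporter instance, which is what makes the scheme coordination-free, at the cost of the trade-offs listed below.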

There are some trade-offs to this approach:
- replaying the data means that the data will get written twice (it's
not idempotent)
- the reader needs to be aware of this partitioning scheme

I'm sure there are better ID generation schemes we could explore in the
future, but this seemed like a simple way to get something basic working.

---

part of #399

---------

Co-authored-by: Joshua MacDonald <[email protected]>
github-merge-queue bot pushed a commit that referenced this issue Jun 5, 2025
…Ds & parent IDs (#528)

Completes #512

- Adds a method to `OtapBatch` called `decode_transport_optimized_ids`
which materializes the delta encoded/quasi-delta encoded IDs & parent
IDs into their actual values.
- Adds two new methods to materialize parent IDs for other record batch
types: `materialize_parent_ids_for_columns` and
`materialize_parent_ids_for_exemplars`
- Moves the `materialize_parent_ids` function from `otlp::decoder` to
`otap::transform` (the same location as all the other helper methods for
transforming OTAP batches) and renames it to
`materialize_parent_ids_for_attributes` to distinguish it from other
methods that also materialize parent IDs
- Adds checks in all the transform helper methods to make the invocation
a noop if the column is already decoded
- Cleans up TODOs in `parquet_exporter::idgen` by invoking the new
`decode_transport_optimized_ids` method

This is a prerequisite to continuing work on Parquet Exporter to support
additional signal types
#503
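For context on the "quasi-delta" encoding that the parent-ID materialization methods undo: in transport-optimized attribute batches, parent IDs are delta-encoded only within runs of rows that share the same attribute value, and stored plain when the value changes. A hedged sketch under that assumption (illustrative names and signatures, not the crate's API):

```rust
/// Sketch of quasi-delta parent-ID decoding for an attributes batch.
/// Assumption: within a run of rows with equal attribute values, each
/// stored parent ID is a delta from the previous row's parent ID; the
/// first row of a new value run stores a plain parent ID.
fn materialize_parent_ids(values: &[&str], parent_ids: &[u16]) -> Vec<u16> {
    let mut out = Vec::with_capacity(parent_ids.len());
    for i in 0..parent_ids.len() {
        if i > 0 && values[i] == values[i - 1] {
            // same value as previous row: stored ID is a delta
            out.push(out[i - 1] + parent_ids[i]);
        } else {
            // value changed (or first row): stored ID is plain
            out.push(parent_ids[i]);
        }
    }
    out
}
```

The noop-if-already-decoded check the PR adds would sit in front of logic like this, keyed on the column's encoding metadata, so calling `decode_transport_optimized_ids` twice is safe.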
@albertlockett albertlockett moved this from In Progress to Done in OTel-Arrow Jun 5, 2025
@albertlockett albertlockett closed this as completed by moving to Done in OTel-Arrow Jun 5, 2025