[otel-arrow-rust] Adaptive array builders for encoding #533

Open
albertlockett opened this issue Jun 4, 2025 · 0 comments
Assignees
Labels
enhancement New feature or request rust Pull requests that update Rust code

Comments

@albertlockett
Member

When encoding OTAP batches, we often need to switch dynamically between three representations of a column: omitted from the record batch entirely, present as an Arrow dictionary array (with a key size that depends on the cardinality), or present as the native array type.

For more information see here: https://arrow.apache.org/blog/2023/06/26/our-journey-at-f5-with-apache-arrow-part-2/

We already have this implemented in Go:
https://github.com/open-telemetry/otel-arrow/blob/main/go/pkg/otel/common/schema/builder/record.go

We should add the same functionality to otel-arrow-rust.

@albertlockett albertlockett self-assigned this Jun 4, 2025
@albertlockett albertlockett added enhancement New feature or request rust Pull requests that update Rust code labels Jun 4, 2025
@albertlockett albertlockett moved this to In Progress in OTel-Arrow Jun 4, 2025
@albertlockett albertlockett changed the title Adaptive array builders for encoding [otel-arrow-rust] Adaptive array builders for encoding Jun 4, 2025
github-merge-queue bot pushed a commit that referenced this issue Jun 5, 2025

Part of: #533

Very rough implementation of adaptive array builders. This is my Rust
version of the builders we've implemented in Go here:
https://github.com/open-telemetry/otel-arrow/blob/main/go/pkg/otel/common/schema/builder/record.go

The idea behind these is that when we're encoding OTAP records, we often
want to create columns dynamically, so that each column is either left out
of the record batch (if all the values are null), dictionary encoded with
the smallest possible index type, or emitted as the native array if the
dictionary index would overflow. (Some of this was alluded to in
yesterday's SIG meeting.)
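
To make the three outcomes concrete, here is a minimal std-only sketch of the decision logic. It is not the code from this commit: `Column`, `AdaptiveStringBuilder`, and the single `max_cardinality` cap are hypothetical names made up for illustration, and null handling is left out entirely.

```rust
use std::collections::HashMap;

/// Hypothetical result of finishing an adaptive string column.
#[derive(Debug, PartialEq)]
pub enum Column {
    /// Dictionary encoded with 8-bit keys.
    DictU8 { keys: Vec<u8>, values: Vec<String> },
    /// Dictionary encoded with 16-bit keys.
    DictU16 { keys: Vec<u16>, values: Vec<String> },
    /// Plain (non-dictionary) array, one string per row.
    Native(Vec<String>),
}

/// Hypothetical adaptive builder (illustration only).
pub struct AdaptiveStringBuilder {
    max_cardinality: usize,
    index: HashMap<String, usize>, // value -> dictionary position
    values: Vec<String>,           // distinct values in first-seen order
    keys: Vec<usize>,              // one dictionary position per row
}

impl AdaptiveStringBuilder {
    pub fn new(max_cardinality: usize) -> Self {
        Self {
            max_cardinality,
            index: HashMap::new(),
            values: Vec::new(),
            keys: Vec::new(),
        }
    }

    pub fn append_value(&mut self, value: &str) {
        let next = self.values.len();
        let key = *self.index.entry(value.to_string()).or_insert(next);
        if key == next {
            // First time we see this value: extend the dictionary.
            self.values.push(value.to_string());
        }
        self.keys.push(key);
    }

    /// Returns `None` when nothing was appended, so the caller can skip the column.
    pub fn finish(self) -> Option<Column> {
        if self.keys.is_empty() {
            return None;
        }
        let distinct = self.values.len();
        let u8_limit = u8::MAX as usize + 1; // 256 distinct values fit in u8 keys
        let u16_limit = (u16::MAX as usize + 1).min(self.max_cardinality);
        if distinct > u16_limit {
            // Too many distinct values for a dictionary: fall back to the native array.
            let rows = self.keys.iter().map(|&k| self.values[k].clone()).collect();
            Some(Column::Native(rows))
        } else if distinct <= u8_limit {
            let keys = self.keys.iter().map(|&k| k as u8).collect();
            Some(Column::DictU8 { keys, values: self.values })
        } else {
            let keys = self.keys.iter().map(|&k| k as u16).collect();
            Some(Column::DictU16 { keys, values: self.values })
        }
    }
}
```

This sketch picks the representation once, in `finish`, which keeps it short. A real builder would more likely upgrade incrementally as values are appended, so it never has to hold the full key list in a wide type.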

The intended usage is something like this:
```rs
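use std::sync::Arc;

use arrow::datatypes::{Field, Schema};
use arrow::record_batch::RecordBatch;
// ArrayOptions and DictionaryOptions are assumed to live alongside StringArrayBuilder.
use otel_arrow_rust::encode::record::array::{
    ArrayOptions, DictionaryOptions, StringArrayBuilder,
};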

let mut str_builder = StringArrayBuilder::new(ArrayOptions {
    nullable: true,
    dictionary_options: Some(DictionaryOptions {
        min_cardinality: u8::MAX.into(),
        max_cardinality: u16::MAX,
    }),
});

// maybe append some values
str_builder.append_value(&"a".to_string());

let result = str_builder.finish();

let mut fields = Vec::new();
let mut columns = Vec::new();

// `finish` returns None when the column should be omitted from the batch
if let Some(result) = result {
    fields.push(Field::new("str", result.data_type, true));
    columns.push(result.array);
}

let record_batch = RecordBatch::try_new(
    Arc::new(Schema::new(fields)),
    columns
)
.expect("should work");
```

Followup work includes:
- null support: #534
- additional datatype support: #535
- optimize the conversion from Dict<u8> to Dict<u16>: #536
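
For context on the last item, the basic Dict<u8> to Dict<u16> conversion only has to rewrite the keys; the dictionary values can be moved over untouched. The sketch below shows that baseline with a hypothetical `Dict<K>` type (std only, not the Arrow arrays used in the crate). It says nothing about which optimization #536 will actually pursue.

```rust
/// Hypothetical dictionary-encoded column: `keys[i]` indexes into `values`.
#[derive(Debug, PartialEq)]
pub struct Dict<K> {
    pub keys: Vec<K>,
    pub values: Vec<String>,
}

impl Dict<u8> {
    /// Widen the key type from u8 to u16.
    /// The values vector is moved, not copied; only the keys are re-cast.
    pub fn widen(self) -> Dict<u16> {
        Dict {
            keys: self.keys.into_iter().map(u16::from).collect(),
            values: self.values,
        }
    }
}
```

The widening cast is lossless, so every row still points at the same dictionary entry afterwards.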

---------

Co-authored-by: Laurent Quérel <[email protected]>
Projects
Status: In Progress