In #473 we added basic functionality for the adaptive array builder, but currently, when upgrading the dictionary, we copy all the values.
Ideally we'd just be able to get access to the underlying key builder of the dictionary builder, finish/cast it to the new index type, and then create a new dictionary builder with the same values/internal state + the new key builder. As far as I know, there is currently no way to do this in arrow-rs.
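To make the idea concrete, here is a minimal sketch using only the standard library. It is a toy model, not the arrow-rs API: `ToyDictBuilder`, its fields, and its methods are all hypothetical names. It only illustrates that an upgrade could widen the keys while moving the values and lookup state as-is.

```rs
use std::collections::HashMap;

/// Toy stand-in for a dictionary builder: `keys[i]` indexes into `values`.
/// This is NOT the arrow-rs API; every name here is hypothetical.
struct ToyDictBuilder<K> {
    keys: Vec<K>,
    values: Vec<String>,
    lookup: HashMap<String, usize>, // value -> position in `values`
}

impl ToyDictBuilder<u8> {
    fn new() -> Self {
        Self { keys: Vec::new(), values: Vec::new(), lookup: HashMap::new() }
    }

    /// Appends a value. Returns `false` (and appends nothing) if a new
    /// distinct value would no longer fit in a u8 key.
    fn append(&mut self, v: &str) -> bool {
        let existing = self.lookup.get(v).copied();
        let idx = match existing {
            Some(i) => i,
            None => {
                if self.values.len() > u8::MAX as usize {
                    return false; // dictionary is full; the caller should upgrade
                }
                let i = self.values.len();
                self.values.push(v.to_string());
                self.lookup.insert(v.to_string(), i);
                i
            }
        };
        self.keys.push(idx as u8);
        true
    }

    /// Upgrade by widening only the keys; `values` and `lookup` are moved,
    /// so no dictionary value is copied or re-hashed.
    fn upgrade(self) -> ToyDictBuilder<u16> {
        ToyDictBuilder {
            keys: self.keys.into_iter().map(u16::from).collect(),
            values: self.values,
            lookup: self.lookup,
        }
    }
}
```

With this shape, the cost of an upgrade is proportional to the number of keys appended so far, not to the total size of the dictionary values.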
TODO
- confirm with the arrow-rs community whether this is possible
- open any necessary issues in arrow-rs for this capability
- implement the changes in arrow-rs
- use the optimized dictionary builder conversion in the dictionary upgrade
Part of: #533
Very rough implementation of adaptive array builders. This is my "rust"
version of the builders we've implemented in golang here:
https://github.com/open-telemetry/otel-arrow/blob/main/go/pkg/otel/common/schema/builder/record.go
The idea behind these is that when we're encoding OTAP records, we often
want to dynamically create columns in some record batch that either
aren't added to the record batch (if all the values are null), or are
dictionary encoded with the smallest possible index, or are the native
array if the dictionary index would overflow. (Some of this was alluded
to in yesterday's SIG meeting).
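The selection described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `ColumnRepr` and `choose_repr` are hypothetical names, the sketch assumes a `u8` key can address 256 distinct values, and it assumes `max_cardinality` is the largest dictionary size allowed before falling back to the native array. The actual thresholds in the builders may differ.

```rs
/// Possible outcomes for a column (hypothetical names, for illustration only).
#[derive(Debug, PartialEq)]
enum ColumnRepr {
    /// Every value was null, so the column is left out of the record batch.
    Omitted,
    /// Dictionary encoded with u8 keys.
    DictU8,
    /// Dictionary encoded with u16 keys.
    DictU16,
    /// Plain (non-dictionary) array.
    Native,
}

/// Picks a representation from the number of non-null values and the number
/// of distinct values seen. Assumption: `max_cardinality` is the largest
/// dictionary size allowed before falling back to the native array.
fn choose_repr(non_null_count: usize, distinct_values: usize, max_cardinality: u16) -> ColumnRepr {
    if non_null_count == 0 {
        ColumnRepr::Omitted
    } else if distinct_values > max_cardinality as usize {
        ColumnRepr::Native
    } else if distinct_values <= u8::MAX as usize + 1 {
        // Assumption: u8 keys 0..=255 can address 256 distinct values.
        ColumnRepr::DictU8
    } else {
        ColumnRepr::DictU16
    }
}
```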
The intended usage is something like this:
```rs
use std::sync::Arc;

use arrow::datatypes::{Field, Schema};
use arrow::record_batch::RecordBatch;
use otel_arrow_rust::encode::record::array::StringArrayBuilder;

let mut str_builder = StringArrayBuilder::new(ArrayOptions {
    nullable: true,
    dictionary_options: Some(DictionaryOptions {
        min_cardinality: u8::MAX.into(),
        max_cardinality: u16::MAX,
    }),
});

// maybe append some values
str_builder.append_value(&"a".to_string());

let result = str_builder.finish();

let mut fields = Vec::new();
let mut columns = Vec::new();

// only add the column if the builder actually produced one
if let Some(result) = result {
    fields.push(Field::new("str", result.data_type, true));
    columns.push(result.array);
}

let record_batch = RecordBatch::try_new(
    Arc::new(Schema::new(fields)),
    columns,
)
.expect("should work");
```
Followup work includes:
- null support: #534
- additional datatype support: #535
- optimize the conversion between Dict<u8> -> Dict<u16>: #536
---------
Co-authored-by: Laurent Quérel <[email protected]>