Add state/structure to OTAP batch + sorting/delta decoding transforms #468
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #468 +/- ##
==========================================
+ Coverage 62.27% 62.91% +0.63%
==========================================
Files 190 193 +3
Lines 27291 27696 +405
==========================================
+ Hits 16995 17424 +429
+ Misses 9760 9737 -23
+ Partials 536 535 -1
@jmacd please take a look when you get a chance? This is some follow-up from last week's SIG meeting.
Link to thread about answering some of these questions: https://cloud-native.slack.com/archives/C08RRSJR7FD/p1747847973678329
Force-pushed from 487e6ff to 15c033c
Force-pushed from 15c033c to 1709fa5
Update 05/22 -- added some structure around OtapBatch and refactored the decoder to use it. Started adding helper methods to update schema metadata when transforming record batches.
/// keys for arrow schema/field metadata
pub mod metadata {
    /// schema metadata for which columns the record batch is sorted by
    pub const SORT_COLUMNS: &str = "sort_columns";
How are multiple columns supported? I've seen this set to a single column, so far. Would it make sense to have SORT_COL0, SORT_COL1, ... to indicate the successive sort orders?
I was thinking maybe we could have some delimiter. E.g. I'm imagining how we'd represent the default sorting of attributes from the golang exporter: it would be `type,key,value`. Do you think that'd work?
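For illustration, the delimiter idea might look something like the sketch below. `set_sort_columns` / `get_sort_columns` are hypothetical helpers, not code from this PR, and the comma delimiter is only the convention proposed above.

```rust
use std::collections::HashMap;

use arrow::datatypes::Schema;

/// Schema metadata key, matching the snippet under review.
pub const SORT_COLUMNS: &str = "sort_columns";

/// Record which columns the batch is sorted by, in order of precedence,
/// e.g. ["type", "key", "value"] -> "type,key,value".
pub fn set_sort_columns(metadata: &mut HashMap<String, String>, columns: &[&str]) {
    metadata.insert(SORT_COLUMNS.to_string(), columns.join(","));
}

/// Read the sort order back out of a schema's metadata, splitting on the
/// delimiter. Returns an empty Vec when the key is absent.
pub fn get_sort_columns(schema: &Schema) -> Vec<String> {
    schema
        .metadata()
        .get(SORT_COLUMNS)
        .map(|value| value.split(',').map(|s| s.to_string()).collect())
        .unwrap_or_default()
}
```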
Update 05/23 - after discussion w/ @jmacd, decided it's probably best to merge only what we are pretty sure we'll need in this PR, which does not include the AttributeStoreV2 or the benchmark for it. I'm going to stash that code on a branch called
This PR doesn't need to include AttributeStoreV2. It's still unclear whether or not we need Attribute Store at all to optimize encoding OTAP -> OTLP bytes. Rather than include code we don't need, we'll delete this code for now.
Part of #449
This PR adds:
- `OtapBatch`, which encapsulates all the record batches associated with a `BatchArrowRecord`. It provides efficient methods for setting & getting the record batches by payload type, and has been integrated into the decoder code (see the sketch below).
- In the `otap::transform` module, methods for sorting a record batch by parent ID and removing delta encoding.

The scope of this PR has somewhat changed. I'll leave the original description below, for posterity.
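As a rough illustration (not the actual code in this PR) of the get/set-by-payload-type idea: a fixed-size array indexed by a payload-type enum keeps both operations O(1). `PayloadType` and its variants below are stand-ins for the real OTAP payload-type enum.

```rust
use arrow::record_batch::RecordBatch;

/// Stand-in for the OTAP payload-type enum; only a few variants shown.
#[derive(Clone, Copy)]
pub enum PayloadType {
    Logs = 0,
    LogAttrs = 1,
    ResourceAttrs = 2,
    ScopeAttrs = 3,
}

const NUM_PAYLOAD_TYPES: usize = 4;

/// Encapsulates every RecordBatch decoded from one BatchArrowRecords message.
pub struct OtapBatch {
    batches: [Option<RecordBatch>; NUM_PAYLOAD_TYPES],
}

impl OtapBatch {
    pub fn new() -> Self {
        Self {
            batches: std::array::from_fn(|_| None),
        }
    }

    /// Store the record batch for the given payload type.
    pub fn set(&mut self, payload_type: PayloadType, batch: RecordBatch) {
        self.batches[payload_type as usize] = Some(batch);
    }

    /// Fetch the record batch for the given payload type, if present.
    pub fn get(&self, payload_type: PayloadType) -> Option<&RecordBatch> {
        self.batches[payload_type as usize].as_ref()
    }
}
```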
Original Description:
Restores some of the code from #447, but with a slightly different implementation of the "Version 2" attribute store:
- sorts the record batch by parent ID (`sort_by_parent_id`)
- adds an `attribute_by_delta_id` method (similar to what we did in [WIP] AttributesStore optimization and/or OTAP pdata research #447), intended to be called with the delta ID of each parent ID (see the sketch below)

The benchmark results are interesting: the v2 store gets slightly worse performance for small batch sizes (128), but much better performance for larger batch sizes (8092), which is encouraging!
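To make the delta-ID idea concrete, here is a minimal sketch of removing the delta encoding from a parent-ID column: materializing absolute IDs is just a running sum over the (sorted) deltas. The u16 width, the null handling, and the absence of any per-group reset are assumptions of this sketch, not necessarily how the real transform behaves.

```rust
use arrow::array::UInt16Array;

/// Turn a delta-encoded parent-ID column into absolute parent IDs by
/// accumulating a running sum. Assumes the batch is already sorted so the
/// deltas are meaningful; the real OTAP encoding may also reset deltas at
/// group boundaries (e.g. per attribute key/value), which is ignored here.
pub fn materialize_parent_ids(deltas: &UInt16Array) -> UInt16Array {
    let mut acc: u16 = 0;
    deltas
        .iter()
        .map(|delta| {
            // Treat a null delta as "no change" -- an assumption of this sketch.
            acc = acc.wrapping_add(delta.unwrap_or(0));
            Some(acc)
        })
        .collect()
}
```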
Still very much WIP. One thing that needs to be sorted out is where/when we apply this transform to materialize parent IDs and sort the record batch, and how we track the state of the batch. This PR is currently doing it in `decode::decoder::Consumer::consume_bar`, which may not be the right place. We might consider not merging all the code in this PR straight away, as some of this work is exploratory/proof of concept.