OTel-Arrow Phase 2 #294
Beaubourg is a Rust library for constructing OpenTelemetry-like pipelines that was part of the initial OTel-Arrow prototype. Part of #294.
This is a general design document for the OTel-Arrow Phase-2 pipeline. Part of #294. --------- Co-authored-by: Drew Relmas <[email protected]> Co-authored-by: Laurent Quérel <[email protected]>
Hi! Given that this appears to be a change in scope for the project (looking through all of the history I could find on this project, I couldn't find anything about a Rust-based Collector), the GC would like to discuss this plan. It may need to go through the project approval process again. Thank you in advance for your understanding. If you would like to come to an upcoming GC meeting to present on this issue and discuss it with us, please let me know and we can get you on the agenda. |
We've been careful to avoid describing a "Rust-based Collector" in favor of the term "pipeline" in https://github.com/open-telemetry/otel-arrow/blob/main/docs/phase2-design.md because our goal is to understand how an end-to-end implementation using OTAP performs. I don't think you got the phrase "Rust-based Collector" from our Phase-2 announcement, and the proposed blog post would steer clear of this as well. When @lquerel originally made a demo for OTel-Arrow in 2022 with his Rust-based prototype (see #293), there was a similar concern over the appearance of a "Rust-based Collector". The two phases of the project were spelled out explicitly in the OTEP in response to this feedback at the time: https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/0156-columnar-encoding.md#integration-strategy-and-phasing. In other words, the OTel-Arrow project agreed to implement Golang components as the first step, to hold the community together when we moved on to a Rust pipeline. |
Yes, specifically, https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/0156-columnar-encoding.md#phase-2 calls out interoperability in the existing Collector layer. However, your issue here says --
Your design document also clearly indicates that this Rust OTAP pipeline would offer duplicate functionality of the Collector (https://github.com/open-telemetry/otel-arrow/blob/7db821a6310fb0be27b84aea22abe05b14641f1b/docs/phase2-design.md)
etc. |
Thanks for your patience. We discussed this topic at the most recent GC meeting, and have a few specific questions.
If you'd like to join us on a forthcoming GC call to discuss this, your presence would be welcomed! Thanks in advance for your answers. |
"All of the above". The choice of Rust is related to the strength of the Arrow libraries in Rust, which follows from Rust's performance and memory safety; both matter especially in a pipeline carrying zero-copy data. Go has no strongly enforced "immutable borrow", so the pipeline data used in the Go Collector is frequently copied: there is no way for components to share immutable data with a pipeline, and the pipeline cannot share references to internal values with reference semantics (e.g., you must iterate over or copy a slice value; you cannot cheaply borrow the slice). Arrow libraries are better in Rust because Rust is a better environment for data-intensive programming, for the zero-copy safety reasons described above. As a particularly appealing example, DataFusion is an extensible query engine framework written in Rust that has no equivalent in Go.
Rust programs have greater control over memory management and latency as a result, and this is the same reason that more Arrow programming is done in Rust than in Go. We believe that garbage collection is a fundamental obstacle to adopting OpenTelemetry pipelines in embedded and security-critical scenarios. However, the answer is not strictly about performance; it is also about the Arrow-Rust community. To reinforce this point, the Arrow-Rust repository has roughly 4.6 times as many pull requests as the Arrow-Go repository (roughly 4162 in Rust, 898 in Go as of this writing). Arrow-Rust has 3.5x more contributors, compared with Arrow-Go; Arrow-Rust has 12 contributors with over 100 commits each, while Arrow-Go has 11 contributors with over 10 commits each. The object_store library, for accessing all the major cloud storage providers, is hosted within the Arrow-Rust community; there is no equivalent in Arrow-Go. Disclaimer: Laurent and Josh are both Arrow-Go contributors. :-)
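To make the "immutable borrow" point concrete, here is a minimal sketch using only the Rust standard library. The `Column` type and the `window`, `sum`, and `fan_out` functions are hypothetical names invented for illustration; they are not part of OTel-Arrow or of the Arrow crates. The sketch assumes a single column of integer values and shows two ways to share it without copying: a borrowed slice checked at compile time, and a reference-counted buffer shared between consumers.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for one column of telemetry values.
struct Column {
    values: Vec<i64>,
}

/// Borrows a window of the column without copying any elements.
/// The borrow checker guarantees that `col` cannot be mutated while
/// the returned slice is still in use.
fn window(col: &Column, start: usize, len: usize) -> &[i64] {
    &col.values[start..start + len]
}

/// A read-only consumer that needs only a shared view of the data.
fn sum(values: &[i64]) -> i64 {
    values.iter().sum()
}

/// Shares one immutable buffer between two consumers.
/// `Arc::clone` increments a reference count; the elements are not copied.
fn fan_out(shared: Arc<[i64]>) -> (i64, i64) {
    let a = Arc::clone(&shared);
    let b = Arc::clone(&shared);
    let total = sum(&a);
    let largest = b.iter().copied().max().unwrap_or(0);
    (total, largest)
}
```

For example, `sum(window(&col, 1, 3))` reads three elements in place, and `fan_out` hands the same allocation to both consumers. In Go, the equivalent slice can be passed without copying, but nothing prevents a receiver from mutating it, which is why defensive copies are common.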
This question is hypothetical, in our opinion. Without performance measurements over an end-to-end OTAP pipeline, and without knowing the volume of telemetry involved, it is a difficult question to answer. The use of a column-oriented representation makes certain operations very inexpensive, such as renaming fields (e.g., as part of a schema translation), dropping fields (e.g., redaction), and scaling values (e.g., as part of a units conversion). We think users should use an OTAP pipeline when they benefit from the Arrow intermediate representation to reduce costs associated with telemetry collection. We expect users with a high volume of telemetry to benefit from our approach; we anticipate substantially better compression rates, faster data processing, and lower resource consumption for these users. We believe our Phase-1 result will repeat itself, which is to say that conversion from OTLP to OTAP and back again will cost less than the savings that await users from faster queries. However, in this phase of the project, we are aiming to unlock the full potential of OTAP, which is achieved when we avoid multiple conversions in the first place. We believe OTAP could eventually replace OTLP for bulk telemetry transport. We believe that users with a strong requirement for safety and security under constrained resources will potentially choose a pure Rust pipeline. We think this sort of question should wait until we have Phase-2 results, that is, until we have an idea of what can be done and the relative costs.
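The claim about inexpensive column operations can be illustrated with a small sketch. This is a std-only toy model, not the Arrow API: the `Batch` type and its `rename`, `drop_column`, and `scale` methods are hypothetical names chosen for this example, and the layout (one `Vec<f64>` per field) is an assumption that only approximates how Arrow stores record batches.

```rust
/// Hypothetical, std-only stand-in for a columnar batch (not the Arrow API).
/// `names[i]` is the field name for `columns[i]`.
struct Batch {
    names: Vec<String>,
    columns: Vec<Vec<f64>>,
}

impl Batch {
    fn index_of(&self, name: &str) -> Option<usize> {
        self.names.iter().position(|n| n.as_str() == name)
    }

    /// Rename (schema translation): touches one schema entry, no row data.
    fn rename(&mut self, from: &str, to: &str) -> bool {
        match self.index_of(from) {
            Some(i) => {
                self.names[i] = to.to_string();
                true
            }
            None => false,
        }
    }

    /// Drop (redaction): removes one whole column; other columns are
    /// moved within the outer vector but their values are not rewritten.
    fn drop_column(&mut self, name: &str) -> bool {
        match self.index_of(name) {
            Some(i) => {
                self.names.remove(i);
                self.columns.remove(i);
                true
            }
            None => false,
        }
    }

    /// Scale (units conversion): one tight loop over one contiguous column.
    fn scale(&mut self, name: &str, factor: f64) -> bool {
        match self.index_of(name) {
            Some(i) => {
                for v in &mut self.columns[i] {
                    *v *= factor;
                }
                true
            }
            None => false,
        }
    }
}
```

In a row-oriented representation, each of these operations would visit every record; in this layout a rename costs the same regardless of the number of rows, and a scale reads only the one column it changes.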
We see a potential to use OTAP pipelines as components in Golang Collector pipelines. The existing OTel-Arrow Golang reference implementation serves to convert between OTLP and OTAP representations already, and foreign function calls from Go to Rust will enable mixing these pipelines with low overhead thanks to the zero-copy nature of Arrow. We see a potential to use Golang Collector components inside OTAP pipelines, too. If OTAP is indeed a more efficient transport representation, then more and more it will make sense to translate high-value components to use OTAP directly. It will be unavoidable that we compare today's implementation of a simple transform processor using OTTL expressions with a functionally-equivalent implementation based on DataFusion using the same OTTL expressions. We think it makes sense to proceed with our experiment, so that we have better answers to this and similar questions in the future.
These two protocols are related in the sense that they both use gRPC streaming and both aim to achieve high compression rates. STEF is a metrics-only protocol and apparently a vendor-specific one. STEF does not have a receiver in the Collector-Contrib repository. OTel-Arrow components have a commitment to 100% compatibility with OpenTelemetry; this includes seamless support for combined OTLP/OTAP modes in both the exporter and the receiver. STEF is exclusively focused on compression rates. The benefit of OTAP is the direct availability of Arrow data frames for use with Arrow-based tools, something STEF does not offer. We'd like to demonstrate performance improvements in Phase 2 of the project, having already demonstrated a substantial compression benefit in Phase 1. |
Thanks for your responses! I've invited the two of you to the next GC meeting where we'll discuss this topic. |
@jmacd , I'm having difficulties understanding what is Go and what is Rust in the "OTel-Arrow Collector" image below: My initial impression is that this is a Rust Collector, and some of your comments seem to go in this direction ("more and more it will make sense to translate high-value components to use OTAP directly"), but at the same time, they can be read as "they'll be written in Go and handle OTAP instead of OTLP". I have concerns about duplicating our Collector efforts and making things even more complex to our end-users, but I wanted to fully understand the proposal. |
otel-arrow-rust is a Rust implementation of the otel-arrow protocol. This implementation currently supports the univariate metrics signal. This contribution is considered a useful starting point for a complete Rust implementation of OTAP. Part of #294 Part of open-telemetry/opentelemetry.io#6410 Originally authored at https://github.com/v0y4g3r/otel-arrow-rust --------- Co-authored-by: Joshua MacDonald <[email protected]>
@jpkrohling We hope that https://github.com/open-telemetry/otel-arrow/blob/main/docs/project-phases.md will address your concerns. I feel that I owe you an updated diagram, since the one you copied above is from the OTEP. |
Looks good, thank you for addressing that. |
Summary
The OTel-Arrow team is embarking on Phase 2 of the project.
The OTel-Arrow Protocol (OTAP) is a proposed enhancement to the OpenTelemetry protocol (OTLP). This was planned in two phases based on community feedback, in order to avoid diverting resources from the OpenTelemetry Collector project.
In Phase 1 of the project, we focused on a Golang implementation of the protocol, developed the Exporter/Receiver pair and contributed them to the Collector-Contrib repository. We put the new OTel-Arrow components into production and posted our results on the OpenTelemetry Blog.
Scope
Phase 2 of the project is focused on building an end-to-end OTAP pipeline. We choose to use the Rust language for this next step, because of the relative strength of the Apache Arrow ecosystem in this language. The following areas are included in scope for this phase of the project.