
OTel-Arrow Phase 2 #294


Open
jmacd opened this issue Feb 24, 2025 · 9 comments
Labels
pipeline (Rust Pipeline Related Tasks), rust (Pull requests that update Rust code)

Comments

@jmacd
Contributor

jmacd commented Feb 24, 2025

Summary

The OTel-Arrow team is embarking on Phase 2 of the project.

The OTel-Arrow Protocol (OTAP) is a proposed enhancement to the OpenTelemetry protocol (OTLP). This was planned in two phases based on community feedback, in order to avoid diverting resources from the OpenTelemetry Collector project.

In Phase 1 of the project, we focused on a Golang implementation of the protocol, developed the Exporter/Receiver pair and contributed them to the Collector-Contrib repository. We put the new OTel-Arrow components into production and posted our results on the OpenTelemetry Blog.

Scope

Phase 2 of the project is focused on building an end-to-end OTAP pipeline. We choose to use the Rust language for this next step, because of the relative strength of the Apache Arrow ecosystem in this language. The following areas are included in scope for this phase of the project.

  • Implementation of a first-class Rust OTAP pipeline
  • Implementation of a first-class Rust OTAP SDK
  • Incorporation of DataFusion elements for telemetry processing
  • Usage of Arrow IPC and Parquet file formats for storing telemetry

jmacd pinned this issue Feb 24, 2025
This was referenced Feb 24, 2025
lquerel added a commit that referenced this issue Feb 24, 2025
Beaubourg is a Rust library for constructing OpenTelemetry-like
pipelines that was part of the initial OTel-Arrow prototype.

Part of #294.
jmacd added a commit that referenced this issue Feb 25, 2025
This is a general design document for the OTel-Arrow Phase-2 pipeline.

Part of #294.

---------

Co-authored-by: Drew Relmas
Co-authored-by: Laurent Quérel
@austinlparker
Member

Hi! Given that this appears to be a change in scope for the project (looking through all of the history I could find on this project, I couldn't find anything about a Rust-based Collector), the GC would like to discuss this plan. It may need to go through the project approval process again. Thank you in advance for your understanding. If you would like to come to an upcoming GC meeting to present on this issue and discuss it with us, please let me know and we can get you on the agenda.

@jmacd
Contributor Author

jmacd commented Feb 28, 2025

We've been careful to avoid describing a "Rust-based Collector" in favor of the term "pipeline" in https://github.com/open-telemetry/otel-arrow/blob/main/docs/phase2-design.md because our goal is to understand how an end-to-end implementation using OTAP performs. I don't think you got the phrase "Rust-based Collector" from our Phase-2 announcement, and the proposed blog post would stay clear of this as well.

When @lquerel originally made a demo for OTel-Arrow in 2022 with his Rust-based prototype (see #293), there was a similar concern over the appearance of a "Rust-based Collector". The two phases of the project were spelled out explicitly in the OTEP as a response to this feedback at the time: https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/0156-columnar-encoding.md#integration-strategy-and-phasing. In other words, the OTel-Arrow project agreed to implement Golang components as the first step, to hold the community together when we moved on to a Rust pipeline.

@austinlparker
Member

Yes, specifically, https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/0156-columnar-encoding.md#phase-2 calls out interoperability in the existing Collector layer. However, your issue here says --

Phase 2 of the project is focused on building an end-to-end OTAP pipeline. We choose to use the Rust language for this next step, because of the relative strength of the Apache Arrow ecosystem in this language. The following areas are included in scope for this phase of the project.

Your design document also clearly indicates that this Rust OTAP pipeline would offer duplicate functionality of the Collector (https://github.com/open-telemetry/otel-arrow/blob/7db821a6310fb0be27b84aea22abe05b14641f1b/docs/phase2-design.md)

A pipeline is required to support a flexible and configurable arrangement of components, and it is responsible for constructing the component graph at runtime. The OpenTelemetry Collector component model, including receiver, processor, exporter, connector, and extension components, will be followed. We expect to follow the Collector's configuration model for the pipeline, for example with service::pipelines and service::telemetry sections.

With the goal to create an end-to-end OTAP pipeline covering a range of standard pipeline behaviors, here are the set of core features we believe are needed. We follow telemetry pipeline terminology developed in the OpenTelemetry Collector.

etc.

@austinlparker
Member

Thanks for your patience. We discussed this topic at the most recent GC meeting, and have a few specific questions.

  • What constraints are leading to the Rust requirement for this pipeline? Processing performance/memory safety? Arrow library support in Rust vs. Go? Etc.
  • If the answer is strictly due to performance, are the performance benefits of Rust language/runtime related or are they related to Collector architectural decisions (e.g., overhead incurred on intermediate translation between row-based pdata and column-based Arrow)?
  • What is the vision for when end-users should use OTAP pipelines vs. Collectors (as they exist today)?
  • What is the interop story between Collectors and OTAP pipelines?
  • What is the relationship between OTAP and the proposed STEF format?

If you'd like to join us on a forthcoming GC call to discuss this, your presence would be welcomed! Thanks in advance for your answers.

@jmacd
Contributor Author

jmacd commented Mar 12, 2025

What constraints are leading to the Rust requirement for this pipeline? Processing performance/memory safety? Arrow library support in Rust vs. Go? Etc.

"All of the above". The choice of Rust is related to the strength of the Arrow libraries in Rust, which follows from performance and memory safety in Rust, which matters especially in a pipeline carrying zero-copy data. Go has no strongly-enforced "immutable borrow", so the pipeline data used in the Go Collector is frequently copied; there is no way for components to share immutable data into a pipeline, and the pipeline cannot share references to internal values with reference semantics (e.g., you must iterate-over or copy a slice value, you cannot cheaply borrow the slice).

Arrow libraries are better in Rust because Rust is a better environment for data-intensive programming, for zero-copy safety reasons illustrated above. As a particularly appealing example, DataFusion is an extensible query engine framework written in Rust that has no equivalent in Go.
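
To make the borrowing point concrete, here is a minimal sketch using only the Rust standard library. The names `Column`, `sum`, and `first_n` are hypothetical and are not taken from the OTel-Arrow codebase; the sketch only illustrates that read-only components can share one buffer, and hand out views into it, without copying.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for one column of pipeline data.
type Column = Arc<[f64]>;

/// A read-only "component": it borrows the slice immutably, so the
/// compiler guarantees it cannot modify the shared data.
fn sum(values: &[f64]) -> f64 {
    values.iter().sum()
}

/// Another read-only component: returns a borrowed sub-slice (a view),
/// not a copy. The lifetime ties the view to the original buffer.
fn first_n(values: &[f64], n: usize) -> &[f64] {
    &values[..n.min(values.len())]
}

fn main() {
    let column: Column = Arc::from(vec![1.0, 2.0, 3.0, 4.0]);

    // Cloning the Arc bumps a reference count; the buffer is not copied.
    let shared = Arc::clone(&column);
    assert!(Arc::ptr_eq(&column, &shared));

    // The view points into the same allocation as the original column.
    let view = first_n(&column, 2);
    assert!(std::ptr::eq(view.as_ptr(), column.as_ptr()));

    println!("total = {}, head = {:?}", sum(&shared), view);
}
```

Any attempt to mutate through `values` inside `sum` or `first_n` is rejected at compile time, which is the guarantee the Go Collector has to approximate by copying.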

If the answer is strictly due to performance, are the performance benefits of Rust language/runtime related or are they related to Collector architectural decisions (e.g., overhead incurred on intermediate translation between row-based pdata and column-based Arrow)?

Rust programs have greater control over memory management and, as a result, over latency; this is the same reason that more Arrow programming is done in Rust than in Go. We believe that garbage collection is a fundamental obstacle to adopting OpenTelemetry pipelines in embedded and critical security scenarios.

However, the answer is not strictly about performance; it is about the Arrow-Rust community. To reinforce this point, the Arrow-Rust repository has approximately five times as many pull requests as the Arrow-Go repository (roughly 4162 in Rust, 898 in Go as of this writing). Arrow-Rust has 3.5x more contributors than Arrow-Go; Arrow-Rust has 12 contributors with over 100 commits each, while Arrow-Go has 11 contributors with over 10 commits each. The object_store library, for accessing all the major cloud storage providers, is hosted within the Arrow-Rust community; there is no equivalent in Arrow-Go.

Disclaimer: Laurent and Josh are both Arrow-Go contributors. :-)

What is the vision for when end-users should use OTAP pipelines vs. Collectors (as they exist today)?

This question is hypothetical, in our opinion. Without performance measurements over an end-to-end OTAP pipeline, and without knowing the volume of telemetry involved, it is a difficult question to answer. The use of a column-oriented representation makes certain operations very inexpensive, such as renaming fields (e.g., as part of a schema translation), dropping fields (e.g., redaction), and scaling values (e.g., as part of a units conversion).
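
As a rough illustration of why these operations are cheap in a column-oriented layout, consider the sketch below. It uses only the Rust standard library; `Batch` and its methods are hypothetical and much simpler than an Arrow record batch, which carries a schema and typed arrays. Renaming and dropping touch only the column index, and scaling is a single loop over one contiguous column.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified column store: field name -> column of values.
#[derive(Debug, Default)]
struct Batch {
    columns: HashMap<String, Vec<f64>>,
}

impl Batch {
    /// Rename a field: re-keys the column. The Vec is moved, so the
    /// values themselves are never read or copied.
    fn rename(&mut self, from: &str, to: &str) -> bool {
        match self.columns.remove(from) {
            Some(col) => {
                self.columns.insert(to.to_string(), col);
                true
            }
            None => false,
        }
    }

    /// Drop a field (e.g., redaction): removes one column and leaves
    /// every other column untouched.
    fn drop_field(&mut self, name: &str) -> bool {
        self.columns.remove(name).is_some()
    }

    /// Scale a field (e.g., units conversion): one loop over one
    /// contiguous column.
    fn scale(&mut self, name: &str, factor: f64) -> bool {
        match self.columns.get_mut(name) {
            Some(col) => {
                col.iter_mut().for_each(|v| *v *= factor);
                true
            }
            None => false,
        }
    }
}

fn main() {
    let mut batch = Batch::default();
    batch.columns.insert("duration_ms".to_string(), vec![1500.0, 250.0]);
    batch.columns.insert("user_id".to_string(), vec![42.0, 7.0]);

    batch.scale("duration_ms", 0.001); // milliseconds -> seconds
    batch.rename("duration_ms", "duration_s");
    batch.drop_field("user_id"); // redaction

    println!("{:?}", batch);
}
```

In a row-oriented representation, each of these three operations would have to visit every record.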

We think users should use an OTAP pipeline when they benefit from the Arrow intermediate representation to reduce costs associated with telemetry collection. We expect users with a high volume of telemetry to benefit from our approach; we anticipate substantially better compression rates, faster data processing, and lower resource consumption for these users.

We believe our Phase-1 result will repeat itself, which is to say that conversion from OTLP to OTAP and back again will cost less than the savings that await users from faster queries. However, in this phase of the project, we are aiming to unlock the full potential of OTAP, which is achieved when we avoid multiple conversions in the first place. We believe OTAP could eventually replace OTLP, in the long run, for bulk telemetry transport.

We believe that users with a strong requirement for safety and security under constrained resources will potentially choose a pure Rust pipeline. We think this sort of question should wait until we have Phase-2 results, until we have an idea of what can be done and the relative costs.

What is the interop story between Collectors and OTAP pipelines?

We see a potential to use OTAP pipelines as components in Golang Collector pipelines. The existing OTel-Arrow Golang reference implementation serves to convert between OTLP and OTAP representations already, and foreign function calls from Go to Rust will enable mixing these pipelines with low overhead thanks to the zero-copy nature of Arrow.
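
A minimal sketch of the foreign-function idea, assuming a bare pointer-and-length convention as a simplification. Arrow defines a C Data Interface for exchanging arrays between languages, and a real integration would more likely build on that. The function name `otap_sum_f64` is hypothetical; the point is only that the Rust side reads the caller's buffer in place.

```rust
/// Hypothetical entry point that a Go program could call through cgo.
/// It receives a pointer and length describing a buffer owned by the
/// caller and reads it in place; nothing is copied across the boundary.
///
/// Exporting it under a stable symbol name would additionally require a
/// `no_mangle` attribute and building the crate as a C-compatible library.
///
/// # Safety
/// `ptr` must point to `len` initialized, properly aligned `f64` values
/// that remain valid and unmodified for the duration of the call.
pub unsafe extern "C" fn otap_sum_f64(ptr: *const f64, len: usize) -> f64 {
    if ptr.is_null() || len == 0 {
        return 0.0;
    }
    // Borrow the caller's memory as a slice: a view, not a copy.
    let values = unsafe { std::slice::from_raw_parts(ptr, len) };
    values.iter().sum()
}

fn main() {
    // Stand-in for a buffer that the Go side would own.
    let buffer = vec![0.5, 1.5, 2.0];
    let total = unsafe { otap_sum_f64(buffer.as_ptr(), buffer.len()) };
    println!("sum = {total}");
}
```

The per-call overhead of crossing the language boundary still applies, so batching many rows into each call matters more than the cost of the call itself.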

We see a potential to use Golang Collector components inside OTAP pipelines, too. If OTAP is indeed a more efficient transport representation, then more and more it will make sense to translate high-value components to use OTAP directly. It will be unavoidable that we compare today's implementation of a simple transform processor using OTTL expressions with a functionally-equivalent implementation based on DataFusion using the same OTTL expressions. We think it makes sense to proceed with our experiment, so that we have better answers to this and similar questions in the future.

What is the relationship between OTAP and the proposed STEF format?

These two protocols are related in the sense that they both use gRPC streaming and both aim to achieve high compression rates. STEF is a metrics-only protocol and apparently a vendor-specific one. STEF does not have a receiver in the Collector-Contrib repository. OTel-Arrow components are committed to 100% compatibility with OpenTelemetry; this includes seamless support for combined OTLP/OTAP modes in both the exporter and the receiver.

STEF is exclusively focused on compression rates. The benefit of OTAP is the direct availability of Arrow data frames for use with Arrow-based tools, something STEF does not offer. We'd like to demonstrate performance improvements in Phase 2 of the project, having already demonstrated a substantial compression benefit in Phase 1.

@austinlparker
Member

Thanks for your responses! I've invited the two of you to the next GC meeting where we'll discuss this topic.

@jpkrohling
Member

@jmacd, I'm having difficulty understanding what is Go and what is Rust in the "OTel-Arrow Collector" image below:

Image

My initial impression is that this is a Rust Collector, and some of your comments seem to go in this direction ("more and more it will make sense to translate high-value components to use OTAP directly"), but at the same time, they can be read as "they'll be written in Go and handle OTAP instead of OTLP".

I have concerns about duplicating our Collector efforts and making things even more complex for our end-users, but I wanted to fully understand the proposal.

jmacd added a commit that referenced this issue Mar 25, 2025
otel-arrow-rust is a Rust implementation of otel-arrow protocol. This
implementation currently supports the univariate metrics signal. This
contribution is considered a useful starting point for a complete Rust
implementation of OTAP.

Part of #294
Part of open-telemetry/opentelemetry.io#6410

Originally authored at https://github.com/v0y4g3r/otel-arrow-rust

---------

Co-authored-by: Joshua MacDonald
@jmacd
Contributor Author

jmacd commented Mar 29, 2025

@jpkrohling We hope that https://github.com/open-telemetry/otel-arrow/blob/main/docs/project-phases.md will address your concerns. I feel that I owe you an updated diagram, since the one you copied above is from the OTEP.

@jpkrohling
Member

Looks good, thank you for addressing that.

lquerel added the pipeline and rust labels Apr 16, 2025