Contributor

@swiatekm swiatekm commented Sep 22, 2025

What does this PR do?

Makes inputs use the process runtime if they're configured to use the otel runtime, and the latter cannot support them. Currently there are two possible reasons for this:

  • The output is not supported at all - logstash is an example.
  • The output configuration uses unsupported config options - for example, allow_older_versions: false for the elasticsearch output.

A log line is also emitted if this happens.
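
For readers skimming the description, the fallback decision can be sketched roughly as below. This is a simplified illustration and not the code in this PR: the names Runtime, Output, unsupportedReason and resolveRuntime, as well as the two lookup tables, are hypothetical and are filled in only from the examples mentioned above. The sketch also treats the mere presence of an unsupported option as a reason to fall back, which is an assumption.

```go
package main

import "fmt"

// Runtime is a hypothetical stand-in for the agent's runtime selector.
type Runtime string

const (
	RuntimeProcess Runtime = "process"
	RuntimeOtel    Runtime = "otel"
)

// Output is a hypothetical, simplified view of an output configuration.
type Output struct {
	Type   string
	Config map[string]any
}

// These tables are assumptions for the sketch, based only on the examples
// in the PR description (logstash output, allow_older_versions option).
var otelSupportedOutputs = map[string]bool{"elasticsearch": true}

var otelUnsupportedOptions = map[string][]string{
	"elasticsearch": {"allow_older_versions"},
}

// unsupportedReason returns a non-empty reason when the otel runtime
// cannot handle the given output, and an empty string otherwise.
func unsupportedReason(out Output) string {
	if !otelSupportedOutputs[out.Type] {
		return fmt.Sprintf("output type %q is not supported", out.Type)
	}
	for _, opt := range otelUnsupportedOptions[out.Type] {
		// Simplification: any use of the option counts as unsupported.
		if _, ok := out.Config[opt]; ok {
			return fmt.Sprintf("output option %q is not supported", opt)
		}
	}
	return ""
}

// resolveRuntime returns the runtime to use and a warning message.
// The warning is empty when no fallback happened.
func resolveRuntime(requested Runtime, out Output) (Runtime, string) {
	if requested != RuntimeOtel {
		return requested, ""
	}
	if reason := unsupportedReason(out); reason != "" {
		return RuntimeProcess, "falling back to process runtime: " + reason
	}
	return RuntimeOtel, ""
}

func main() {
	// A logstash output configured for the otel runtime triggers the fallback.
	rt, warn := resolveRuntime(RuntimeOtel, Output{Type: "logstash"})
	fmt.Println(rt, warn)
}
```

Returning the warning as a value instead of logging inside the function keeps the decision easy to unit test. A future strict flag, as mentioned under "Why is it important?", could turn the same non-empty reason into an error.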

I've also moved the component monitoring code into its own package to avoid dependency cycles.

Why is it important?

We want to enable the otel runtime progressively without ever breaking a user's working setup. This means that a fallback is necessary, even if it may involve going against the explicit configuration. Later, we may add a flag which causes this to be an error instead.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

How to test this PR locally

Build the agent package and run it locally with either an elasticsearch output using an unsupported option (this PR uses allow_older_versions: false), or a kafka or logstash output. Then look at the status and logs. You should see a log line warning that your input was switched to the process runtime.

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@swiatekm swiatekm added backport-8.19 Automated backport to the 8.19 branch backport-9.1 Automated backport to the 9.1 branch skip-changelog labels Sep 22, 2025
@swiatekm swiatekm force-pushed the feat/otel-mode-fallback-unsupported-output branch 2 times, most recently from b15429a to 38b5c72 Compare September 23, 2025 12:16
@swiatekm swiatekm marked this pull request as ready for review September 23, 2025 12:57
@swiatekm swiatekm requested a review from a team as a code owner September 23, 2025 12:57
Contributor

mergify bot commented Sep 24, 2025

This pull request now has conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b feat/otel-mode-fallback-unsupported-output upstream/feat/otel-mode-fallback-unsupported-output
git merge upstream/main
git push upstream feat/otel-mode-fallback-unsupported-output

@swiatekm swiatekm requested a review from cmacknz September 24, 2025 10:32
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Sep 24, 2025
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@swiatekm swiatekm removed request for a team September 24, 2025 17:14
blakerouse previously approved these changes Sep 24, 2025
Contributor

@blakerouse blakerouse left a comment

This change looks good and well tested. I added 1 nit, but it is very much a nit. Feel free to ignore, if you like. ;-)

@cmacknz
Member

cmacknz commented Sep 24, 2025

LGTM too, thanks for addressing my comments.

Agent monitoring has two separate functions - implementing the control
plane monitoring server and self-monitoring for components. Having both
in the same packages caused a dependency cycle involving the otel
translation package. Resolve this by putting component monitoring in a
subpackage
@swiatekm swiatekm force-pushed the feat/otel-mode-fallback-unsupported-output branch from 0249d22 to c712487 Compare September 25, 2025 11:09
@swiatekm swiatekm requested a review from blakerouse September 25, 2025 11:09
@swiatekm swiatekm enabled auto-merge (squash) September 25, 2025 11:54
@elasticmachine
Collaborator

elasticmachine commented Sep 25, 2025

Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube

@swiatekm swiatekm merged commit 2f0ba69 into main Sep 26, 2025
22 of 23 checks passed
@swiatekm swiatekm deleted the feat/otel-mode-fallback-unsupported-output branch September 26, 2025 09:33
mergify bot pushed a commit that referenced this pull request Sep 26, 2025
* Move component monitoring to its own package

Agent monitoring has two separate functions - implementing the control
plane monitoring server and self-monitoring for components. Having both
in the same packages caused a dependency cycle involving the otel
translation package. Resolve this by putting component monitoring in a
subpackage

* Fall back to process runtime if otel runtime is unsupported

* Fix integration test

* Normalize import names

* Add logstash unit test

* Use indices instead of allow_older_version

* Add log line for skipped components

* Change argument order

* Fix linter warning

(cherry picked from commit 2f0ba69)

# Conflicts:
#	internal/pkg/agent/application/application.go
#	internal/pkg/agent/application/coordinator/coordinator.go
#	internal/pkg/otel/translate/otelconfig.go
#	internal/pkg/otel/translate/otelconfig_test.go
mergify bot pushed a commit that referenced this pull request Sep 26, 2025
(cherry picked from commit 2f0ba69)

# Conflicts:
#	internal/pkg/agent/application/application.go
#	internal/pkg/agent/application/coordinator/coordinator.go
#	internal/pkg/agent/application/monitoring/process.go
#	internal/pkg/agent/cmd/inspect.go
#	internal/pkg/otel/manager/diagnostics.go
#	internal/pkg/otel/manager/diagnostics_test.go
#	internal/pkg/otel/translate/otelconfig.go
#	internal/pkg/otel/translate/otelconfig_test.go
#	testing/integration/ess/beat_receivers_test.go
@swiatekm swiatekm removed the backport-9.1 Automated backport to the 9.1 branch label Sep 26, 2025
swiatekm added a commit that referenced this pull request Sep 26, 2025
… is unsupported (#10166)

* Fall back to process runtime if otel runtime is unsupported (#10087)

(cherry picked from commit 2f0ba69)

# Conflicts:
#	internal/pkg/agent/application/application.go
#	internal/pkg/agent/application/coordinator/coordinator.go
#	internal/pkg/otel/translate/otelconfig.go
#	internal/pkg/otel/translate/otelconfig_test.go

* Fix conflicts

---------

Co-authored-by: Mikołaj Świątek <[email protected]>
v1v added a commit that referenced this pull request Sep 26, 2025
* upstream: (505 commits)
  Update journald tests now that Filebeat supports watching folders (#10131)
  [deploy/kubernetes]: add info about hostPID for Universal Profiling (#10173)
  Fall back to process runtime if otel runtime is unsupported (#10087)
  Conditionall check for ms_tls13kdf build tag (#10160)
  [docs][edot] add entry for profiles (#10163)
  edot/docs: add support for profiles (#10146)
  Add Logstash exporter (#10137)
  Add back publish to serverless. (#10159)
  Improve Integration test documentation (#10155)
  Fix multiarch service image push from main to serverless (#10129)
  Forward migrate action to endpoint (#9801)
  Comment out check for ms_tls13kdf tag for FIPS-capable binaries (#10148)
  [otel] add receivers: apache, iis, mysql, postgresql, sqlserver v0.135.0 (#9344)
  Add k8sevents receiver in kube-stack (#10086)
  feat: emit system resource metrics for EDOT subprocess (#10003)
  [AutoOps] Configure OTel Exporter to Send Maximum-sized Batches (#10126)
  keep enrollment token when replacing data with signed (#10115)
  Revert "Publish `elastic-agent-service` container directly to serverless from main (#9583)" (#10127)
  Add agent_policy_id and policy_revision_idx to checkin requests (#9931)
  remove resource/k8s processor and use k8sattributes processor for service attributes (#10108)
  ...
Labels
backport-8.19 Automated backport to the 8.19 branch skip-changelog Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
5 participants