[k8sobjectsreceiver] Degrade gracefully, when watched CRDs are not found #38803

Closed
krisztianfekete opened this issue Mar 19, 2025 · 7 comments · Fixed by #38851

Comments

@krisztianfekete
Contributor

krisztianfekete commented Mar 19, 2025

Component(s)

receiver/k8sobjects

What happened?

Description

Today, if you specify a list of CRDs to be watched, the collector fails to start if any of them is missing, and the pod stays in CrashLoopBackOff.

Steps to Reproduce

Configure the following receiver and use it in a pipeline on a cluster where the telemetries.telemetry.istio.io CRD is not installed.

receivers:
  k8sobjects:
    objects:
      - name: telemetries
        mode: watch
        group: telemetry.istio.io/v1alpha1

Expected Result

The collector should start, and the receiver should watch every configured CRD that it finds on the cluster.

Actual Result

Error: invalid configuration: receivers::k8sobjects: resource XY not found. Valid resources are: [ helmchartconfigs configmaps rolebindings etc. ... 

Collector version

v0.115.0

Environment information

Environment

N/A

OpenTelemetry Collector configuration

receivers:
  k8sobjects:
    objects:
      - name: telemetries
        mode: watch
        group: telemetry.istio.io/v1alpha1

Log output

Error: invalid configuration: receivers::k8sobjects: resource XY not found. Valid resources are: [helmchartconfigs configmaps rolebindings etc. ...

Additional context

No response

@krisztianfekete added the `bug` and `needs triage` labels on Mar 19, 2025
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@krisztianfekete
Contributor Author

krisztianfekete commented Mar 20, 2025

I can see two possible approaches to handle this:

  • demote this to a warning, so the receiver can start successfully and watch the objects that are available, or
  • do the same, but behind a new error_mode config option, similar to other components in the project (maybe even reusing ottl.ErrorMode directly), so users can choose between propagating the error and ignoring missing objects
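
To make the second option concrete, here is a minimal sketch of what an error-mode type could look like. It is only modeled on the idea of `ottl.ErrorMode`; the type, the constant names, and the parsing rules below are hypothetical and are not taken from the receiver's code.

```go
package main

import (
	"fmt"
	"strings"
)

// ErrorMode is a hypothetical local type, loosely modeled on the idea of
// ottl.ErrorMode. It is not the receiver's actual implementation.
type ErrorMode string

const (
	// PropagateError keeps the existing behavior: a missing object is fatal.
	PropagateError ErrorMode = "propagate"
	// IgnoreError skips missing objects so the receiver can still start.
	IgnoreError ErrorMode = "ignore"
)

// UnmarshalText parses a config string into an ErrorMode and rejects
// unknown values, so typos surface when the config is loaded.
func (m *ErrorMode) UnmarshalText(text []byte) error {
	mode := ErrorMode(strings.ToLower(string(text)))
	switch mode {
	case PropagateError, IgnoreError:
		*m = mode
		return nil
	default:
		return fmt.Errorf("unknown error mode %q", string(text))
	}
}

func main() {
	var mode ErrorMode
	if err := mode.UnmarshalText([]byte("ignore")); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("parsed mode:", mode)
}
```

Rejecting unknown values at parse time means a misspelled mode fails loudly instead of silently falling back to one of the two behaviors.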

What do you think?
cc. @dmitryax, @hvaghani221, @TylerHelmuth, @ChrsMark

@TylerHelmuth
Member

I think a new config option is a good idea and will make sure that any changes to behavior are introduced in a non-breaking way. I think there are probably 2 modes to start:

  1. Fail to start if a configured object cannot be found (existing behavior)
  2. Ignore any objects that could not be found, logging a warning, and then start with whatever can be found.

I want to think about the situation in mode 2 where all objects cannot be found. Should the receiver still start?

@TylerHelmuth added the `enhancement` and `priority:p2` labels and removed the `bug` and `needs triage` labels on Mar 21, 2025
@krisztianfekete
Contributor Author

> I want to think about the situation in mode 2 where all objects cannot be found. Should the receiver still start?

IMO it's better to start than to be stuck in CrashLoopBackOff. I'm fine with making this depend on error_mode, failing only when it's set to propagate.
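
As a rough illustration of that behavior, the sketch below filters the configured objects against the set of resources found on the cluster. The function `selectAvailable`, the `ErrorMode` type, and the plain string names are all hypothetical simplifications, not the receiver's real types. With `ignore`, an empty result is not an error, so the receiver would start even when nothing is found.

```go
package main

import (
	"fmt"
	"log"
)

// ErrorMode is a hypothetical stand-in for the proposed error_mode option.
type ErrorMode string

const (
	PropagateError ErrorMode = "propagate"
	IgnoreError    ErrorMode = "ignore"
)

// selectAvailable returns the configured object names that exist in
// available. In propagate mode the first missing object is an error; in
// ignore mode missing objects are logged and skipped, even if that leaves
// nothing to watch.
func selectAvailable(configured []string, available map[string]bool, mode ErrorMode) ([]string, error) {
	found := make([]string, 0, len(configured))
	for _, name := range configured {
		if available[name] {
			found = append(found, name)
			continue
		}
		if mode == PropagateError {
			return nil, fmt.Errorf("resource %s not found", name)
		}
		log.Printf("warning: resource %s not found, skipping", name)
	}
	return found, nil
}

func main() {
	available := map[string]bool{"configmaps": true, "rolebindings": true}
	configured := []string{"configmaps", "telemetries"}

	watched, err := selectAvailable(configured, available, IgnoreError)
	fmt.Println(watched, err)

	_, err = selectAvailable(configured, available, PropagateError)
	fmt.Println(err)
}
```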

@krisztianfekete
Contributor Author

@TylerHelmuth, I was wondering if you could help us out here: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/38851/files#r2010626635

Today, the receiver's Validate function checks whether any of the configured objects are missing, and if one is, the collector fails to start.

My original approach was to introduce error_mode for validation, so users can opt out of failing fast. @atoulme pointed out that this check could also happen in the actual processing logic. What is the recommended way to handle this?

@atoulme
Contributor

atoulme commented Mar 24, 2025

I have filed open-telemetry/opentelemetry-collector#12715 to follow up on the behavior of the Validate function.

@krisztianfekete
Contributor Author

/label waiting-for-code-owners

songy23 pushed a commit that referenced this issue Apr 25, 2025
…#38851)

#### Description
This PR adopts logic similar to `ottl.ErrorMode` for
`k8sobjectsreceiver`, letting users choose between ignoring,
silencing, and propagating errors for missing objects.

The default is `propagate`, so the change is backward compatible with the
current behavior.
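
The three modes can be sketched as follows, assuming they mirror the usual `ottl.ErrorMode` semantics (`ignore` logs and continues, `silent` continues without logging, `propagate` returns the error). The helper `handleMissing`, the local `ErrorMode` type, and the injected `logf` function are hypothetical and are not the code from this PR.

```go
package main

import "fmt"

// ErrorMode is a hypothetical local type; the values are assumed to match
// the three modes named in the description above.
type ErrorMode string

const (
	PropagateError ErrorMode = "propagate"
	IgnoreError    ErrorMode = "ignore"
	SilentError    ErrorMode = "silent"
)

// handleMissing decides what to do when a configured object is not found.
// logf is injected so the caller controls where warnings go.
func handleMissing(mode ErrorMode, name string, logf func(format string, args ...any)) error {
	err := fmt.Errorf("resource %s not found", name)
	switch mode {
	case IgnoreError:
		logf("skipping object: %v", err)
		return nil
	case SilentError:
		return nil
	default:
		// Propagate (the default) preserves the pre-existing behavior.
		return err
	}
}

func main() {
	logf := func(format string, args ...any) {
		fmt.Printf(format+"\n", args...)
	}
	for _, mode := range []ErrorMode{IgnoreError, SilentError, PropagateError} {
		fmt.Println(mode, "->", handleMissing(mode, "telemetries", logf))
	}
}
```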

#### Link to tracking issue
Fixes #38803

#### Testing
Added tests, and also tested by building my own image and deploying it into
a Kubernetes cluster.

#### Documentation
I've updated `receiver/k8sobjectsreceiver/config.yaml` and README with
the new config options.

vincentfree pushed a commit to ing-bank/opentelemetry-collector-contrib that referenced this issue May 6, 2025
…open-telemetry#38851)

vincentfree pushed a commit to ing-bank/opentelemetry-collector-contrib that referenced this issue May 20, 2025
…open-telemetry#38851)
