Commit 5cf9dbb

jvoravong and jinja2 authored
Use Helm generated instead of cert-manager generated certs for the operator (#1648)
* initial draft
* Update CI/CD to support self signed cert data
* update functional tests
* more test fixes
* updates to keep support for the certmanager subchart around, updated the functional aks tests to use certmanager for testing coverage
* Update docs/auto-instrumentation-install.md
  Co-authored-by: Jina Jain <[email protected]>
* draft migration guide for 0.118.0 to 0.119.0
* Update docs/auto-instrumentation-install.md
  Co-authored-by: Jina Jain <[email protected]>
* Update docs after main merge
* split our pre-commit update into a separate PR
* Documentation improvements, mostly just reorganize content for easier reading
* name fix
* remove doc TODOs
* More upgrading step touch ups
* remove functional test values file updates because they are not needed
* regenerate functional_tests/testdata/expected_kind_values/expected_cluster_receiver.yaml for latest changes brought in from main
* dummy commit to get CI/CD run with the "Ignore Tests" PR label
* restore comment that wasn't ment to be removed
* doc update for autoGenerateCert.enabled
* Remove missed cert-manager references in docs
* Update UPGRADING.md
  Co-authored-by: Jina Jain <[email protected]>

---------

Co-authored-by: Jina Jain <[email protected]>
1 parent 16dae9c commit 5cf9dbb

32 files changed

+879
-7534
lines changed
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
2+
change_type: breaking
3+
# The name of the component, or a single word describing the area of concern, (e.g. agent, clusterReceiver, gateway, operator, chart, other)
4+
component: operator
5+
# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
6+
note: Migrate the operator to use Helm generated TLS certificates instead of cert-manager by default
7+
# One or more tracking issues related to the change
8+
issues: [1648]
9+
# (Optional) One or more lines of additional information to render under the primary note.
10+
# These lines will be padded with 2 spaces and then inserted directly into the document.
11+
# Use pipe (|) for multiline entries.
12+
subtext: |
13+
- Previously, certificates were generated by cert-manager by default; now they are generated by Helm templates unless configured otherwise.
14+
- This change simplifies the setup for new users while still supporting those who prefer using cert-manager or other solutions. For more details, see the [related documentation](https://github.com/signalfx/splunk-otel-collector-chart/tree/main/docs/auto-instrumentation-install.md#tls-certificate-requirement-for-kubernetes-operator-webhooks).
15+
- If you use `.Values.operator.enabled=true` and `.Values.certmanager.enabled=true`, please review the [upgrade guidelines](https://github.com/signalfx/splunk-otel-collector-chart/blob/main/UPGRADING.md#01190-to-01200).

.github/workflows/functional_test_v2.yaml

Lines changed: 0 additions & 18 deletions
@@ -67,9 +67,6 @@ jobs:
6767
- name: Update dependencies
6868
run: |
6969
make dep-update
70-
- name: Deploy cert-manager
71-
run: |
72-
make cert-manager
7370
- name: run functional tests
7471
id: run-functional-tests
7572
env:
@@ -128,9 +125,6 @@ jobs:
128125
- name: Update dependencies
129126
run: |
130127
make dep-update
131-
- name: Deploy cert-manager
132-
run: |
133-
make cert-manager
134128
- name: run functional tests
135129
env:
136130
HOST_ENDPOINT: 0.0.0.0
@@ -183,19 +177,13 @@ jobs:
183177
- name: Update dependencies
184178
run: |
185179
cd base && make dep-update
186-
- name: Deploy cert-manager
187-
run: |
188-
cd base && make cert-manager
189180
- name: Deploy previous version of the chart
190181
run: |
191182
helm list | grep -q "^sock$" && echo "Found previous 'sock' release. Deleting..." && helm delete sock
192183
cd base && helm install sock helm-charts/splunk-otel-collector --set cloudProvider=aws --set distribution=eks --set splunkObservability.realm=us0 --set splunkObservability.accessToken=xxxxx
193184
- name: Update dependencies
194185
run: |
195186
make dep-update
196-
- name: Deploy cert-manager
197-
run: |
198-
make cert-manager
199187
- name: run functional tests
200188
env:
201189
HOST_ENDPOINT: 0.0.0.0
@@ -238,19 +226,13 @@ jobs:
238226
- name: Update dependencies
239227
run: |
240228
cd base && make dep-update
241-
- name: Deploy cert-manager
242-
run: |
243-
cd base && make cert-manager
244229
- name: Deploy previous version of the chart
245230
run: |
246231
helm list | grep -q "^sock$" && echo "Found previous 'sock' release. Deleting..." && helm delete sock
247232
cd base && helm install sock helm-charts/splunk-otel-collector --set cloudProvider=aws --set distribution=eks --set splunkObservability.realm=us0 --set splunkObservability.accessToken=xxxxx --set operator.enabled=true --set environment=dev
248233
- name: Update dependencies
249234
run: |
250235
make dep-update
251-
- name: Deploy cert-manager
252-
run: |
253-
make cert-manager
254236
- name: run functional tests
255237
env:
256238
HOST_ENDPOINT: 0.0.0.0

UPGRADING.md

Lines changed: 98 additions & 0 deletions
@@ -1,5 +1,103 @@
11
# Upgrade guidelines
22

3+
## 0.119.0 to 0.120.0
4+
5+
This guide provides steps for new users, transitioning users, and those maintaining previously deployed Operator-related TLS certificates and configurations.
6+
7+
- New users: No migration is required for Operator TLS certificates.
8+
- Existing users: Migration may be needed if you use `operator.enabled=true` or `certmanager.enabled=true`.
9+
10+
To maintain previous functionality and avoid breaking changes, review the following sections.
11+
12+
### **Maintaining Previous Functionality via Helm Values Update**
13+
14+
#### **Scenario 1: Operator and cert-manager Deployed via This Helm Chart**
15+
16+
If you previously deployed both the Operator and cert-manager via this Helm chart (`operator.enabled=true` and `certmanager.enabled=true`), you can preserve functionality by adding the following values:
17+
18+
```yaml
19+
operator:
20+
enabled: true
21+
admissionWebhooks:
22+
certManager:
23+
enabled: true
24+
certificateAnnotations:
25+
"helm.sh/hook": post-install,post-upgrade
26+
"helm.sh/hook-weight": "1"
27+
issuerAnnotations:
28+
"helm.sh/hook": post-install,post-upgrade
29+
"helm.sh/hook-weight": "1"
30+
certmanager:
31+
enabled: true
32+
installCRDs: true
33+
```
34+
35+
#### **Scenario 2: Operator Deployed with External cert-manager (Not Managed by This Helm Chart)**
36+
37+
If you previously deployed the Operator and used an externally managed cert-manager (`operator.enabled=true` and `certmanager.enabled=false`), you can preserve functionality by adding the following values:
38+
39+
```yaml
40+
operator:
41+
enabled: true
42+
admissionWebhooks:
43+
certManager:
44+
enabled: true
45+
```
46+
47+
### **Adopting New Functionality (Requires Migration Steps)**
48+
49+
If you want to migrate from cert-manager-managed certificates to the now-default Helm-generated certificates, additional steps may be required to avoid conflicts.
50+
51+
#### **Potential Upgrade Issue: Existing Secret Conflict**
52+
53+
If you see an error message like the following during a Helm install or upgrade:
54+
55+
```
56+
warning: Upgrade "{helm_release_name}" failed: pre-upgrade hooks failed: warning: Hook pre-upgrade splunk-otel-collector/charts/operator/templates/admission-webhooks/operator-webhook.yaml failed: 1 error occurred:* secrets "splunk-otel-collector-operator-controller-manager-service-cert" already exists
57+
```
58+
59+
This typically occurs because:
60+
- cert-manager deletes its `Certificate` resources immediately.
61+
- However, cert-manager does not delete the associated **secrets** instantly. It waits for its garbage collector process to remove them.
62+
63+
You must first delete this chart, wait for cert-manager to garbage-collect the secret, and then install the latest version of this chart.
64+
The commands below assume your Helm release is named `splunk-otel-collector`.
65+
- **Note:** These steps will likely cause downtime, because the operator is unavailable in your environment until the chart is reinstalled.
66+
67+
#### **Step 1: Delete this Helm Chart**
68+
69+
Use a command like this to delete the chart in your namespace:
70+
71+
```bash
72+
helm delete splunk-otel-collector --namespace <your_namespace>
73+
```
74+
75+
#### **Step 2: Verify That the Old cert-manager Secret No Longer Exists**
76+
77+
Use the following command to check if the certificate secret remains in your namespace:
78+
79+
```bash
80+
kubectl get secret splunk-otel-collector-operator-controller-manager-service-cert --namespace <your_namespace>
81+
```
82+
83+
#### **Step 3: Wait for Secret Removal or Manually Delete It**
84+
85+
If the secret still exists, you must wait for cert-manager to remove it or delete it manually:
86+
87+
```bash
88+
kubectl delete secret splunk-otel-collector-operator-controller-manager-service-cert --namespace <your_namespace>
89+
```
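Steps 2 and 3 can be combined into a single helper. The following is a minimal sketch and not part of the chart: the function name `ensure_secret_removed` and the default 120-second timeout are illustrative assumptions, and the secret name is taken from the error message above.

```bash
# Hypothetical helper: wait for cert-manager to garbage-collect the old secret,
# and delete it manually if it is still present after the timeout.
SECRET_NAME="splunk-otel-collector-operator-controller-manager-service-cert"

ensure_secret_removed() {
  local namespace="$1"
  local timeout="${2:-120s}"   # illustrative default, adjust as needed

  # Nothing to do if the secret is already gone.
  if ! kubectl get secret "$SECRET_NAME" --namespace "$namespace" >/dev/null 2>&1; then
    echo "secret already removed"
    return 0
  fi

  # Block until the secret is deleted or the timeout expires.
  if kubectl wait --for=delete "secret/$SECRET_NAME" --namespace "$namespace" --timeout="$timeout"; then
    echo "secret removed by garbage collection"
    return 0
  fi

  # Fall back to manual deletion, as described in Step 3.
  kubectl delete secret "$SECRET_NAME" --namespace "$namespace"
}
```

For example, `ensure_secret_removed <your_namespace> 300s` would wait up to five minutes before falling back to manual deletion.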
90+
91+
#### **Step 4: Proceed with Helm Install**
92+
93+
Once the secret is no longer present, you can install the latest version of the chart (`0.120.0`):
94+
95+
```bash
96+
helm install splunk-otel-collector splunk-otel-collector-chart/splunk-otel-collector --values ~/values.yaml --namespace <your_namespace>
97+
```
98+
#### **Step 5 (Optional): Delete cert-manager CRDs**
99+
100+
`helm delete` does not remove the CRD objects created as part of the cert-manager installation. The command to delete the cert-manager CRDs is in the [official cert-manager documentation](https://cert-manager.io/docs/installation/helm/#uninstalling-with-helm).
3101
## 0.113.0 to 0.116.0
4102

5103
This guide provides steps for new users, transitioning users, and those maintaining previous operator CRD configurations:

docs/auto-instrumentation-install.md

Lines changed: 93 additions & 54 deletions
@@ -68,10 +68,6 @@ these frameworks often have pre-built instrumentation capabilities already avail
6868
- [partially enable profiling](../examples/enable-operator-and-auto-instrumentation/instrumentation/instrumentation-enable-profiling-partially.yaml).
6969

7070
```bash
71-
# Check if cert-manager is already installed, don't deploy a second cert-manager.
72-
kubectl get pods -l app=cert-manager --all-namespaces
73-
74-
# If cert-manager is not deployed, make sure to add certmanager.enabled=true to the list of values to set
7571
helm install splunk-otel-collector -f ./my_values.yaml --set operatorcrds.install=true,operator.enabled=true,environment=dev splunk-otel-collector-chart/splunk-otel-collector
7672
```
7773

@@ -462,81 +458,124 @@ helm template splunk-otel-collector-chart/splunk-otel-collector --include-crds \
462458
| kubectl delete --dry-run=client -f -
463459
```
464460

465-
### Documentation Resources
461+
### TLS Certificate Requirement for Kubernetes Operator Webhooks
466462

467-
- https://developers.redhat.com/devnation/tech-talks/using-opentelemetry-on-kubernetes
468-
- https://github.com/open-telemetry/opentelemetry-operator/blob/main/README.md
469-
- https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#instrumentation
470-
- https://github.com/open-telemetry/opentelemetry-operator/blob/main/README.md#opentelemetry-auto-instrumentation-injection
471-
- https://github.com/open-telemetry/opentelemetry-operator/blob/main/README.md#use-customized-or-vendor-instrumentation
463+
In Kubernetes, the API server communicates with operator webhook components over HTTPS, which requires a valid TLS certificate that the API server trusts. The operator supports several methods for configuring the required certificate, each with different levels of complexity and security.
472464

473-
### Troubleshooting the Operator and Cert Manager
465+
---
474466

475-
#### Check the logs for failures
467+
#### 1. **Using a Self-Signed Certificate Generated by the Chart**
476468

477-
**Operator Logs:**
469+
This is the default and simplest method for generating a TLS certificate. It automatically creates a self-signed certificate for the webhook, making it suitable for internal environments or testing purposes. However, it may not be trusted by clients outside your cluster.
478470

479-
```bash
480-
kubectl logs -l app.kubernetes.io/name=operator
471+
**Note**: The following settings reflect the default values starting in **v0.120.0** of this chart. You only need to set them explicitly if you are using an **earlier chart version** or need additional customization.
472+
473+
```yaml
474+
operator:
475+
admissionWebhooks:
476+
autoGenerateCert:
477+
enabled: true
478+
certPeriodDays: 3650
479+
certManager:
480+
enabled: false
481481
```
482482

483-
**Cert-Manager Logs:**
483+
- Setting `operator.admissionWebhooks.certManager.enabled` to `false` and `operator.admissionWebhooks.autoGenerateCert.enabled` to `true` ensures that Helm generates a self-signed TLS certificate.
484+
- Helm generates a self-signed certificate that is valid for 10 years (3650 days) and stores it in a secret for the Operator webhook. The certificate's validity period can be adjusted using `operator.admissionWebhooks.autoGenerateCert.certPeriodDays`.
485+
- The certificate is **automatically regenerated** on every Helm upgrade. To disable this behavior, set `operator.admissionWebhooks.autoGenerateCert.recreate` to `false`.
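As an illustration of the last point, the sketch below upgrades a release while keeping the existing Helm-generated certificate. It is an assumption-laden example, not the chart's documented procedure: the function name is hypothetical, and the release name, chart reference, and values file simply mirror the commands used in `UPGRADING.md`.

```bash
# Hypothetical helper: upgrade the release without regenerating the webhook certificate.
upgrade_without_cert_rotation() {
  local namespace="$1"
  local values_file="${2:-$HOME/values.yaml}"   # illustrative default path

  # recreate=false tells the operator subchart to reuse the existing certificate
  # instead of generating a new one on this upgrade.
  helm upgrade splunk-otel-collector splunk-otel-collector-chart/splunk-otel-collector \
    --namespace "$namespace" \
    --values "$values_file" \
    --set operator.admissionWebhooks.autoGenerateCert.recreate=false
}
```

Setting the same key in your values file has the same effect and avoids having to repeat the flag on every upgrade.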
484486

485-
```bash
486-
kubectl logs -l app=certmanager
487-
kubectl logs -l app=cainjector
488-
kubectl logs -l app=webhook
489-
```
487+
---
490488

491-
#### Operator Issues
489+
#### 2. **Using a cert-manager Certificate**
492490

493-
##### Networking and Firewall Requirements
491+
Using `cert-manager` offers more control over certificate management and is more suitable for production environments. However, due to Helm’s install/upgrade order of operations, cert-manager CRDs and certificates cannot be installed within the same Helm operation. To work around this limitation, you can choose one of the following options:
494492

495-
Ensure the Mutating Webhook used by the operator for pod auto-instrumentation is not hindered by network policies or firewall rules. Key points to ensure:
493+
##### Option 1: **Pre-deploy cert-manager**
496494

497-
- **Webhook Accessibility**: The webhook must freely communicate with the cluster IP and the Kubernetes API server. Ensure network policies or firewall rules permit operator-related services to interact with these endpoints.
498-
- **Required Ports**: Policies should explicitly allow traffic to the necessary ports for seamless operation.
495+
If `cert-manager` is already deployed in your cluster, you can configure the operator to use it without enabling certificate generation by Helm.
499496

500-
Use the following command to identify the IP addresses and ports that need to be accessible:
497+
**Configuration:**
498+
```yaml
499+
operator:
500+
admissionWebhooks:
501+
certManager:
502+
enabled: true
503+
```
501504

502-
```bash
503-
kubectl get svc -n {operator_namespace}
504-
# Example output indicating necessary IP and port configurations:
505-
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
506-
# kubernetes ClusterIP 10.0.0.1 <none> 443/TCP 10d
507-
# splunk-splunk-otel-collector-agent ClusterIP 10.0.176.113 <none> 8006/TCP,14250/TCP,14268/TCP,... 3d17h
508-
# splunk-splunk-otel-collector-operator ClusterIP 10.0.254.125 <none> 8443/TCP,8080/TCP 3d17h
509-
# splunk-splunk-otel-collector-operator-webhook ClusterIP 10.0.222.223 <none> 443/TCP 3d17h
505+
##### Option 2: **Deploy cert-manager and the operator together**
506+
507+
If you need to install `cert-manager` along with the operator, use a Helm post-install or post-upgrade hook to ensure that the certificate is created after cert-manager CRDs are installed.
508+
509+
**Configuration:**
510+
```yaml
511+
operator:
512+
admissionWebhooks:
513+
certManager:
514+
enabled: true
515+
certificateAnnotations:
516+
"helm.sh/hook": post-install,post-upgrade
517+
"helm.sh/hook-weight": "1"
518+
issuerAnnotations:
519+
"helm.sh/hook": post-install,post-upgrade
520+
"helm.sh/hook-weight": "1"
521+
certmanager:
522+
enabled: true
523+
installCRDs: true
510524
```
511525

512-
- **Configuration Action**: Adjust your network policies and firewall settings based on the service endpoints and ports listed by the command. This ensures the webhook and operator services can properly communicate within the cluster.
526+
This method is useful when installing `cert-manager` as a subchart or as part of a larger Helm chart installation.
513527

514-
#### Cert-Manager Issues
528+
---
515529

516-
If the operator seems to be hanging, it could be due to the cert-manager not auto-creating the required certificate. To troubleshoot:
530+
#### 3. **Using a Custom Externally Generated Certificate**
517531

518-
- Check the health and logs of the cert-manager pods for potential issues.
519-
- Consider restarting the cert-manager pods.
520-
- Ensure that your cluster has only one instance of cert-manager, which should include `certmanager`, `certmanager-cainjector`, and `certmanager-webhook`.
532+
For full control, you can use an externally generated certificate. This is suitable if you already have a certificate issued by a trusted CA or have specific security requirements.
521533

522-
For additional guidance, refer to the official cert-manager documentation:
523-
- [Troubleshooting Guide](https://cert-manager.io/docs/troubleshooting/)
524-
- [Uninstallation Guide](https://cert-manager.io/v1.2-docs/installation/uninstall/kubernetes/)
534+
**Configuration:**
535+
- Set both `operator.admissionWebhooks.certManager.enabled` and `operator.admissionWebhooks.autoGenerateCert.enabled` to `false`.
536+
- Provide the paths to your certificate (`certFile`), private key (`keyFile`), and CA certificate (`caFile`) in the values.
525537

526-
##### Validate Certificates
538+
**Example:**
539+
```yaml
540+
operator:
541+
admissionWebhooks:
542+
certManager:
543+
enabled: false
544+
autoGenerateCert:
545+
enabled: false
546+
certFile: /path/to/cert.crt
547+
keyFile: /path/to/cert.key
548+
caFile: /path/to/ca.crt
549+
```
550+
551+
This method allows you to use a certificate that is trusted by external systems, such as certificates issued by a corporate CA.
527552

528-
Ensure that the certificate, which the cert-manager creates and the operator utilizes, is available.
553+
---
554+
555+
For more advanced use cases, refer to the [official Helm chart documentation](https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-operator/values.yaml) for detailed configuration options and scenarios.
556+
557+
### Troubleshooting the Operator and Cert Manager
558+
559+
#### Check the logs for failures
560+
561+
**Operator Logs:**
529562

530563
```bash
531-
kubectl get certificates
532-
# NAME READY SECRET AGE
533-
# splunk-otel-collector-operator-serving-cert True splunk-otel-collector-operator-controller-manager-service-cert 5m
564+
kubectl logs -l app.kubernetes.io/name=operator
534565
```
535566

536-
##### Using a Self-Signed Certificate for the Webhook
567+
**Cert-Manager Logs:**
537568

538-
The operator supports various methods for managing TLS certificates for the webhook. Below are the options available through the operator, with a brief description for each. For detailed configurations and specific use cases, please refer to the operator’s
539-
[official Helm chart documentation](https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-operator/values.yaml)
569+
```bash
570+
kubectl logs -l app=certmanager
571+
kubectl logs -l app=cainjector
572+
kubectl logs -l app=webhook
573+
```
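If the logs point to a TLS problem, it can also help to confirm that the webhook certificate exists and has not expired. The sketch below is a hedged example: the function name is hypothetical, the secret name is the one shown in `UPGRADING.md`, and the `tls.crt` key is an assumption based on the usual layout of Kubernetes TLS secrets. It requires `openssl` on the machine running the command.

```bash
# Hypothetical helper: print the expiry date of the operator webhook certificate.
WEBHOOK_SECRET="splunk-otel-collector-operator-controller-manager-service-cert"

webhook_cert_expiry() {
  local namespace="$1"

  # Assumes the certificate is stored under the tls.crt key of the secret.
  kubectl get secret "$WEBHOOK_SECRET" --namespace "$namespace" \
      -o jsonpath='{.data.tls\.crt}' \
    | base64 --decode \
    | openssl x509 -noout -enddate
}
```

Run it as `webhook_cert_expiry <your_namespace>`. An empty result or an error from `kubectl` suggests that the secret is missing or uses a different key.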
574+
575+
### Documentation Resources
540576

541-
**Note**: While using a self-signed certificate offers a quicker and simpler setup, it has limitations, such as not being trusted by default by clients.
542-
This may be acceptable for testing purposes or internal environments. For complete configurations and additional guidance, please refer to the provided link to the Helm chart documentation.
577+
- https://developers.redhat.com/devnation/tech-talks/using-opentelemetry-on-kubernetes
578+
- https://github.com/open-telemetry/opentelemetry-operator/blob/main/README.md
579+
- https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#instrumentation
580+
- https://github.com/open-telemetry/opentelemetry-operator/blob/main/README.md#opentelemetry-auto-instrumentation-injection
581+
- https://github.com/open-telemetry/opentelemetry-operator/blob/main/README.md#use-customized-or-vendor-instrumentation

examples/enable-operator-and-auto-instrumentation/README.md

Lines changed: 9 additions & 0 deletions
@@ -21,6 +21,15 @@ This example demonstrates how to:
2121
- **Single App Focus:** Explore trace-related performance of a single instrumented NodeJS application in the APM console.
2222
- **Simplified Use Case:** Although relations between applications will not be showcased in the APM console, this demo offers a simplified setup suitable for understanding basic instrumentation and trace visualization.
2323

24+
## [Simple Webserver - .NET Instrumentation](./otel-demo-nodejs.md)
25+
This example demonstrates how to:
26+
- Deploy the chart to the current namespace and the demo to the `dotnet-demo` namespace.
27+
- Instrument a single .NET application.
28+
29+
**Highlights:**
30+
- **Single App Focus:** Explore trace-related performance of a single instrumented .NET application in the APM console.
31+
- **Simplified Use Case:** Although relations between applications will not be showcased in the APM console, this demo offers a simplified setup suitable for understanding basic instrumentation and trace visualization.
32+
2433
## Exploring Traces and Applications in APM Console
2534
The examples provide practical insights into using the APM console for exploring application relations and traces.
2635
Whether dealing with multiple applications interacting with each other or focusing on a single application, you will gain hands-on experience in visualizing trace data using Splunk Observability APM.

examples/enable-operator-and-auto-instrumentation/enable-operator-and-auto-instrumentation-values.yaml

Lines changed: 0 additions & 2 deletions
@@ -11,6 +11,4 @@ operatorcrds:
1111
install: true
1212
operator:
1313
enabled: true
14-
certmanager:
15-
enabled: true
1614
