-
Notifications
You must be signed in to change notification settings - Fork 10
feat(t8s-cluster/cilium): enable kubeProxy replacement #1815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(t8s-cluster/cilium): enable kubeProxy replacement #1815
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughMerge chart Values into helper context and gate multiple templates on Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Dev as Developer / Values
participant Helm as Helm renderer
participant Templates as Chart templates / helpers
participant K8s as Kubernetes API
participant OpenStack as OpenStack API
participant Cilium as Cilium HelmRelease
Dev->>Helm: render chart with .Values.controlPlane.hosted=true
Helm->>Templates: merge .Values into helper context
Templates->>K8s: render ControllerManager args (--allocate-node-cidrs=true)
Templates->>K8s: render HostedControlPlane spec (kubeProxy.disabled=true)
Templates->>OpenStack: emit security-group rules & machine allowedAddressPairs (hosted native-routing)
Templates->>Cilium: inject hosted-mode Cilium HelmRelease values (BPF, native routing, IPAM, proxy-replacement)
K8s-->>Dev: resources created/updated
OpenStack-->>Dev: SGs/ports applied
Cilium-->>Dev: HelmRelease deployed
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (2)
🧰 Additional context used🧠 Learnings (2)📚 Learning: 2025-07-24T09:56:41.380ZApplied to files:
📚 Learning: 2025-07-24T09:55:53.655ZApplied to files:
🔇 Additional comments (3)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @cwrau, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on integrating and enabling Cilium's advanced networking capabilities, specifically its kube-proxy replacement feature, within Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request enables Cilium's kube-proxy replacement for hosted control planes. The changes are generally well-aligned with this goal, correctly enabling various Cilium features and disabling the standard kube-proxy when a hosted control plane is used. However, I've identified a few issues. There's a critical issue in cni-cilium.yaml that would lead to invalid configuration values for ipam.mode and routingMode, likely causing deployment failures. I've also found a high-severity issue with a Helm lookup that lacks robustness and could lead to reconciliation failures. Additionally, there are several instances of hardcoded network CIDRs across multiple files, which I've flagged as medium-severity issues that impact maintainability. I've provided specific suggestions to address these points.
.../management-cluster/clusterClass/openStackClusterTemplate/_openStackClusterTemplateSpec.yaml
Show resolved
Hide resolved
...management-cluster/clusterClass/openStackMachineTemplates/_openstackMachineTemplateSpec.yaml
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR enables Cilium's kubeProxy replacement feature for hosted control plane configurations. When controlPlane.hosted is true, the workload cluster will use Cilium's eBPF-based kube-proxy replacement with native routing mode instead of the traditional kube-proxy. This requires coordinated changes across CNI configuration, network security rules, OpenStack machine configuration, and control plane settings.
Key changes:
- Enables multiple Cilium features (BPF masquerade, native routing, kubeProxy replacement) conditionally based on hosted control plane mode
- Configures security group rules to allow native pod-to-pod routing in OpenStack
- Disables kube-proxy in hosted control plane deployments and enables node CIDR allocation in controller manager
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
cni-cilium.yaml |
Enables kubeProxy replacement and native routing features in Cilium when using hosted control planes |
_openstackMachineTemplateSpec.yaml |
Adds allowed address pairs for pod CIDR to enable native routing at OpenStack port level |
_openStackClusterTemplateSpec.yaml |
Adds security group rules for pod-pod and node-pod native routing; refactors protocol handling |
_hostedControlPlaneTemplateSpec.yaml |
Disables kube-proxy in hosted control plane configuration |
_helpers.tpl |
Enables node CIDR allocation in controller manager for hosted control planes |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...management-cluster/clusterClass/openStackMachineTemplates/_openstackMachineTemplateSpec.yaml
Show resolved
Hide resolved
.../management-cluster/clusterClass/openStackClusterTemplate/_openStackClusterTemplateSpec.yaml
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
charts/t8s-cluster/templates/management-cluster/clusterClass/_helpers.tpl(1 hunks)charts/t8s-cluster/templates/management-cluster/clusterClass/hostedControlPlaneTemplate/_hostedControlPlaneTemplateSpec.yaml(1 hunks)charts/t8s-cluster/templates/management-cluster/clusterClass/openStackClusterTemplate/_openStackClusterTemplateSpec.yaml(2 hunks)charts/t8s-cluster/templates/management-cluster/clusterClass/openStackMachineTemplates/_openstackMachineTemplateSpec.yaml(1 hunks)charts/t8s-cluster/templates/workload-cluster/cni-cilium.yaml(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-24T09:55:53.655Z
Learnt from: cwrau
Repo: teutonet/teutonet-helm-charts PR: 1601
File: charts/base-cluster/templates/dns/external-dns.yaml:30-32
Timestamp: 2025-07-24T09:55:53.655Z
Learning: In charts/base-cluster/templates/dns/external-dns.yaml, the dns.provider field in values.yaml has always been expected to be a map format (e.g., `{ cloudflare: {} }`), never a string format. The template correctly uses `{{ .Values.dns.provider | keys | first }}` to extract the provider name from the map keys.
Applied to files:
charts/t8s-cluster/templates/management-cluster/clusterClass/_helpers.tpl
🪛 GitHub Actions: Lint Helm Charts
charts/t8s-cluster/templates/workload-cluster/cni-cilium.yaml
[error] 53-53: Chart lint failed due to template error: nil pointer evaluating interface {}.listeners (template: t8s-cluster/templates/workload-cluster/cni-cilium.yaml:53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: check licenses
- GitHub Check: Agent
🔇 Additional comments (7)
charts/t8s-cluster/templates/management-cluster/clusterClass/hostedControlPlaneTemplate/_hostedControlPlaneTemplateSpec.yaml (1)
68-69: LGTM!The kubeProxy disablement is correctly tied to the hosted control plane flag, which aligns with enabling Cilium's kubeProxyReplacement feature in hosted mode.
charts/t8s-cluster/templates/management-cluster/clusterClass/_helpers.tpl (1)
148-154: LGTM!The addition of
allocate-node-cidrsfor hosted control planes is correct. The controller manager needs to allocate node CIDRs when running in hosted mode with native routing.charts/t8s-cluster/templates/management-cluster/clusterClass/openStackClusterTemplate/_openStackClusterTemplateSpec.yaml (1)
110-118: LGTM - Enhanced remote target handling.The expanded logic correctly handles
remoteGroupID,remoteManagedGroups, andremoteIPPrefixin priority order, with a sensible fallback toallRemoteManagedGroups. This supports the new hosted mode security rules.charts/t8s-cluster/templates/workload-cluster/cni-cilium.yaml (3)
41-41: kubeProxyReplacement requires coordination with kubeadm.The comment on line 40 mentions that kubeProxyReplacement "currently not really possible with kubeadmcontrolplane" because they don't support disabling the built-in kube-proxy. However, this PR sets
kubeProxy.disabledin the hosted control plane template (line 69 of_hostedControlPlaneTemplateSpec.yaml).Verify that:
- The hosted control plane setup actually supports disabling kube-proxy
- The
kubeProxy.disabledflag in the hosted template is sufficient- No additional configuration is needed to prevent kube-proxy from being deployed
82-87: LGTM - Hubble export configuration.Enabling static export to stdout with a filter for DROPPED and ERROR verdicts is a good observability practice for debugging network issues.
31-49: Based on my verification through web search and codebase analysis, I can now provide the rewritten review comment:Verify kernel version and configuration requirements for hosted mode Cilium features.
When
controlPlane.hostedis enabled, multiple advanced Cilium BPF features are simultaneously activated (masquerade, tproxy, TCX, endpoint routes, bandwidth manager, egressGateway, kubeProxyReplacement, native routing with direct node routes, and local redirect policies). This creates significant kernel dependencies:
- TCX requires Linux kernel 6.6+ (enabled by enableTCX when hosted=true)
- Bandwidth Manager requires kernel >= 5.18 with CONFIG_NET_SCH_FQ
- TPROXY requires CONFIG_NETFILTER_XT_TARGET_TPROXY and related netfilter configs
- IPAM mode is set to "kubernetes" (appropriate for hosted control planes)
Since this chart targets OpenStack deployments (evidenced by cloud-provider-openstack integration), verify that your compute nodes support kernel 6.6+ and the required BPF/netfilter kernel configurations before enabling hosted mode.
charts/t8s-cluster/templates/management-cluster/clusterClass/openStackMachineTemplates/_openstackMachineTemplateSpec.yaml (1)
8-12: The allowedAddressPairs configuration for 10.0.0.0/16 is correctly applied to all machine templates in hosted mode. This CIDR represents the pod network, as confirmed by the CNI configuration (ipv4NativeRoutingCIDR: 10.0.0.0/16in cni-cilium.yaml) and the security group rules in openStackClusterTemplate. All nodes—both control-plane and compute-plane—must allow traffic destined for pod IPs, making this configuration appropriate for both node types.
.../management-cluster/clusterClass/openStackClusterTemplate/_openStackClusterTemplateSpec.yaml
Show resolved
Hide resolved
1cb12fe to
9827acd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
charts/t8s-cluster/templates/management-cluster/clusterClass/openStackClusterTemplate/_openStackClusterTemplateSpec.yaml (1)
55-59: Fix protocol nil handling in hosted CNI security group rules.Setting
protocol: nilin the rule dict combined with the hasKey ternary at line 104 causes the protocol to remainnilinstead of defaulting to"tcp". OpenStack security group rules require an explicit protocol, so this will fail.Apply one of these fixes:
Option A (recommended): Omit the protocol key and let line 104 default to "tcp"
- {{- $cniSecurityGroupRules = set $cniSecurityGroupRules "allow pod-pod native routing (ingress)" (dict "remoteIPPrefix" "10.0.0.0/16" "protocol" nil) -}} - {{- $cniSecurityGroupRules = set $cniSecurityGroupRules "allow node-pod native routing (ingress)" (dict "remoteIPPrefix" "10.6.0.0/16" "protocol" nil) -}} + {{- $cniSecurityGroupRules = set $cniSecurityGroupRules "allow pod-pod native routing (ingress)" (dict "remoteIPPrefix" "10.0.0.0/16") -}} + {{- $cniSecurityGroupRules = set $cniSecurityGroupRules "allow node-pod native routing (ingress)" (dict "remoteIPPrefix" "10.6.0.0/16") -}}Option B: Explicitly set protocol to "tcp"
- {{- $cniSecurityGroupRules = set $cniSecurityGroupRules "allow pod-pod native routing (ingress)" (dict "remoteIPPrefix" "10.0.0.0/16" "protocol" nil) -}} - {{- $cniSecurityGroupRules = set $cniSecurityGroupRules "allow node-pod native routing (ingress)" (dict "remoteIPPrefix" "10.6.0.0/16" "protocol" nil) -}} + {{- $cniSecurityGroupRules = set $cniSecurityGroupRules "allow pod-pod native routing (ingress)" (dict "remoteIPPrefix" "10.0.0.0/16" "protocol" "tcp") -}} + {{- $cniSecurityGroupRules = set $cniSecurityGroupRules "allow node-pod native routing (ingress)" (dict "remoteIPPrefix" "10.6.0.0/16" "protocol" "tcp") -}}charts/t8s-cluster/templates/workload-cluster/cni-cilium.yaml (1)
42-42: Fix kubeProxyReplacement value type.
kubeProxyReplacement: truewill output the string"true"in YAML, but Cilium expects one of the strings"strict","probe","disabled", or"false". The string"true"is not a valid kubeProxyReplacement mode and will cause the Helm release to fail.Based on the comment at line 41 indicating kubeadm control planes don't support disabling kube-proxy, the intent is likely to use full proxy replacement in hosted mode. Change this to:
- kubeProxyReplacement: true + kubeProxyReplacement: strictIf you need conditional behavior in non-hosted mode, use
nil(which omits the key and lets Cilium use its default):+ {{- if .Values.controlPlane.hosted }} kubeProxyReplacement: strict + {{- end }}
🧹 Nitpick comments (1)
charts/t8s-cluster/templates/workload-cluster/cni-cilium.yaml (1)
59-59: Consider externalizing hardcoded CIDR to values for flexibility.The
ipv4NativeRoutingCIDR: 10.0.0.0/16is hardcoded. This same CIDR also appears in the OpenStack security group rules (file_openStackClusterTemplateSpec.yaml, lines 56 and 133). Hardcoding reduces chart flexibility and makes it harder to support custom pod CIDR configurations.Consider adding a values key like
.Values.network.ipv4NativeRoutingCIDRand referencing it consistently across templates:- ipv4NativeRoutingCIDR: 10.0.0.0/16 # default net, see hosted control plane controller + ipv4NativeRoutingCIDR: {{ .Values.network.ipv4NativeRoutingCIDR | default "10.0.0.0/16" }}This would improve reusability and simplify future CIDR changes across the chart.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
charts/t8s-cluster/templates/management-cluster/clusterClass/_helpers.tpl(1 hunks)charts/t8s-cluster/templates/management-cluster/clusterClass/hostedControlPlaneTemplate/_hostedControlPlaneTemplateSpec.yaml(1 hunks)charts/t8s-cluster/templates/management-cluster/clusterClass/openStackClusterTemplate/_openStackClusterTemplateSpec.yaml(2 hunks)charts/t8s-cluster/templates/management-cluster/clusterClass/openStackMachineTemplates/_openstackMachineTemplateSpec.yaml(1 hunks)charts/t8s-cluster/templates/workload-cluster/cni-cilium.yaml(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- charts/t8s-cluster/templates/management-cluster/clusterClass/_helpers.tpl
- charts/t8s-cluster/templates/management-cluster/clusterClass/hostedControlPlaneTemplate/_hostedControlPlaneTemplateSpec.yaml
- charts/t8s-cluster/templates/management-cluster/clusterClass/openStackMachineTemplates/_openstackMachineTemplateSpec.yaml
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-24T09:56:41.380Z
Learnt from: cwrau
Repo: teutonet/teutonet-helm-charts PR: 1601
File: charts/base-cluster/templates/dns/external-dns.yaml:33-39
Timestamp: 2025-07-24T09:56:41.380Z
Learning: In the teutonet-helm-charts base-cluster chart, secret names like "external-dns" for Cloudflare provider are intentionally hard-coded. Users who need custom secret names should use Helm's `valuesFrom` feature to override values rather than expecting dedicated fields in values.yaml. This design keeps the values.yaml clean while still allowing full customization flexibility.
Applied to files:
charts/t8s-cluster/templates/management-cluster/clusterClass/openStackClusterTemplate/_openStackClusterTemplateSpec.yaml
📚 Learning: 2025-07-24T09:55:53.655Z
Learnt from: cwrau
Repo: teutonet/teutonet-helm-charts PR: 1601
File: charts/base-cluster/templates/dns/external-dns.yaml:30-32
Timestamp: 2025-07-24T09:55:53.655Z
Learning: In charts/base-cluster/templates/dns/external-dns.yaml, the dns.provider field in values.yaml has always been expected to be a map format (e.g., `{ cloudflare: {} }`), never a string format. The template correctly uses `{{ .Values.dns.provider | keys | first }}` to extract the provider name from the map keys.
Applied to files:
charts/t8s-cluster/templates/workload-cluster/cni-cilium.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: lint helm chart (t8s-cluster)
🔇 Additional comments (4)
charts/t8s-cluster/templates/management-cluster/clusterClass/openStackClusterTemplate/_openStackClusterTemplateSpec.yaml (1)
110-118: Remote target selection logic looks correct.The fallback chain (remoteGroupID → remoteManagedGroups → remoteIPPrefix → allRemoteManagedGroups) is sound and properly routes the new hosted rules through the remoteIPPrefix branch.
charts/t8s-cluster/templates/workload-cluster/cni-cilium.yaml (3)
52-57: Gateway lookup nil handling is robust.The implementation properly checks for a nil Gateway object and fails with a clear error message rather than dereferencing nil. This addresses the past pipeline failure from the lookup function.
66-68: Envoy and Hubble export configurations look good.The additions for
envoy.prometheus.enabledandhubble.export.staticare valid Cilium observability features with correct configuration structure.Also applies to: 86-91
44-47: IPAM and routing configuration is appropriate for hosted mode.The settings for
ipam.mode: kubernetes,routingMode: native, and direct node routing are correct for hosted control plane scenarios where native routing and Kubernetes-allocated CIDRs are desired.
9827acd to
7ac38ed
Compare
Summary by CodeRabbit
New Features
Improvements
✏️ Tip: You can customize this high-level summary in your review settings.