@alpeb
Member

@alpeb alpeb commented Sep 8, 2025

Fixes #13865 (see for repro).

Proxy clients targeting a linkerd-admin port on a target meshed in native-sidecar mode weren't getting notifications about changes in the target pod.

Summary of case detailed in #13865:

  • Target pod using default-deny policy.
  • The Prometheus client opened a persistent connection to the target before the target was ready, and thus before it was admitted into the mesh, so the connection wasn't mTLS'd and scrape requests failed with 403.
  • After the pod became ready, the client wasn't notified and thus the bad state persisted.

The fix consists of having the workload watcher also account for subscriptions associated with ports in init-containers (where the linkerd-admin port is located when using native sidecars).
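
As a rough illustration of the idea (not the code in this PR), the sketch below checks whether a pod declares a port by looking at regular containers and then at init-containers. The types are simplified, hypothetical stand-ins for the Kubernetes pod spec, and `portDeclared` and `specDeclaresPort` are made-up helper names; port 4191 is used as the linkerd-admin port for the example.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the Kubernetes pod spec types.
type ContainerPort struct {
	Name          string
	ContainerPort int32
}

type Container struct {
	Name  string
	Ports []ContainerPort
}

type PodSpec struct {
	Containers     []Container
	InitContainers []Container
}

// portDeclared reports whether any container in the list declares the port.
func portDeclared(containers []Container, port int32) bool {
	for _, c := range containers {
		for _, p := range c.Ports {
			if p.ContainerPort == port {
				return true
			}
		}
	}
	return false
}

// specDeclaresPort checks regular containers first and then init-containers,
// because a native-sidecar proxy (and its admin port) is declared in the latter.
func specDeclaresPort(spec PodSpec, port int32) bool {
	return portDeclared(spec.Containers, port) ||
		portDeclared(spec.InitContainers, port)
}

func main() {
	// A pod meshed in native-sidecar mode: the proxy is an init-container.
	nativeSidecar := PodSpec{
		Containers: []Container{
			{Name: "app", Ports: []ContainerPort{{Name: "http", ContainerPort: 8080}}},
		},
		InitContainers: []Container{
			{Name: "linkerd-proxy", Ports: []ContainerPort{{Name: "linkerd-admin", ContainerPort: 4191}}},
		},
	}

	// Looking only at regular containers misses the admin port.
	fmt.Println(portDeclared(nativeSidecar.Containers, 4191))
	// Including init-containers finds it.
	fmt.Println(specDeclaresPort(nativeSidecar, 4191))
}
```

A lookup restricted to `Containers` would never match a subscription on the admin port of a native-sidecar pod, which is consistent with the missed notifications described above.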

Two new unit tests were added to make sure pod phase changes are notified to subscribers in both non-native-sidecar and native-sidecar modes. The native-sidecar case failed without this fix.

@alpeb alpeb requested a review from a team as a code owner September 8, 2025 15:32
@bkittinger

Thanks, @alpeb - looking forward to the merge. However, I had a hunch that this very same issue (not looking at the initContainers) might be hiding in other places as well - and indeed it is.

I'll check other places in the codebase when I'm back at my desk tomorrow (only on mobile today), but it is definitely happening again at line 781 of controller/api/destination/watcher/workload_watcher.go. That also explains why @micke-post and I could not get viz working with named ports in our setup and had to fall back to port numbers.

@alpeb
Member Author

alpeb commented Sep 9, 2025

Yeah, we need to audit all controllers for issues like this. Good catch about GetAnnotatedOpaquePorts, but it's unlikely to be the issue you were hitting: it deals with opaque ports, which are more relevant to the main containers than to the proxy deployed as an init container. Your issue with named ports was likely the one I mentioned before, #14103.

@alpeb alpeb merged commit 856f1a5 into main Sep 18, 2025
39 checks passed
@alpeb alpeb deleted the alpeb/dst-native-sidecar-fixup branch September 18, 2025 19:19
Development

Successfully merging this pull request may close these issues.

Sporadic HTTP 403 Response Codes Due to TLS Handshake Failures

4 participants