Conversation

@kleimkuhler (Contributor) commented Mar 29, 2021

What

This change adds the config.linkerd.io/proxy-await annotation which, when set, delays application container start until the proxy is ready. This allows users to force application containers to wait for the proxy container to be ready without modifying the application's Docker image. This differs from the current use of linkerd-await, which does require modifying the image.


To support this, Linkerd relies on the fact that containers are started in the order that they appear in spec.containers. If linkerd-proxy is the first container, then it is started first.

Kubernetes starts each container without waiting on the result of the previous one. However, if a container has a hook that executes immediately after container creation, Kubernetes waits on the result of that hook before creating the next container. By running the linkerd-await binary in a postStart hook on the linkerd-proxy container, Kubernetes is forced to pause container creation until the proxy is ready. Once linkerd-await completes, the container hook completes and the application container is created.

Adding the config.linkerd.io/proxy-await annotation to a pod's metadata results in the linkerd-proxy container being the first container, as well as having the container hook:

postStart:
  exec:
    command:
    - /usr/lib/linkerd/linkerd-await
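
For illustration, a rough sketch of what the injected pod spec might then look like, with the proxy placed first and the hook attached (the app container name and image references below are placeholders, not output from the injector):

spec:
  containers:
  - name: linkerd-proxy                      # first, so it starts before the app
    image: cr.l5d.io/linkerd/proxy           # placeholder image reference
    lifecycle:
      postStart:
        exec:
          command:
          - /usr/lib/linkerd/linkerd-await   # blocks until the proxy is ready
  - name: app                                # created only after the hook completes
    image: example/app:latest                # placeholder application image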

Update after draft

There has been some additional discussion both off GitHub as well as on this PR (specifically with @electrical).

First, we decided that this feature should be enabled by default. The reason for this is that, more often than not, it will prevent start-up ordering issues from occurring without having any negative effects on the application. Additionally, it will be part of edge releases up until 2.11 (the next stable release), and having it enabled by default will allow us to check that it does not often conflict with applications. Once we are closer to 2.11, we'll be able to determine if it should be disabled by default because it causes more issues than it prevents.

Second, this feature will remain configurable; if disabled, then upon injection the proxy container will not be made the first container in the pod manifest. This is important for the reasons discussed with @electrical about tools that assume the app container is the first container. For example, Rancher defaults to showing overview pages for the container at index 0, and if the proxy container were always at index 0 this would defeat the purpose of the overview page.

Testing

To test this I used the sleep.sh script below and changed Dockerfile-proxy to use it as its ENTRYPOINT. This forces the container to sleep for 20 seconds before starting the proxy.


sleep.sh:

#!/bin/bash
echo "sleeping..."
sleep 20
/usr/bin/linkerd2-proxy-run

Dockerfile-proxy:

...
COPY sleep.sh /sleep.sh
RUN ["chmod", "+x", "/sleep.sh"]
ENTRYPOINT ["/sleep.sh"]

# Build and install with the above changes
$ bin/docker-build
...
$ bin/image-load --k3d
...
$ bin/linkerd install | kubectl apply -f -

Annotate the emoji deployment so that it's the only workload that waits for its proxy to be ready, and inject it:

cat emojivoto.yaml | bin/linkerd inject - | kubectl apply -f -
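
The annotation itself goes on the workload's pod template before injection; a rough sketch of the relevant fields in emojivoto.yaml might look like this (assuming the deployment is named emoji; all other fields omitted):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: emoji
  namespace: emojivoto
spec:
  template:
    metadata:
      annotations:
        config.linkerd.io/proxy-await: "enabled"   # only this workload waits for its proxy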

You can then see that the emoji deployment is not starting its application container until the proxy is ready:

$ kubectl get -n emojivoto pods
NAME                        READY   STATUS            RESTARTS   AGE
voting-ff4c54b8d-sjlnz      1/2     Running           0          9s
emoji-f985459b4-7mkzt       0/2     PodInitializing   0          9s
web-5f86686c4d-djzrz        1/2     Running           0          9s
vote-bot-6d7677bb68-mv452   1/2     Running           0          9s

Signed-off-by: Kevin Leimkuhler [email protected]

@olix0r (Member) commented Mar 30, 2021

wdyt about calling the annotation config.linkerd.io/proxy-await -- this is more consistent with the existing annotations like proxy-log-level, etc

@kleimkuhler (Contributor, Author) commented Mar 30, 2021

Values for the annotation have been changed to enabled or disabled. The annotation name has also been changed to config.linkerd.io/proxy-await.

I'll keep this as a draft until linkerd/linkerd-await#22 merges, but it should be good for review after that.

olix0r pushed a commit to linkerd/linkerd-await that referenced this pull request Mar 31, 2021
`CMD` is not required if the use-case of `linkerd-await` is only to wait for readiness
but do nothing afterwards.

In linkerd/linkerd2#5967 this is the case. `linkerd-await` is executed as a container
hook and the only thing that it needs to do is prevent the hook from finishing until
the proxy is ready. Once it is ready, it exits without running any additional commands.

Signed-off-by: Kevin Leimkuhler <[email protected]>
@kleimkuhler kleimkuhler marked this pull request as ready for review March 31, 2021 20:19
@kleimkuhler kleimkuhler requested a review from a team as a code owner March 31, 2021 20:19
@adleong (Member) left a comment

This is so cool!

RUN (proxy=$(bin/fetch-proxy $(cat proxy-version) $TARGETARCH) && \
    mv "$proxy" linkerd2-proxy)
ARG LINKERD_AWAIT_VERSION=v0.2.3
RUN curl -fsSvLo linkerd-await https://github.com/linkerd/linkerd-await/releases/download/release%2F${LINKERD_AWAIT_VERSION}/linkerd-await-${LINKERD_AWAIT_VERSION}-${TARGETARCH} && chmod +x linkerd-await
A Member commented on the diff above:

Would it be better to have linkerd-await publish a docker image and then pull the binary from the docker image instead of from the releases page?

@kleimkuhler kleimkuhler marked this pull request as draft April 2, 2021 13:24
@kleimkuhler (Contributor, Author) commented:

Converting back to draft after some additional discussion off GH yesterday. I'll be removing the configuration options for this so that it is an always-enabled feature.

@electrical commented Apr 7, 2021

Making the linkerd-proxy container the first (index 0) container might confuse people and potentially break behaviour for others.
Some tools might depend on the index of the container list, for example.
Also, some UIs show the image value of the first container in the overview, which would then show linkerd-proxy everywhere :-)

@adleong (Member) commented Apr 7, 2021

Thanks @electrical! Are there any specific examples you can point to of behaviors or UIs which rely on container ordering and would be negatively impacted by moving the linkerd-proxy container to index 0?

@electrical commented:

@adleong Rancher is a good example: in the different overview pages (Pod, Deployment, StatefulSet, etc.) it shows the container at index 0. If it showed the linkerd-proxy container there, I would have to go into the details of that item to see the container I actually care about, which defeats the purpose of that overview.

In the case of other systems like Kyverno, I assume with my own deployments that my container is at index 0 and apply certain mutations based on that (it can only do so based on index, not name).

These are the only two examples that I have / work with. Not sure if there are other systems that depend on the ordering.
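
For illustration only, a minimal sketch of the kind of index-based Kyverno mutation described above (the policy name and the patched field are hypothetical):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: mutate-first-container        # hypothetical policy name
spec:
  rules:
  - name: patch-container-zero
    match:
      resources:
        kinds:
        - Deployment
    mutate:
      # the JSON patch addresses the container by index, not by name
      patchesJson6902: |-
        - op: add
          path: /spec/template/spec/containers/0/imagePullPolicy
          value: IfNotPresent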

@kleimkuhler (Contributor, Author) commented:

@electrical Thanks, this is helpful! So those examples assume the index of the container, but don't actually require it to be first.

For the Rancher case, I assume there is some way to configure it to either look at a different index or look by container name. For the Kyverno case, this again sounds like a scenario where looking by container name is a safer thing to do.

I say this because I think there is an important distinction between applications that require being the first container, and applications that are only assuming so. For this feature to work, the proxy requires that it is the first container.

As I stated above, right now the plan is to make this feature always enabled and not configurable. Do you think that if this were always enabled but configurable, you'd find yourself disabling it for the examples you listed?

If so, I think that may be a good reason to at least make this configurable in the first iteration. As it stays in edge releases and we move closer to the next stable, maybe that will give more time for changes to applications that make these assumptions.

@electrical commented:

@kleimkuhler In my case I would disable it, yeah. Partially because linkerd-proxy becomes index 0, but also because I'm not sure Linkerd should do this.
In my own applications, for example, everything works fine except for specific apps that need to wait for linkerd-proxy, and I solve that by adding linkerd-await to my containers. Since it's only an issue for a small set of use cases (as far as I know), do you really want such a major change?
I'm not specifically against it, because I understand the reasoning of improving the user experience, but not being able to disable this behaviour would make it a negative experience for me.

@kleimkuhler (Contributor, Author) commented:

@electrical Yep, that makes sense. The biggest case solved by making it always enabled is when users have some external image that they cannot, or do not wish to, rebuild so that it is wrapped by linkerd-await. Even then, the container may not actually need linkerd-await, but using it shouldn't affect anything.

If it is kept configurable, the other question is whether it is enabled or disabled by default. For edge releases it's probably better to enable it by default. That way, we have more of a chance of running into cases where this affects a broader set of applications. Once we get closer to a stable release and have a better idea of how helpful this feature is, we can make the call on its default behavior in the stable.

@electrical commented:

@kleimkuhler I completely understand and I'm happy with the path you've set out :-)

@kleimkuhler kleimkuhler marked this pull request as ready for review April 13, 2021 20:00
@kleimkuhler (Contributor, Author) commented:

This comment has been copied up into the PR description



@kleimkuhler kleimkuhler requested a review from adleong April 13, 2021 20:01
@kleimkuhler kleimkuhler self-assigned this Apr 13, 2021
@kleimkuhler kleimkuhler added this to the stable-2.11.0 milestone Apr 13, 2021
{{- $r := merge .Values.publicAPIProxyResources .Values.proxy.resources }}
{{- $_ := set $tree.Values.proxy "resources" $r }}
{{- end }}
{{- $_ := set $tree.Values.proxy "await" true }}
@kleimkuhler (Contributor, Author) commented on the diff above:

Because core components are not admitted by the proxy-injector, we cannot rely on annotating these components with config.linkerd.io/proxy-await: "enabled".

Therefore, the template must override Values here so that proxy.await is always true. This ensures that even when users install Linkerd and explicitly disable this feature for their applications, the control plane still has the feature enabled.

This is true for templates/{destination.yaml, proxy-injector.yaml, sp-validator.yaml}.

@kleimkuhler (Contributor, Author) commented:

It would be super cool if @mateiidavid's #6002 merged before this. That way, the Viz and Jaeger extensions that currently have the config.linkerd.io/proxy-await annotation on all their deployments could switch to a single annotation on their namespace.

@alpeb (Member) left a comment

Awesome! 👍

TIOLI: I noted the identity workload's proxy container will have the post-start hook. The main container will be triggered first, so in theory linkerd-await shouldn't block anything. But it might be worth adding {{- $_ := set $tree.Values.proxy "await" false }} in that case before calling the proxy partial, just to avoid the unnecessary call 🤷‍♂️

@kleimkuhler (Contributor, Author) commented:

@alpeb Good call; it's safer to explicitly disable the hook rather than relying on the container ordering. It has been added.

@adleong (Member) left a comment

Minor mismatch with the annotation boolean format. Otherwise looks good!

@kleimkuhler kleimkuhler requested a review from adleong April 19, 2021 20:32