Inconsistent handling of container failure to start #4294

This was identified during the investigation of #4289.

The containerVM in question is not setting the `session.started` flag, and as a result the property collector times out. Why the flag is not being set is out of scope for this issue.
The problem is how we handle that failure: we Unbind the network config, but we do not power down the container. If the container process didn't start successfully, this leaves a useless containerVM consuming resources. If the cause was a network hiccup, VC queuing, or another control-plane issue, then we have just disconnected a functioning containerVM from the network it requires.

Solutions:
1. Do not unbind on the error path after power on has been confirmed
2. Ensure that the VM is powered down if it fails to report status by the deadline

In either case, the Unbind could be triggered by the VM powering off rather than being called explicitly.
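One way to have power-off drive the Unbind is a single handler that reacts to VM state changes. The sketch below assumes hypothetical `StateEvent` and `Network` types standing in for vSphere power-state events and the portlayer network context; it is not how VIC currently delivers events.

```go
package main

import "fmt"

// PowerState is a hypothetical stand-in for a VM's runtime power state.
type PowerState int

const (
	PoweredOn PowerState = iota
	PoweredOff
)

// StateEvent is a hypothetical power-state change notification for a container.
type StateEvent struct {
	ContainerID string
	State       PowerState
}

// Network is a hypothetical stand-in for the network context holding bindings.
type Network struct {
	bound map[string]bool
}

// Unbind releases the network config for a container.
func (n *Network) Unbind(id string) {
	delete(n.bound, id)
}

// handleEvents unbinds a container's network config only when its VM is
// observed to power off, so start/error paths never call Unbind directly.
func handleEvents(events <-chan StateEvent, n *Network) {
	for ev := range events {
		if ev.State == PoweredOff {
			n.Unbind(ev.ContainerID)
		}
	}
}

func main() {
	n := &Network{bound: map[string]bool{"c1": true, "c2": true}}
	events := make(chan StateEvent, 2)
	events <- StateEvent{ContainerID: "c1", State: PoweredOff}
	events <- StateEvent{ContainerID: "c2", State: PoweredOn}
	close(events)
	handleEvents(events, n)
	fmt.Println(n.bound) // map[c2:true]
}
```

Keeping the Unbind in one place means a functioning cVM that was merely slow to report status keeps its network, while any cVM that is actually powered off releases it.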

Notes:
This impacts cVMs that start after the current 3-minute timeout has expired. The timeout on cVM start was added to handle failure scenarios, and also because `docker run -it` inherits the awkward blocking behaviour of the standard docker client when attach is used (interception of Ctrl-C, et al.), which means it cannot easily be escaped. The correct solution is to:
1. address the problematic behaviour in the docker client that led to timeouts being used
2. ensure that cVMs either come up cleanly or report failure and shut themselves down
3. only unbind network addresses on power off

This likely means changing the power state operations to be async and then waiting on events (either the expected status change or an error). I've updated the estimate to encompass a possible shift to async power operations, but it does not include raising a PR for the signal forwarding behaviour of the docker client.
