@kathap kathap commented Mar 27, 2025

cc-uploader previously lacked a proper draining mechanism. With PR #195, draining is now implemented, allowing upload jobs to complete gracefully before the process is stopped.
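As a rough illustration of what "draining" means here, the sketch below stops accepting new jobs and then waits, up to a timeout, for in-flight jobs to finish. It is not the code from PR #195: the class `DrainingUploader`, its methods, and the timeout value are all hypothetical names chosen for this example.

```ruby
# Hypothetical sketch of a drain mechanism; not the cc-uploader implementation.
class DrainingUploader
  def initialize
    @mutex = Mutex.new
    @idle = ConditionVariable.new
    @in_flight = 0
    @draining = false
  end

  # Runs the given block as one upload job.
  # Returns false without running it if draining has already started.
  def submit
    @mutex.synchronize do
      return false if @draining
      @in_flight += 1
    end
    begin
      yield
    ensure
      @mutex.synchronize do
        @in_flight -= 1
        @idle.broadcast if @in_flight.zero?
      end
    end
    true
  end

  # Stops accepting new jobs, then waits up to `timeout` seconds for
  # in-flight jobs. Returns true if everything finished in time.
  def drain(timeout:)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @mutex.synchronize do
      @draining = true
      while @in_flight.positive?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        return false if remaining <= 0
        @idle.wait(@mutex, remaining)
      end
    end
    true
  end
end

uploader = DrainingUploader.new
worker = Thread.new { uploader.submit { sleep 0.1 } } # simulated upload
sleep 0.02                                            # let the job start
puts "drained: #{uploader.drain(timeout: 2)}"
worker.join
```

Once `drain` has been called, further `submit` calls are rejected, so the process can be stopped without cutting off work that was already accepted.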

In addition, PR #4296 introduces support in the Cloud Controller API VM's drain script to coordinate the shutdown of the cc-uploader process.

To ensure proper shutdown ordering, this change:

  • Adds a shutdown_cc_uploader step to the cloud_controller_ng shutdown script, run before nginx, Cloud Controller, and the local workers are shut down.
  • Introduces a Monit dependency where cc_uploader depends on cloud_controller_ng.

This ensures that cc-uploader, which polls job status from cloud_controller_ng, is drained before cloud_controller_ng or nginx_cc are stopped.
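The ordering described above can be sketched as a simple ordered list of steps. Only the name `shutdown_cc_uploader` comes from this PR; the other step names, the relative order of the last three steps, and the `run_shutdown` helper are assumptions made for illustration and are not taken from the real script.

```ruby
# Hypothetical sketch of the shutdown ordering only.
# Apart from shutdown_cc_uploader, the step names are illustrative.
SHUTDOWN_STEPS = %i[
  shutdown_cc_uploader
  shutdown_nginx
  shutdown_cloud_controller
  shutdown_local_workers
].freeze

# Runs each step's handler in order and returns the steps that completed.
def run_shutdown(steps, handlers)
  steps.each_with_object([]) do |step, completed|
    handlers.fetch(step).call
    completed << step
  end
end

# Stand-in handlers that only log; a real script would stop processes here.
handlers = SHUTDOWN_STEPS.to_h { |step| [step, -> { puts "running #{step}" }] }
run_shutdown(SHUTDOWN_STEPS, handlers)
```

Because cc-uploader polls cloud_controller_ng for job status, placing its step first means it can still reach the API while it drains.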

Dependency Behavior

  • Start:
    cc_uploader starts only after cloud_controller_ng is running.

  • Stop:
    Stopping cloud_controller_ng also stops cc_uploader.
    Stopping cc_uploader does not affect cloud_controller_ng.

  • Restart:
    Restarting cloud_controller_ng may also restart cc_uploader.
    cc_uploader can restart independently, as long as cloud_controller_ng is available.
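The start and stop rules above can be modelled with a small toy class. This is not Monit and does not use Monit's configuration syntax; `ProcessGroup` and its methods are hypothetical, and the model only mirrors the behaviour listed in this description.

```ruby
# Toy model of the dependency behaviour described above; not Monit itself.
class ProcessGroup
  attr_reader :running

  # depends_on maps a process to the process it depends on,
  # e.g. { cc_uploader: :cloud_controller_ng }
  def initialize(depends_on)
    @depends_on = depends_on
    @running = []
  end

  # Starts the dependency first, then the process itself.
  def start(name)
    parent = @depends_on[name]
    start(parent) if parent && !@running.include?(parent)
    @running << name unless @running.include?(name)
  end

  # Stops any running dependents first, then the process itself.
  def stop(name)
    @depends_on.each do |child, parent|
      stop(child) if parent == name && @running.include?(child)
    end
    @running.delete(name)
  end
end

group = ProcessGroup.new({ cc_uploader: :cloud_controller_ng })
group.start(:cc_uploader)          # also starts cloud_controller_ng first
group.stop(:cc_uploader)           # cloud_controller_ng keeps running
group.start(:cc_uploader)
group.stop(:cloud_controller_ng)   # also stops cc_uploader
p group.running
```

The asymmetry is the point: the dependent process can come and go on its own, while stopping the process it depends on takes both down.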

  • A short explanation of the proposed change:
    This change ensures that cc-uploader is gracefully drained before cloud_controller_ng or nginx_cc are stopped. It adds a shutdown step for cc-uploader in the API VM drain script and introduces a Monit dependency so that cc-uploader only runs when cloud_controller_ng is running.

  • An explanation of the use cases your change solves
    Graceful Shutdown of cc-uploader:
    Ensures ongoing app upload jobs have a chance to complete before cc-uploader is stopped, preventing potential job interruption or failure during deployments or VM shutdowns.
    Correct Shutdown Order:
    Prevents scenarios where cloud_controller_ng is stopped before cc-uploader, which would break the polling mechanism and cause uploads to fail.
    Improved System Reliability During Drains:
    By hooking cc-uploader into the API VM drain process and setting up the right Monit dependency, we make sure everything shuts down in the right order. This helps avoid ordering-related edge cases during restarts or shutdowns and makes the system behave more consistently.

  • Links to any other associated PRs
    More resilient droplet upload cc-uploader#195
    Add draining for cc uploader cloud_controller_ng#4296

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the develop branch

  • I have run CF Acceptance Tests on bosh lite

@kathap kathap marked this pull request as draft March 27, 2025 10:06
@kathap kathap mentioned this pull request Apr 7, 2025
3 tasks
@kathap kathap merged commit 45d2a21 into develop Apr 15, 2025
2 checks passed
@moleske moleske deleted the add-draining-to-cc-uploader branch April 15, 2025 15:17
Samze added a commit that referenced this pull request May 7, 2025
This reverts commit 45d2a21.

This original commit introduces the requirement that cc_uploader is
co-located on the same VM as the cloud_controller_ng job. While this is
the case in cf-deployment, this is not the case for all deployments of
CF (like the VMware one).

As this commit has broken our deployment, and it's not easy for us simply
to relocate the job, we propose we revert this and discuss options going forward.

The cc_uploader was originally designed as an independent process. Processes
that need to be co-located are typically within the same job,
e.g. nginx, cloud_controller_ng and cloud_controller_local_worker.

Some options:
* Investigate a way to generically drain uploads from any source in
  nginx rather than specifically in cc_uploader.
    * Current timeout is [10 seconds](https://github.com/cloudfoundry/cloud_controller_ng/blob/5c4dac049bd28979284aeab1efa48c7075676131/lib/cloud_controller/drain.rb#L9)
* Find a way to sync draining between jobs without requiring
  co-location.
* Keep the co-location requirement for draining but behind a capi-property.
Samze added a commit that referenced this pull request May 8, 2025
This reverts commit 45d2a21.
