@nookala nookala commented Apr 13, 2025

In some circumstances CAPI will report an app instance is running when it is down.

CAPI iterates over all actual_lrps returned from Diego and uses the app index as the key. When Diego returns duplicate entries for the same index, each entry overwrites the previous one for that index, so the state shown is determined by the order in which the actual LRP instances are returned.
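The overwrite can be reproduced with a minimal sketch. This is not the CAPI code: `ActualLRP` and `instances_by_index_last_wins` are hypothetical names, and only the `index`, `state`, and `since` fields of an actual LRP are modelled.

```ruby
# Hypothetical stand-in for an actual LRP returned by Diego; only the
# fields relevant to the problem are kept.
ActualLRP = Struct.new(:index, :state, :since, keyword_init: true)

# Build a map of index => state by plain iteration. A later entry with the
# same index silently replaces the earlier one, so the result depends on
# the order of the input.
def instances_by_index_last_wins(actual_lrps)
  actual_lrps.each_with_object({}) do |lrp, result|
    result[lrp.index] = lrp.state
  end
end

unclaimed = ActualLRP.new(index: 3, state: 'UNCLAIMED', since: 1739568280529021112)
running   = ActualLRP.new(index: 3, state: 'RUNNING',   since: 1739222044495241579)

# Same two entries, different order, different reported state.
running_last   = instances_by_index_last_wins([unclaimed, running])
unclaimed_last = instances_by_index_last_wins([running, unclaimed])
```

With the stale RUNNING entry last, index 3 is reported as running even though the newer entry is UNCLAIMED.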

Example of a duplicate entry from cfdot actual-lrps. Note the process_guid and index are the same.

{
  "process_guid": "57a8e43b-81f9-46e9-9f78-81e15bbfd231-de7f7844-156e-4fc7-9f21-db5d072fb0b7",
  "index": 3,
  "domain": "cf-apps",
  "instance_guid": "",
  "cell_id": "",
  "address": "",
  "ports": null,
  "preferred_address": "UNKNOWN",
  "crash_count": 0,
  "state": "UNCLAIMED",
  "placement_error": "unable to communicate to compatible cells",
  "since": 1739568280529021112,
  "modification_tag": {
    "epoch": "780635af-9208-4d5e-5a08-ea49ebcb3f95",
    "index": 5758
  },
  "presence": "ORDINARY",
  "OptionalRoutable": {
    "routable": false
  },
  "availability_zone": ""
}
{
  "process_guid": "57a8e43b-81f9-46e9-9f78-81e15bbfd231-de7f7844-156e-4fc7-9f21-db5d072fb0b7",
  "index": 3,
  "domain": "cf-apps",
  "instance_guid": "1f3ffac3-be77-45e0-5075-7357",
  "cell_id": "23b06662-20e7-42dd-9377-6d8f10190ec4",
  "address": "10.0.4.17",
  "ports": [
    {
      "container_port": 8080,
      "host_port": 61012,
      "container_tls_proxy_port": 61001,
      "host_tls_proxy_port": 61014
    },
    {
      "container_port": 8080,
      "host_port": 61012,
      "container_tls_proxy_port": 61443,
      "host_tls_proxy_port": 0
    },
    {
      "container_port": 2222,
      "host_port": 61013,
      "container_tls_proxy_port": 61002,
      "host_tls_proxy_port": 61015
    }
  ],
  "instance_address": "10.255.233.24",
  "preferred_address": "HOST",
  "crash_count": 0,
  "state": "RUNNING",
  "since": 1739222044495241579,
  "modification_tag": {
    "epoch": "4a424a13-b5ba-47b7-771a-1a61d99c2524",
    "index": 2
  },
  "presence": "SUSPECT",
  "metric_tags": {
    "app_id": "57a8e43b-81f9-46e9-9f78-81e15bbfd231",
    "app_name": "static",
    "instance_id": "3",
    "organization_id": "c877a084-d65b-4758-9908-90201c6df339",
    "organization_name": "org-1",
    "process_id": "57a8e43b-81f9-46e9-9f78-81e15bbfd231",
    "process_instance_id": "1f3ffac3-be77-45e0-5075-7357",
    "process_type": "web",
    "source_id": "57a8e43b-81f9-46e9-9f78-81e15bbfd231",
    "space_id": "b248d5ab-2948-468b-ad0f-7b1b90e923d1",
    "space_name": "space-1"
  },
  "OptionalRoutable": {
    "routable": true
  },
  "availability_zone": "us-central1-f"
}
Fix
In the case of duplicates, CAPI should compare the since values of the actual_lrp entries and use the one with the latest value.
Tested by killing the Diego cell VM with bosh delete-vm. cf app now returns the correct status:
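A minimal sketch of the "latest since wins" rule, under stated assumptions: `LrpRecord` and `latest_lrp_by_index` are hypothetical names used only for illustration, and this is not the code merged in the PR.

```ruby
# Hypothetical stand-in for an actual LRP; `since` is the nanosecond
# timestamp shown in the cfdot output.
LrpRecord = Struct.new(:index, :state, :since, keyword_init: true)

# For each index, keep the entry with the greatest `since` value, so the
# result no longer depends on the order in which entries arrive.
def latest_lrp_by_index(actual_lrps)
  actual_lrps.each_with_object({}) do |lrp, result|
    current = result[lrp.index]
    result[lrp.index] = lrp if current.nil? || lrp.since > current.since
  end
end

newer = LrpRecord.new(index: 3, state: 'UNCLAIMED', since: 1739568280529021112)
older = LrpRecord.new(index: 3, state: 'RUNNING',   since: 1739222044495241579)

# Both orderings resolve to the same (newer) entry.
first_order  = latest_lrp_by_index([newer, older])
second_order = latest_lrp_by_index([older, newer])
```

For the duplicate pair shown earlier, the UNCLAIMED entry has the larger since value, so index 3 resolves to UNCLAIMED in either order.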

instances: 0/2
memory usage: 1024M
     state   since                  cpu    memory     disk       logging        cpu entitlement   details
#0   down    2025-04-15T00:34:03Z   0.0%   0B of 0B   0B of 0B   0B/s of 0B/s                     unable to communicate to compatible cells
#1   down    2025-04-15T00:34:03Z   0.0%   0B of 0B   0B of 0B   0B/s of 0B/s                     unable to communicate to compatible cells

  • I have reviewed the contributing guide

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using bundle exec rake

  • I have run CF Acceptance Tests

@nookala nookala marked this pull request as ready for review April 15, 2025 14:36
@Samze Samze self-requested a review April 15, 2025 14:57
@Samze Samze merged commit 53e13f7 into cloudfoundry:main Apr 16, 2025
8 checks passed
ari-wg-gitbot added a commit to cloudfoundry/capi-release that referenced this pull request Apr 16, 2025
Changes in cloud_controller_ng:

- App state is not updating in cf cli and appsman ui when the app is down
    PR: cloudfoundry/cloud_controller_ng#4309
    Author: Sriram Nookala