App state is not updating in cf cli and appsman ui when the app is down #4309
In some circumstances CAPI reports an app instance as running when it is actually down.
CAPI iterates over all actual_lrps returned from Diego and uses the instance index as the key. When Diego returns duplicate entries for the same index, each later entry overwrites the earlier one, so the state shown is determined by the order in which the actual_lrps are returned.
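The overwrite behavior described above can be sketched roughly as follows. This is a simplified illustration, not CAPI's actual code: `ActualLrp` and `naive_states` are hypothetical names, and the struct only carries the three fields relevant here.

```ruby
# Hypothetical, simplified stand-in for an actual_lrp record from Diego.
ActualLrp = Struct.new(:index, :state, :since, keyword_init: true)

running   = ActualLrp.new(index: 3, state: 'RUNNING',   since: 1739222044495241579)
unclaimed = ActualLrp.new(index: 3, state: 'UNCLAIMED', since: 1739568280529021112)

# Naive aggregation keyed by index: a later entry silently replaces
# an earlier one with the same index, regardless of which is newer.
def naive_states(actual_lrps)
  actual_lrps.each_with_object({}) do |lrp, by_index|
    by_index[lrp.index] = lrp.state
  end
end

# The reported state depends only on the order of the input.
puts naive_states([running, unclaimed])[3] # prints UNCLAIMED
puts naive_states([unclaimed, running])[3] # prints RUNNING
```

With the second ordering, the stale RUNNING entry wins and the instance appears healthy.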
Example of a duplicate entry from cfdot actual-lrps. Note the process_guid and index are the same.
{
  "process_guid": "57a8e43b-81f9-46e9-9f78-81e15bbfd231-de7f7844-156e-4fc7-9f21-db5d072fb0b7",
  "index": 3,
  "domain": "cf-apps",
  "instance_guid": "",
  "cell_id": "",
  "address": "",
  "ports": null,
  "preferred_address": "UNKNOWN",
  "crash_count": 0,
  "state": "UNCLAIMED",
  "placement_error": "unable to communicate to compatible cells",
  "since": 1739568280529021112,
  "modification_tag": {
    "epoch": "780635af-9208-4d5e-5a08-ea49ebcb3f95",
    "index": 5758
  },
  "presence": "ORDINARY",
  "OptionalRoutable": {
    "routable": false
  },
  "availability_zone": ""
}
{
  "process_guid": "57a8e43b-81f9-46e9-9f78-81e15bbfd231-de7f7844-156e-4fc7-9f21-db5d072fb0b7",
  "index": 3,
  "domain": "cf-apps",
  "instance_guid": "1f3ffac3-be77-45e0-5075-7357",
  "cell_id": "23b06662-20e7-42dd-9377-6d8f10190ec4",
  "address": "10.0.4.17",
  "ports": [
    {
      "container_port": 8080,
      "host_port": 61012,
      "container_tls_proxy_port": 61001,
      "host_tls_proxy_port": 61014
    },
    {
      "container_port": 8080,
      "host_port": 61012,
      "container_tls_proxy_port": 61443,
      "host_tls_proxy_port": 0
    },
    {
      "container_port": 2222,
      "host_port": 61013,
      "container_tls_proxy_port": 61002,
      "host_tls_proxy_port": 61015
    }
  ],
  "instance_address": "10.255.233.24",
  "preferred_address": "HOST",
  "crash_count": 0,
  "state": "RUNNING",
  "since": 1739222044495241579,
  "modification_tag": {
    "epoch": "4a424a13-b5ba-47b7-771a-1a61d99c2524",
    "index": 2
  },
  "presence": "SUSPECT",
  "metric_tags": {
    "app_id": "57a8e43b-81f9-46e9-9f78-81e15bbfd231",
    "app_name": "static",
    "instance_id": "3",
    "organization_id": "c877a084-d65b-4758-9908-90201c6df339",
    "organization_name": "org-1",
    "process_id": "57a8e43b-81f9-46e9-9f78-81e15bbfd231",
    "process_instance_id": "1f3ffac3-be77-45e0-5075-7357",
    "process_type": "web",
    "source_id": "57a8e43b-81f9-46e9-9f78-81e15bbfd231",
    "space_id": "b248d5ab-2948-468b-ad0f-7b1b90e923d1",
    "space_name": "space-1"
  },
  "OptionalRoutable": {
    "routable": true
  },
  "availability_zone": "us-central1-f"
}
Fix
In the case of duplicates, CAPI should compare the since values of the actual_lrps and use the entry with the most recent one.
Tested by killing the Diego cell VM with bosh delete-vm. cf app now returns the correct status:
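A minimal sketch of this selection rule, assuming a simplified record type: `ActualLrp` and `latest_lrp_per_index` are hypothetical names for illustration and do not mirror the code changed in this pull request. The since values are taken from the duplicate entries shown above.

```ruby
# Hypothetical, simplified stand-in for an actual_lrp record from Diego.
ActualLrp = Struct.new(:index, :state, :since, keyword_init: true)

# Keep, for each index, the entry with the largest since value,
# so the result no longer depends on the order of the input.
def latest_lrp_per_index(actual_lrps)
  actual_lrps.each_with_object({}) do |lrp, by_index|
    current = by_index[lrp.index]
    by_index[lrp.index] = lrp if current.nil? || lrp.since > current.since
  end
end

running   = ActualLrp.new(index: 3, state: 'RUNNING',   since: 1739222044495241579)
unclaimed = ActualLrp.new(index: 3, state: 'UNCLAIMED', since: 1739568280529021112)

# Both orderings now select the newer UNCLAIMED entry.
puts latest_lrp_per_index([running, unclaimed])[3].state # prints UNCLAIMED
puts latest_lrp_per_index([unclaimed, running])[3].state # prints UNCLAIMED
```

One open choice in this sketch is the tie-break: when two entries share the same since value, the first one seen is kept.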
instances:       0/2
memory usage:    1024M
     state   since                  cpu    memory     disk       logging        cpu entitlement   details
#0   down    2025-04-15T00:34:03Z   0.0%   0B of 0B   0B of 0B   0B/s of 0B/s                     unable to communicate to compatible cells
#1   down    2025-04-15T00:34:03Z   0.0%   0B of 0B   0B of 0B   0B/s of 0B/s                     unable to communicate to compatible cells
I have reviewed the contributing guide
I have viewed, signed, and submitted the Contributor License Agreement
I have made this pull request to the main branch
I have run all the unit tests using bundle exec rake
I have run CF Acceptance Tests