Skip to content

Conversation

@gyakami
Copy link
Contributor

@gyakami gyakami commented Aug 1, 2025

Description

Overview

This PR adds a feature to automatically delete related assume_role assertions in tenant domains when a provider role is deleted.
#2910

Changes

  1. Domain Model Extension

    • Added an autoDeleteTenantAssumeRoleAssertions flag to all domain types (Domain, DomainData, DomainMeta, SubDomain, TopLevelDomain, UserDomain).
    • The default value is false (opt-in).
  2. Database Schema Update

    • Added an auto_delete_tenant_assume_role_assertions column (TINYINT(1)) to the domain table.
    • Added the update script update-20250711.sql.
  3. JDBCConnection Class Extension

    • Added the deleteAssumeRoleAssertions method to find and delete assume_role assertions in a tenant domain that reference a specific provider role.
  4. DBService Class Extension

    • Added the executeDeleteAssumeRoleAssertions methods.
  5. ZMSImpl Class Extension

    • Added an autoDeleteTenantAssumeRoleAssertions flag (loaded from the configuration file).
    • Extended the deleteRole method to also delete tenant assume_role assertions when the conditions are met.
  6. Configuration File Update

    • Added the athenz.zms.auto_delete_tenant_assume_role_assertions setting to zms.properties (default is false).

Behavior Details

  1. When a provider role is deleted, the system checks for tenant domains that trust this role.
  2. Automatic cleanup is performed if all of the following conditions are met:
    • The autoDeleteTenantAssumeRoleAssertions flag is set to true in the server configuration.
    • The autoDeleteTenantAssumeRoleAssertions flag is set to true for the tenant domain.
  3. The system searches for assume_role assertions within the tenant domain that reference the provider role being deleted.
  4. It deletes the corresponding assertions and updates the modification timestamp of the related policies.
  5. The delete operation is recorded in the audit log.

Tests

  • Added unit tests (JDBCConnectionTest, DBServiceTest, ZMSImplTest).
  • Covers both success and failure cases.
  • Includes tests for various combinations of settings (server setting ON/OFF, domain setting ON/OFF).

Impact

This change is opt-in and does not affect existing behavior. To use this new feature, both the server and the domain settings must be enabled.

Contribution Checklist:

  • The pull request does not introduce any breaking changes
  • I have read the contribution guidelines.
  • Create an issue and link to the pull request.

Attach Screenshots (Optional)

@havetisyan
Copy link
Collaborator

@gyakami thanks for the PR. a couple of general comments before I start looking at your PR.
a) please run mvn clean install from the top level athenz directory to make sure all test requirements pass. I see there are no test cases for the new attribute in the zms-core module so that would fail since that module has 100% coverage.
b) for the DB update, please follow the instructions in the repo on how to update the schema: https://github.com/AthenZ/athenz/blob/master/servers/zms/schema/schema_update.md

@gyakami
Copy link
Contributor Author

gyakami commented Aug 8, 2025

@havetisyan
Thanks for your feedback.
I've addressed the two points you raised:

a) I've added test cases for the new attribute in the zms-core module and confirmed that mvn clean install now passes.
b) I've updated the DB schema by following the instructions provided in the repository.

Could you please take another look when you have a moment? Thank you.


// Automatically purge tenant‑side assume_role assertions that reference the role we are about to delete.
if (autoDeleteTenantAssumeRoleAssertions) {
Domain domain = dbService.getDomain(role.trust, useMasterCopyForSignedDomains);
Copy link
Collaborator

@havetisyan havetisyan Aug 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call will throw an exception and the delete role request will always be rejected if the role does not have a trust set.

Please add a test case where the setting is enabled but we're just deleting a regular role which would fail with the above code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the logic to only delete the role if the trust domain is empty and added a test case for it.

if (autoDeleteTenantAssumeRoleAssertions) {
Domain domain = dbService.getDomain(role.trust, useMasterCopyForSignedDomains);
// Only proceed if the tenant domain explicitly opts‑in to the cleanup.
if (domain.autoDeleteTenantAssumeRoleAssertions) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we typically don't access the fields directly (at some point we might move away from rdl and those may not be public) so please use the appropriate get method to get the attribute value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified it to use the get method.

continue;
}

for (Assertion assertion : deletedAssertions) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any reason why we're creating an audit log entry per assertion instead of just one per policy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was no specific reason. I've updated the code to group audit logs by policy.

}
}

void executeDeleteRoleWithAssumeRoleAssertions(ResourceContext ctx, String domainName, String roleName,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only see references to this function in the test files. Where is this used in the server?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, my mistake. I've deleted executeDeleteRoleWithAssumeRoleAssertions as we're now only using executeDeleteAssumeRoleAssertions.

@@ -1,5 +1,5 @@
-- MySQL Script generated by MySQL Workbench
-- Tue Jun 3 17:51:01 2025
-- Fri Aug 8 14:47:06 2025
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the PR is missing the updated workbench file. Please include that as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the updated workbench file!

con.updateDomainModTimestamp(domainName);
cacheStore.invalidate(domainName);

for (Policy policy : policies) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the api call, the last modification timestamp for the policy was updated. However, the domain last modified timestamp must be updated as well. So when we get the data here we need to create a set to include all the domains were affected and then for every domain:

a) update the last modification timestamp
b) invalidate the domain from the local cache.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it only the tenant domain that the Delegated Role trusts (the domainName in this method) that's affected?


try (PreparedStatement ps = con.prepareStatement(SQL_SELECT_TENANT_ASSUME_ROLE_POLICY_IDS)) {
ps.setString(1, domainName);
ps.setString(2, providerDomainName + ":role." + providerRoleName);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use: ResourceUtils.roleResourceName(providerDomainName, providerRoleName)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have fixed it!

// 2) Remove the matching assume_role assertions themselves.
try (PreparedStatement ps = con.prepareStatement(SQL_DELETE_TENANT_ASSUME_ROLE_ASSERTIONS)) {
ps.setString(1, domainName);
ps.setString(2, providerDomainName + ":role." + providerRoleName);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use: ResourceUtils.roleResourceName(providerDomainName, providerRoleName), and for that matter instead of calling the api twice maybe create a final string at the beginning of the method and then use it in both places

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have fixed it!

@gyakami
Copy link
Contributor Author

gyakami commented Aug 28, 2025

@havetisyan
My apologies for the delay. I have replied to all the comments and/or revised the code. Please take a look when you have a moment.

@havetisyan
Copy link
Collaborator

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an automatic cleanup feature for delegated roles, which is a valuable addition. The changes are extensive, touching models, database schema, and service logic. Overall, the implementation is well-structured. However, I've identified a critical bug in the database access logic that would lead to incorrect audit trails, and a broken unit test. My review includes a detailed fix for the critical bug and points out the problematic test case. Addressing these points will ensure the feature is robust and reliable.

Comment on lines 3373 to 3433
public List<Policy> deleteAssumeRoleAssertions(String domainName,
String providerDomainName, String providerRoleName) throws ServerResourceException {

final String caller = "deleteAssumeRoleAssertion";
final String providerRoleResource =
ResourceUtils.roleResourceName(providerDomainName, providerRoleName);

// 1) Locate every policy in the *tenant* domain that contains an
// `assume_role` assertion pointing to the soon‑to‑be‑deleted provider
// role ({providerDomain}:role.{providerRole}).
Set<Integer> policyIds = new HashSet<>();
List<Policy> policies = new ArrayList<>();

try (PreparedStatement ps = con.prepareStatement(SQL_SELECT_TENANT_ASSUME_ROLE_POLICY_IDS)) {
ps.setString(1, domainName);
ps.setString(2, providerRoleResource);
try (ResultSet rs = executeQuery(ps, caller)) {
while (rs.next()) {
int policyId = rs.getInt(JDBCConsts.DB_COLUMN_POLICY_ID);
policyIds.add(policyId);

// Capture the *pre‑delete* state of each policy – required for
// audit logging later in the call chain.
policies.add(savePolicySettings(domainName,
rs.getString(JDBCConsts.DB_COLUMN_NAME), rs));
}
}
} catch (SQLException ex) {
throw sqlError(ex, caller);
}

// Nothing to do if no matching policies were found.
if (policyIds.isEmpty()) {
return policies;
}

// 2) Remove the matching assume_role assertions themselves.
try (PreparedStatement ps = con.prepareStatement(SQL_DELETE_TENANT_ASSUME_ROLE_ASSERTIONS)) {
ps.setString(1, domainName);
ps.setString(2, providerRoleResource);
executeUpdate(ps, caller);
} catch (SQLException ex) {
throw sqlError(ex, caller);
}

// 3) For every policy that lost an assertion, bump its modified
// timestamp so that caches and signed‑domain consumers notice the
// change.
try (PreparedStatement ps = con.prepareStatement(SQL_UPDATE_POLICY_MOD_TIMESTAMP)) {
for (Integer policyId : policyIds) {
ps.setInt(1, policyId);
executeUpdate(ps, caller);
}
} catch (SQLException ex) {
throw sqlError(ex, caller);
}

// Return the snapshot of policies that existed *before* we removed the
// assertions – the caller uses this for auditing and notifications.
return policies;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

There's a critical issue in this method that will lead to incorrect audit logs. The method currently calls savePolicySettings, which in turn calls listAssertions to fetch all assertions for a policy. This means that when a policy has other assertions besides the assume_role one being deleted, the returned Policy object will contain all of them. The DBService layer then incorrectly logs all of these assertions as deleted.

Additionally, the method updates policy modification timestamps in a loop, which is inefficient as it results in multiple database round-trips.

I've provided a complete replacement for this method below that fixes the auditing bug by constructing the list of deleted assertions directly from the SELECT query's result set. It also improves performance by using batch updates for the timestamps.

Please also update the corresponding test testDeleteAssumeRoleAssertions in JDBCConnectionTest.java to verify batch execution instead of multiple executeUpdate calls.

    public List<Policy> deleteAssumeRoleAssertions(String domainName,
            String providerDomainName, String providerRoleName) throws ServerResourceException {

        final String caller = "deleteAssumeRoleAssertions";
        final String providerRoleResource =
                ResourceUtils.roleResourceName(providerDomainName, providerRoleName);

        // 1) Locate every policy in the *tenant* domain that contains an
        //    `assume_role` assertion pointing to the soon‑to‑be‑deleted provider
        //    role ({providerDomain}:role.{providerRole}).
        Map<Integer, Policy> affectedPolicies = new LinkedHashMap<>();
        try (PreparedStatement ps = con.prepareStatement(SQL_SELECT_TENANT_ASSUME_ROLE_POLICY_IDS)) {
            ps.setString(1, domainName);
            ps.setString(2, providerRoleResource);
            try (ResultSet rs = executeQuery(ps, caller)) {
                while (rs.next()) {
                    int policyId = rs.getInt(JDBCConsts.DB_COLUMN_POLICY_ID);
                    Policy policy = affectedPolicies.get(policyId);
                    if (policy == null) {
                        policy = new Policy();
                        policy.setName(rs.getString(JDBCConsts.DB_COLUMN_NAME));
                        policy.setDomain(domainName);
                        policy.setModified(rs.getTimestamp(JDBCConsts.DB_COLUMN_MODIFIED));
                        policy.setVersion(rs.getString(JDBCConsts.DB_COLUMN_VERSION));
                        policy.setActive(rs.getBoolean(JDBCConsts.DB_COLUMN_ACTIVE));
                        policy.setCaseSensitive(rs.getBoolean(JDBCConsts.DB_COLUMN_CASE_SENSITIVE));
                        policy.setAssertions(new ArrayList<>());
                        affectedPolicies.put(policyId, policy);
                    }

                    Assertion assertion = new Assertion();
                    assertion.setId(rs.getLong(JDBCConsts.DB_COLUMN_ASSERTION_ID));
                    assertion.setRole(rs.getString(JDBCConsts.DB_COLUMN_ROLE));
                    assertion.setResource(rs.getString(JDBCConsts.DB_COLUMN_RESOURCE));
                    assertion.setAction(rs.getString(JDBCConsts.DB_COLUMN_ACTION));
                    assertion.setEffect(rs.getString(JDBCConsts.DB_COLUMN_EFFECT));
                    policy.getAssertions().add(assertion);
                }
            }
        } catch (SQLException ex) {
            throw sqlError(ex, caller);
        }

        List<Policy> policies = new ArrayList<>(affectedPolicies.values());
        // Nothing to do if no matching policies were found.
        if (policies.isEmpty()) {
            return policies;
        }

        // 2) Remove the matching assume_role assertions themselves.
        try (PreparedStatement ps = con.prepareStatement(SQL_DELETE_TENANT_ASSUME_ROLE_ASSERTIONS)) {
            ps.setString(1, domainName);
            ps.setString(2, providerRoleResource);
            executeUpdate(ps, caller);
        } catch (SQLException ex) {
            throw sqlError(ex, caller);
        }

        // 3) For every policy that lost an assertion, bump its modified
        //    timestamp so that caches and signed‑domain consumers notice the
        //    change.
        try (PreparedStatement ps = con.prepareStatement(SQL_UPDATE_POLICY_MOD_TIMESTAMP)) {
            for (Integer policyId : affectedPolicies.keySet()) {
                ps.setInt(1, policyId);
                ps.addBatch();
            }
            if (!affectedPolicies.isEmpty()) {
                ps.executeBatch();
            }
        } catch (SQLException ex) {
            throw sqlError(ex, caller);
        }

        // Return the snapshot of policies that existed *before* we removed the
        // assertions – the caller uses this for auditing and notifications.
        return policies;
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a critical issue in this method that will lead to incorrect audit logs. The method currently calls savePolicySettings, which in turn calls listAssertions to fetch all assertions for a policy. This means that when a policy has other assertions besides the assume_role one being deleted, the returned Policy object will contain all of them. The DBService layer then incorrectly logs all of these assertions as deleted.

This comment is incorrect. The savePolicySettings() method only sets policy metadata from the ResultSet; it does not fetch assertions, nor does it call listAssertions. Therefore, This problem will not occur.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, the method updates policy modification timestamps in a loop, which is inefficient as it results in multiple database round-trips.

I've updated the code to use a single query now.

Comment on lines 1521 to 1592
public void testExecuteDeleteRoleWithAssumeRoleAssertionsNoAssertionsFound()
throws ServerResourceException {

final String domainName = "delrolenoassert";
final String roleName = "role1";
final String providerDomain = "coretech";
final String providerRoleName = "sync";

// We only need the domain object for audit‑enabled check
Domain domain = new Domain().setAuditEnabled(false);
Mockito.when(mockJdbcConn.getDomain(domainName)).thenReturn(domain);

// Case 1: DAO returns null
Mockito.when(mockJdbcConn.deleteAssumeRoleAssertions(
eq(domainName), eq(providerDomain), eq(providerRoleName)))
.thenReturn(null);

ObjectStore saveStore = zms.dbService.store;
zms.dbService.store = mockObjStore;

int saveRetryCount = zms.dbService.defaultRetryCount;
zms.dbService.defaultRetryCount = 1; // one retry is enough

try {
zms.dbService.executeDeleteRoleWithAssumeRoleAssertions(
mockDomRsrcCtx,
domainName,
roleName,
providerDomain,
providerRoleName,
auditRef,
"deleteRoleWithAssumeRoleAssertions");
fail();
} catch (ResourceException ex) {
assertEquals(ResourceException.NOT_FOUND, ex.getCode());
assertTrue(ex.getMessage().contains(
"unable to delete assume role assertions"));
} finally {
zms.dbService.defaultRetryCount = saveRetryCount;
zms.dbService.store = saveStore;
}

// Case 2: DAO returns an empty list – expected behavior is identical
Mockito.when(mockJdbcConn.deleteAssumeRoleAssertions(
eq(domainName), eq(providerDomain), eq(providerRoleName)))
.thenReturn(Collections.emptyList());

saveStore = zms.dbService.store;
zms.dbService.store = mockObjStore;

saveRetryCount = zms.dbService.defaultRetryCount;
zms.dbService.defaultRetryCount = 1;

try {
zms.dbService.executeDeleteRoleWithAssumeRoleAssertions(
mockDomRsrcCtx,
domainName,
roleName,
providerDomain,
providerRoleName,
auditRef,
"deleteRoleWithAssumeRoleAssertions");
fail();
} catch (ResourceException ex) {
assertEquals(ResourceException.NOT_FOUND, ex.getCode());
assertTrue(ex.getMessage().contains(
"unable to delete assume role assertions"));
} finally {
zms.dbService.defaultRetryCount = saveRetryCount;
zms.dbService.store = saveStore;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This test case appears to be broken. It's attempting to call a method executeDeleteRoleWithAssumeRoleAssertions on zms.dbService, but this method does not exist in DBService.

Furthermore, the test expects a ResourceException.NOT_FOUND when no assertions are found. However, the new executeDeleteAssumeRoleAssertions method is designed to simply return without throwing an exception in this scenario, allowing the provider role deletion to proceed.

This test should either be removed or corrected to test the actual behavior of executeDeleteAssumeRoleAssertions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I have deleted the test.

@gyakami
Copy link
Contributor Author

gyakami commented Sep 2, 2025

@havetisyan
I’ve also addressed the Gemini Code Review comments, so please take a look when you have a moment.

@gyakami
Copy link
Contributor Author

gyakami commented Sep 10, 2025

@havetisyan
When I look at the GitHub Actions logs, the following tests for athenz-zms-server are failing. However, the tests succeed on my Mac. It also seems that the number of tests is different. I'm stuck because I don't know the reason for this. Do you have any ideas what might be causing this?

PutGroupMembershipNotificationTaskTest.testGenerateAndSendPostPutGroupMembershipNotificationNotifyGroups
PutGroupMembershipNotificationTaskTest.testGenerateAndSendPostPutGroupMembershipNotificationNullOrgGroup

GitHub Actions
[ERROR] Tests run: 1589, Failures: 2, Errors: 0, Skipped: 0
Mac
[INFO] Tests run: 1588, Failures: 0, Errors: 0, Skipped: 0

@havetisyan havetisyan merged commit 0b4e7b0 into AthenZ:master Sep 14, 2025
4 of 5 checks passed
chandrasekhar1996 pushed a commit to chandrasekhar1996/terraform-provider-athenz that referenced this pull request Sep 30, 2025
chandrasekhar1996 pushed a commit to chandrasekhar1996/terraform-provider-athenz that referenced this pull request Sep 30, 2025
havetisyan pushed a commit to AthenZ/terraform-provider-athenz that referenced this pull request Oct 6, 2025
* Add self-serve group members resource

Implements athenz_self_serve_group_members resource to manage specific
group members without affecting externally managed ones. Follows the
same pattern as self-serve role members but adapted for groups.

Signed-off-by: Chandu Raman <[email protected]>

* update system test result to include new field autoDeleteTenantAssumeRoleAssertions (AthenZ/athenz#3027)

Signed-off-by: Chandu Raman <[email protected]>

---------

Signed-off-by: Chandu Raman <[email protected]>
Co-authored-by: Chandu Raman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants