-
Notifications
You must be signed in to change notification settings - Fork 296
Add an automatic cleanup feature for delegated roles. #3027
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an automatic cleanup feature for delegated roles. #3027
Conversation
Signed-off-by: Go Yakami <[email protected]>
…ertions Signed-off-by: Go Yakami <[email protected]>
|
@gyakami thanks for the PR. a couple of general comments before I start looking at your PR. |
Signed-off-by: Go Yakami <[email protected]>
…assertions Signed-off-by: Go Yakami <[email protected]>
…ssertions Signed-off-by: Go Yakami <[email protected]>
…s when no assertions found Signed-off-by: Go Yakami <[email protected]>
|
@havetisyan a) I've added test cases for the new attribute in the zms-core module and confirmed that mvn clean install now passes. Could you please take another look when you have a moment? Thank you. |
|
|
||
| // Automatically purge tenant‑side assume_role assertions that reference the role we are about to delete. | ||
| if (autoDeleteTenantAssumeRoleAssertions) { | ||
| Domain domain = dbService.getDomain(role.trust, useMasterCopyForSignedDomains); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call will throw an exception and the delete role request will always be rejected if the role does not have a trust set.
Please add a test case where the setting is enabled but we're just deleting a regular role which would fail with the above code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the logic to only delete the role if the trust domain is empty and added a test case for it.
| if (autoDeleteTenantAssumeRoleAssertions) { | ||
| Domain domain = dbService.getDomain(role.trust, useMasterCopyForSignedDomains); | ||
| // Only proceed if the tenant domain explicitly opts‑in to the cleanup. | ||
| if (domain.autoDeleteTenantAssumeRoleAssertions) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we typically don't access the fields directly (at some point we might move away from rdl and those may not be public) so please use the appropriate get method to get the attribute value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified it to use the get method.
| continue; | ||
| } | ||
|
|
||
| for (Assertion assertion : deletedAssertions) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any reason why we're creating an audit log entry per assertion instead of just one per policy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was no specific reason. I've updated the code to group audit logs by policy.
| } | ||
| } | ||
|
|
||
| void executeDeleteRoleWithAssumeRoleAssertions(ResourceContext ctx, String domainName, String roleName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only see references to this function in the test files. Where is this used in the server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, my mistake. I've deleted executeDeleteRoleWithAssumeRoleAssertions as we're now only using executeDeleteAssumeRoleAssertions.
| @@ -1,5 +1,5 @@ | |||
| -- MySQL Script generated by MySQL Workbench | |||
| -- Tue Jun 3 17:51:01 2025 | |||
| -- Fri Aug 8 14:47:06 2025 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the PR is missing the updated workbench file. Please include that as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the updated workbench file!
| con.updateDomainModTimestamp(domainName); | ||
| cacheStore.invalidate(domainName); | ||
|
|
||
| for (Policy policy : policies) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During the api call, the last modification timestamp for the policy was updated. However, the domain last modified timestamp must be updated as well. So when we get the data here we need to create a set to include all the domains were affected and then for every domain:
a) update the last modification timestamp
b) invalidate the domain from the local cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it only the tenant domain that the Delegated Role trusts (the domainName in this method) that's affected?
|
|
||
| try (PreparedStatement ps = con.prepareStatement(SQL_SELECT_TENANT_ASSUME_ROLE_POLICY_IDS)) { | ||
| ps.setString(1, domainName); | ||
| ps.setString(2, providerDomainName + ":role." + providerRoleName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use: ResourceUtils.roleResourceName(providerDomainName, providerRoleName)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have fixed it!
| // 2) Remove the matching assume_role assertions themselves. | ||
| try (PreparedStatement ps = con.prepareStatement(SQL_DELETE_TENANT_ASSUME_ROLE_ASSERTIONS)) { | ||
| ps.setString(1, domainName); | ||
| ps.setString(2, providerDomainName + ":role." + providerRoleName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use: ResourceUtils.roleResourceName(providerDomainName, providerRoleName), and for that matter instead of calling the api twice maybe create a final string at the beginning of the method and then use it in both places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have fixed it!
Signed-off-by: Go Yakami <[email protected]>
…eteAssumeRoleAssertion Signed-off-by: Go Yakami <[email protected]>
Signed-off-by: Go Yakami <[email protected]>
…nd related tests Signed-off-by: Go Yakami <[email protected]>
Signed-off-by: Go Yakami <[email protected]>
…egated_roles/20250801 Signed-off-by: gyakami <[email protected]>
|
@havetisyan |
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces an automatic cleanup feature for delegated roles, which is a valuable addition. The changes are extensive, touching models, database schema, and service logic. Overall, the implementation is well-structured. However, I've identified a critical bug in the database access logic that would lead to incorrect audit trails, and a broken unit test. My review includes a detailed fix for the critical bug and points out the problematic test case. Addressing these points will ensure the feature is robust and reliable.
| public List<Policy> deleteAssumeRoleAssertions(String domainName, | ||
| String providerDomainName, String providerRoleName) throws ServerResourceException { | ||
|
|
||
| final String caller = "deleteAssumeRoleAssertion"; | ||
| final String providerRoleResource = | ||
| ResourceUtils.roleResourceName(providerDomainName, providerRoleName); | ||
|
|
||
| // 1) Locate every policy in the *tenant* domain that contains an | ||
| // `assume_role` assertion pointing to the soon‑to‑be‑deleted provider | ||
| // role ({providerDomain}:role.{providerRole}). | ||
| Set<Integer> policyIds = new HashSet<>(); | ||
| List<Policy> policies = new ArrayList<>(); | ||
|
|
||
| try (PreparedStatement ps = con.prepareStatement(SQL_SELECT_TENANT_ASSUME_ROLE_POLICY_IDS)) { | ||
| ps.setString(1, domainName); | ||
| ps.setString(2, providerRoleResource); | ||
| try (ResultSet rs = executeQuery(ps, caller)) { | ||
| while (rs.next()) { | ||
| int policyId = rs.getInt(JDBCConsts.DB_COLUMN_POLICY_ID); | ||
| policyIds.add(policyId); | ||
|
|
||
| // Capture the *pre‑delete* state of each policy – required for | ||
| // audit logging later in the call chain. | ||
| policies.add(savePolicySettings(domainName, | ||
| rs.getString(JDBCConsts.DB_COLUMN_NAME), rs)); | ||
| } | ||
| } | ||
| } catch (SQLException ex) { | ||
| throw sqlError(ex, caller); | ||
| } | ||
|
|
||
| // Nothing to do if no matching policies were found. | ||
| if (policyIds.isEmpty()) { | ||
| return policies; | ||
| } | ||
|
|
||
| // 2) Remove the matching assume_role assertions themselves. | ||
| try (PreparedStatement ps = con.prepareStatement(SQL_DELETE_TENANT_ASSUME_ROLE_ASSERTIONS)) { | ||
| ps.setString(1, domainName); | ||
| ps.setString(2, providerRoleResource); | ||
| executeUpdate(ps, caller); | ||
| } catch (SQLException ex) { | ||
| throw sqlError(ex, caller); | ||
| } | ||
|
|
||
| // 3) For every policy that lost an assertion, bump its modified | ||
| // timestamp so that caches and signed‑domain consumers notice the | ||
| // change. | ||
| try (PreparedStatement ps = con.prepareStatement(SQL_UPDATE_POLICY_MOD_TIMESTAMP)) { | ||
| for (Integer policyId : policyIds) { | ||
| ps.setInt(1, policyId); | ||
| executeUpdate(ps, caller); | ||
| } | ||
| } catch (SQLException ex) { | ||
| throw sqlError(ex, caller); | ||
| } | ||
|
|
||
| // Return the snapshot of policies that existed *before* we removed the | ||
| // assertions – the caller uses this for auditing and notifications. | ||
| return policies; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a critical issue in this method that will lead to incorrect audit logs. The method currently calls savePolicySettings, which in turn calls listAssertions to fetch all assertions for a policy. This means that when a policy has other assertions besides the assume_role one being deleted, the returned Policy object will contain all of them. The DBService layer then incorrectly logs all of these assertions as deleted.
Additionally, the method updates policy modification timestamps in a loop, which is inefficient as it results in multiple database round-trips.
I've provided a complete replacement for this method below that fixes the auditing bug by constructing the list of deleted assertions directly from the SELECT query's result set. It also improves performance by using batch updates for the timestamps.
Please also update the corresponding test testDeleteAssumeRoleAssertions in JDBCConnectionTest.java to verify batch execution instead of multiple executeUpdate calls.
public List<Policy> deleteAssumeRoleAssertions(String domainName,
String providerDomainName, String providerRoleName) throws ServerResourceException {
final String caller = "deleteAssumeRoleAssertions";
final String providerRoleResource =
ResourceUtils.roleResourceName(providerDomainName, providerRoleName);
// 1) Locate every policy in the *tenant* domain that contains an
// `assume_role` assertion pointing to the soon‑to‑be‑deleted provider
// role ({providerDomain}:role.{providerRole}).
Map<Integer, Policy> affectedPolicies = new LinkedHashMap<>();
try (PreparedStatement ps = con.prepareStatement(SQL_SELECT_TENANT_ASSUME_ROLE_POLICY_IDS)) {
ps.setString(1, domainName);
ps.setString(2, providerRoleResource);
try (ResultSet rs = executeQuery(ps, caller)) {
while (rs.next()) {
int policyId = rs.getInt(JDBCConsts.DB_COLUMN_POLICY_ID);
Policy policy = affectedPolicies.get(policyId);
if (policy == null) {
policy = new Policy();
policy.setName(rs.getString(JDBCConsts.DB_COLUMN_NAME));
policy.setDomain(domainName);
policy.setModified(rs.getTimestamp(JDBCConsts.DB_COLUMN_MODIFIED));
policy.setVersion(rs.getString(JDBCConsts.DB_COLUMN_VERSION));
policy.setActive(rs.getBoolean(JDBCConsts.DB_COLUMN_ACTIVE));
policy.setCaseSensitive(rs.getBoolean(JDBCConsts.DB_COLUMN_CASE_SENSITIVE));
policy.setAssertions(new ArrayList<>());
affectedPolicies.put(policyId, policy);
}
Assertion assertion = new Assertion();
assertion.setId(rs.getLong(JDBCConsts.DB_COLUMN_ASSERTION_ID));
assertion.setRole(rs.getString(JDBCConsts.DB_COLUMN_ROLE));
assertion.setResource(rs.getString(JDBCConsts.DB_COLUMN_RESOURCE));
assertion.setAction(rs.getString(JDBCConsts.DB_COLUMN_ACTION));
assertion.setEffect(rs.getString(JDBCConsts.DB_COLUMN_EFFECT));
policy.getAssertions().add(assertion);
}
}
} catch (SQLException ex) {
throw sqlError(ex, caller);
}
List<Policy> policies = new ArrayList<>(affectedPolicies.values());
// Nothing to do if no matching policies were found.
if (policies.isEmpty()) {
return policies;
}
// 2) Remove the matching assume_role assertions themselves.
try (PreparedStatement ps = con.prepareStatement(SQL_DELETE_TENANT_ASSUME_ROLE_ASSERTIONS)) {
ps.setString(1, domainName);
ps.setString(2, providerRoleResource);
executeUpdate(ps, caller);
} catch (SQLException ex) {
throw sqlError(ex, caller);
}
// 3) For every policy that lost an assertion, bump its modified
// timestamp so that caches and signed‑domain consumers notice the
// change.
try (PreparedStatement ps = con.prepareStatement(SQL_UPDATE_POLICY_MOD_TIMESTAMP)) {
for (Integer policyId : affectedPolicies.keySet()) {
ps.setInt(1, policyId);
ps.addBatch();
}
if (!affectedPolicies.isEmpty()) {
ps.executeBatch();
}
} catch (SQLException ex) {
throw sqlError(ex, caller);
}
// Return the snapshot of policies that existed *before* we removed the
// assertions – the caller uses this for auditing and notifications.
return policies;
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a critical issue in this method that will lead to incorrect audit logs. The method currently calls savePolicySettings, which in turn calls listAssertions to fetch all assertions for a policy. This means that when a policy has other assertions besides the assume_role one being deleted, the returned Policy object will contain all of them. The DBService layer then incorrectly logs all of these assertions as deleted.
This comment is incorrect. The savePolicySettings() method only sets policy metadata from the ResultSet; it does not fetch assertions, nor does it call listAssertions. Therefore, This problem will not occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, the method updates policy modification timestamps in a loop, which is inefficient as it results in multiple database round-trips.
I've updated the code to use a single query now.
| public void testExecuteDeleteRoleWithAssumeRoleAssertionsNoAssertionsFound() | ||
| throws ServerResourceException { | ||
|
|
||
| final String domainName = "delrolenoassert"; | ||
| final String roleName = "role1"; | ||
| final String providerDomain = "coretech"; | ||
| final String providerRoleName = "sync"; | ||
|
|
||
| // We only need the domain object for audit‑enabled check | ||
| Domain domain = new Domain().setAuditEnabled(false); | ||
| Mockito.when(mockJdbcConn.getDomain(domainName)).thenReturn(domain); | ||
|
|
||
| // Case 1: DAO returns null | ||
| Mockito.when(mockJdbcConn.deleteAssumeRoleAssertions( | ||
| eq(domainName), eq(providerDomain), eq(providerRoleName))) | ||
| .thenReturn(null); | ||
|
|
||
| ObjectStore saveStore = zms.dbService.store; | ||
| zms.dbService.store = mockObjStore; | ||
|
|
||
| int saveRetryCount = zms.dbService.defaultRetryCount; | ||
| zms.dbService.defaultRetryCount = 1; // one retry is enough | ||
|
|
||
| try { | ||
| zms.dbService.executeDeleteRoleWithAssumeRoleAssertions( | ||
| mockDomRsrcCtx, | ||
| domainName, | ||
| roleName, | ||
| providerDomain, | ||
| providerRoleName, | ||
| auditRef, | ||
| "deleteRoleWithAssumeRoleAssertions"); | ||
| fail(); | ||
| } catch (ResourceException ex) { | ||
| assertEquals(ResourceException.NOT_FOUND, ex.getCode()); | ||
| assertTrue(ex.getMessage().contains( | ||
| "unable to delete assume role assertions")); | ||
| } finally { | ||
| zms.dbService.defaultRetryCount = saveRetryCount; | ||
| zms.dbService.store = saveStore; | ||
| } | ||
|
|
||
| // Case 2: DAO returns an empty list – expected behavior is identical | ||
| Mockito.when(mockJdbcConn.deleteAssumeRoleAssertions( | ||
| eq(domainName), eq(providerDomain), eq(providerRoleName))) | ||
| .thenReturn(Collections.emptyList()); | ||
|
|
||
| saveStore = zms.dbService.store; | ||
| zms.dbService.store = mockObjStore; | ||
|
|
||
| saveRetryCount = zms.dbService.defaultRetryCount; | ||
| zms.dbService.defaultRetryCount = 1; | ||
|
|
||
| try { | ||
| zms.dbService.executeDeleteRoleWithAssumeRoleAssertions( | ||
| mockDomRsrcCtx, | ||
| domainName, | ||
| roleName, | ||
| providerDomain, | ||
| providerRoleName, | ||
| auditRef, | ||
| "deleteRoleWithAssumeRoleAssertions"); | ||
| fail(); | ||
| } catch (ResourceException ex) { | ||
| assertEquals(ResourceException.NOT_FOUND, ex.getCode()); | ||
| assertTrue(ex.getMessage().contains( | ||
| "unable to delete assume role assertions")); | ||
| } finally { | ||
| zms.dbService.defaultRetryCount = saveRetryCount; | ||
| zms.dbService.store = saveStore; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case appears to be broken. It's attempting to call a method executeDeleteRoleWithAssumeRoleAssertions on zms.dbService, but this method does not exist in DBService.
Furthermore, the test expects a ResourceException.NOT_FOUND when no assertions are found. However, the new executeDeleteAssumeRoleAssertions method is designed to simply return without throwing an exception in this scenario, allowing the provider role deletion to proceed.
This test should either be removed or corrected to test the actual behavior of executeDeleteAssumeRoleAssertions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, I have deleted the test.
…ions method Signed-off-by: Go Yakami <[email protected]>
…Assertions test Signed-off-by: Go Yakami <[email protected]>
…rocessing Signed-off-by: Go Yakami <[email protected]>
|
@havetisyan |
|
@havetisyan GitHub Actions |
…RoleAssertions (AthenZ/athenz#3027) Signed-off-by: Chandu Raman <[email protected]>
* Add self-serve group members resource Implements athenz_self_serve_group_members resource to manage specific group members without affecting externally managed ones. Follows the same pattern as self-serve role members but adapted for groups. Signed-off-by: Chandu Raman <[email protected]> * update system test result to include new field autoDeleteTenantAssumeRoleAssertions (AthenZ/athenz#3027) Signed-off-by: Chandu Raman <[email protected]> --------- Signed-off-by: Chandu Raman <[email protected]> Co-authored-by: Chandu Raman <[email protected]>
Description
Overview
This PR adds a feature to automatically delete related
assume_roleassertions in tenant domains when a provider role is deleted.#2910
Changes
Domain Model Extension
autoDeleteTenantAssumeRoleAssertionsflag to all domain types (Domain, DomainData, DomainMeta, SubDomain, TopLevelDomain, UserDomain).Database Schema Update
auto_delete_tenant_assume_role_assertionscolumn (TINYINT(1)) to thedomaintable.update-20250711.sql.JDBCConnection Class Extension
deleteAssumeRoleAssertionsmethod to find and deleteassume_roleassertions in a tenant domain that reference a specific provider role.DBService Class Extension
executeDeleteAssumeRoleAssertionsmethods.ZMSImpl Class Extension
autoDeleteTenantAssumeRoleAssertionsflag (loaded from the configuration file).deleteRolemethod to also delete tenantassume_roleassertions when the conditions are met.Configuration File Update
athenz.zms.auto_delete_tenant_assume_role_assertionssetting tozms.properties(default is false).Behavior Details
autoDeleteTenantAssumeRoleAssertionsflag is set totruein the server configuration.autoDeleteTenantAssumeRoleAssertionsflag is set totruefor the tenant domain.assume_roleassertions within the tenant domain that reference the provider role being deleted.Tests
Impact
This change is opt-in and does not affect existing behavior. To use this new feature, both the server and the domain settings must be enabled.
Contribution Checklist:
Attach Screenshots (Optional)