
@JohnFitzpatrick44
Member

Description

Fixes #27732. Corrects an underlying bug that affected only predicates on the `$partition` column.

I updated an existing test to verify this behavior. The test previously masked the bug by running `EXECUTE REMOVE_ORPHAN_FILES`, which removed the dangling delete files that `OPTIMIZE` should have removed.

Additional context and related issues

I believe the logic that determines whether a position delete file is fully applied is faulty, but I would appreciate a careful review to confirm my understanding.

From the comment on line 348: "OPTIMIZE supports only enforced predicates which select whole partitions, so if there is no path or fileModifiedTime predicate, then we can clean up position deletes".

This seems to contradict the logic on line 351: `case POSITION_DELETES -> partitionDomain.isAll() && pathDomain.isAll() && fileModifiedTimeDomain.isAll();`

As I understand it, `partitionDomain.isAll()` is true only when there is no filter on the `$partition` column. Because position delete files are scoped to a single partition, checking only the path and fileModifiedTime domains should be sufficient: if neither is constrained, the predicate covers every data file in each selected partition, so the position deletes in those partitions are fully applied.
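To sanity-check the reasoning above, here is a minimal model of when a position delete file becomes droppable. It assumes, as stated in this description, that a position delete file only references data files in its own partition. `PositionDelete`, `PositionDeleteFile`, and `fullyApplied` are hypothetical names for illustration, not Trino or Iceberg APIs.

```java
import java.util.List;
import java.util.Set;

public class PositionDeleteCoverage {
    // Hypothetical model: one (dataFilePath, rowPosition) entry of a position delete file.
    record PositionDelete(String path, long position) {}

    // Hypothetical model: a position delete file whose entries all point at data files in `partition`.
    record PositionDeleteFile(String partition, List<PositionDelete> deletes) {}

    // A delete file is fully applied once every data file it references has been rewritten,
    // because the rewritten files no longer contain the deleted rows.
    static boolean fullyApplied(PositionDeleteFile deleteFile, Set<String> rewrittenDataFiles) {
        return deleteFile.deletes().stream()
                .allMatch(delete -> rewrittenDataFiles.contains(delete.path()));
    }

    public static void main(String[] args) {
        PositionDeleteFile deleteFile = new PositionDeleteFile("part=a", List.of(
                new PositionDelete("part=a/f1.parquet", 3),
                new PositionDelete("part=a/f2.parquet", 7)));

        // Whole-partition rewrite (no path/fileModifiedTime filter): every file in part=a is rewritten.
        System.out.println(fullyApplied(deleteFile, Set.of("part=a/f1.parquet", "part=a/f2.parquet"))); // true

        // A path filter selecting only f1 leaves the deletes against f2 still needed.
        System.out.println(fullyApplied(deleteFile, Set.of("part=a/f1.parquet"))); // false
    }
}
```

Which partitions are selected does not matter in this model; only a filter that skips some files inside a selected partition can leave deletes unapplied.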

For predicates on the partition column itself (i.e. not using `$partition`), `partitionDomain.isAll()` is true because the `$partition` column is not explicitly filtered. This is why the bug does not surface for those predicates.
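The sketch below contrasts the rule on line 351 with the same rule minus the `partitionDomain.isAll()` term, for the two predicate shapes described above. Each domain is reduced to a boolean meaning "unconstrained" (what `isAll()` reports); `Shape`, `currentRule`, and `proposedRule` are illustrative names, not the code in this PR.

```java
public class PartitionPredicateRules {
    // Hypothetical summary of a predicate: each flag is true when that domain is unconstrained (isAll()).
    record Shape(String description, boolean partitionAll, boolean pathAll, boolean fileModifiedTimeAll) {}

    // Rule quoted from line 351: any $partition filter blocks position delete cleanup.
    static boolean currentRule(Shape s) {
        return s.partitionAll() && s.pathAll() && s.fileModifiedTimeAll();
    }

    // Rule argued for in this description: only path/fileModifiedTime filters matter,
    // since OPTIMIZE's enforced predicates already select whole partitions.
    static boolean proposedRule(Shape s) {
        return s.pathAll() && s.fileModifiedTimeAll();
    }

    public static void main(String[] args) {
        Shape[] shapes = {
            // Filter on the partition column itself: the $partition domain stays unconstrained.
            new Shape("partition column predicate", true, true, true),
            // Filter on $partition: the $partition domain is constrained.
            new Shape("$partition predicate", false, true, true),
        };
        for (Shape s : shapes) {
            System.out.printf("%-28s current=%-5b proposed=%b%n",
                    s.description(), currentRule(s), proposedRule(s));
        }
    }
}
```

Only the `$partition` shape changes outcome between the two rules, which is consistent with predicates on the partition column never exposing the bug.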

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Iceberg
* `OPTIMIZE` now correctly removes position delete files for predicates on the `$partition` column. (#27732)

cla-bot added the cla-signed label Dec 24, 2025
github-actions added the iceberg (Iceberg connector) label Dec 24, 2025
@guyco33
Member

guyco33 commented Dec 29, 2025

Thanks @JohnFitzpatrick44 for this quick fix!
I can confirm that it fixes the issue; I also tested it on some real datasets.



Development

Successfully merging this pull request may close these issues.

POSITION_DELETES files in Iceberg are not optimized when using $partition filter
