Disjunctive Predicate Stitching for Materialized Views and Partition Stitching for Iceberg #26756

@tdcmeehan

Description

Expected Behavior or Use Case

A disjunctive predicate is a predicate formed by OR'ing multiple conditions together. When a materialized view is partially stale (some base table partitions have changed since the last refresh), the system should combine fresh data from storage with recomputed stale data via UNION, rather than forcing a choice between serving fully stale data and performing a full recomputation. The connector can return the predicate disjuncts that identify stale data, and these predicates can be applied to the fresh-data (storage) table if they are over columns that have a mapping from the base table to the view.

Presto Component, Service, or Connector

  • presto-main-base: MaterializedViewRewrite optimizer rule
  • presto-iceberg: Staleness predicate detection via partition tracking

Possible Implementation

Add a USE_STITCHING mode, controlled by the materialized_view_data_consistency session property. When enabled:

  1. Query storage for fresh rows: WHERE NOT stale_predicate
  2. Recompute stale rows from base tables: WHERE stale_predicate
  3. Combine via UNION ALL
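The three steps above can be sketched end to end with an in-memory SQLite database. This is only an illustration of the query shape, not the Presto optimizer rule: the tables `orders` and `mv_storage`, the partition column `ds`, and the helper `stale_predicate` are all hypothetical names chosen for the example.

```python
import sqlite3

def stale_predicate(column, stale_values):
    """Build a disjunctive predicate: one equality disjunct per stale partition value."""
    if not stale_values:
        return "1 = 0"  # nothing is stale, so the predicate is always false
    return " OR ".join(f"{column} = '{v}'" for v in stale_values)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (ds TEXT, amount INTEGER);
    CREATE TABLE mv_storage (ds TEXT, total INTEGER);
    INSERT INTO orders VALUES ('2024-01-01', 10), ('2024-01-02', 20), ('2024-01-03', 30);
    -- "Refresh" the view: total amount per partition.
    INSERT INTO mv_storage SELECT ds, SUM(amount) FROM orders GROUP BY ds;
    -- Base table changes after the refresh: one partition modified, one added.
    INSERT INTO orders VALUES ('2024-01-02', 5), ('2024-01-04', 40);
""")

# Assumed to be what the connector reports as stale (hypothetical values).
stale = stale_predicate("ds", ["2024-01-02", "2024-01-04"])

stitched_sql = f"""
    SELECT ds, total FROM mv_storage WHERE NOT ({stale})   -- step 1: fresh rows
    UNION ALL                                              -- step 3: combine
    SELECT ds, SUM(amount) AS total FROM orders
    WHERE ({stale}) GROUP BY ds                            -- step 2: recompute stale rows
"""
stitched = sorted(conn.execute(stitched_sql).fetchall())
full = sorted(conn.execute("SELECT ds, SUM(amount) FROM orders GROUP BY ds").fetchall())
```

Here `stitched` and `full` contain the same rows, while the stitched query only scans the two stale partitions of `orders`. The sketch assumes the partition column is never NULL; a NULL `ds` would satisfy neither `stale` nor `NOT stale` and would be dropped from both branches.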

Column Equivalences: Map stale predicates through JOIN equalities (e.g., orders.order_date = customers.reg_date) so staleness in one table propagates to equivalent columns.
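One way to track such equivalences is a union-find structure over qualified column names, fed by the equalities found in inner joins. The sketch below is an assumption about how this could look, not the actual implementation; `ColumnEquivalences` and `propagate` are hypothetical names.

```python
class ColumnEquivalences:
    """Union-find over qualified column names, fed by inner-join equalities."""

    def __init__(self):
        self._parent = {}

    def _find(self, col):
        self._parent.setdefault(col, col)
        while self._parent[col] != col:
            # Path halving: point this node at its grandparent as we walk up.
            self._parent[col] = self._parent[self._parent[col]]
            col = self._parent[col]
        return col

    def add_equality(self, left, right):
        """Record that `left = right` holds for every joined row."""
        self._parent[self._find(left)] = self._find(right)

    def equivalents(self, col):
        """All known columns in the same equivalence class as `col`, sorted."""
        root = self._find(col)
        return sorted(c for c in list(self._parent) if self._find(c) == root)


def propagate(stale_column, stale_values, equivalences):
    """Rewrite a stale predicate onto every column equivalent to the stale one."""
    return {
        col: " OR ".join(f"{col} = '{v}'" for v in stale_values)
        for col in equivalences.equivalents(stale_column)
    }


eq = ColumnEquivalences()
eq.add_equality("orders.order_date", "customers.reg_date")
predicates = propagate("orders.order_date", ["2024-01-02"], eq)
```

With the join equality from the example above, a stale `orders.order_date` partition also yields a predicate on `customers.reg_date`, which is what would allow partition pruning on the joined table.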

Duplicate Prevention: When multiple tables are stale, each recompute branch excludes the rows already covered by the branches of previously processed tables:

  • Branch A: WHERE stale_A
  • Branch B: WHERE stale_B AND NOT stale_A
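The branch pattern above generalizes to any number of stale tables. The following sketch builds the branch predicates as strings and checks the disjointness property on boolean flags; `recompute_branch_predicates` and `claiming_branches` are hypothetical helpers, and the processing order of tables is assumed to be fixed.

```python
from itertools import product

def recompute_branch_predicates(stale_predicates):
    """Given ordered (table, stale predicate) pairs, return (table, branch predicate) pairs.

    Each branch keeps its own stale predicate and negates every earlier one,
    so a row that is stale in several tables is recomputed exactly once.
    """
    branches = []
    earlier = []
    for table, pred in stale_predicates:
        parts = [f"({pred})"] + [f"NOT ({p})" for p in earlier]
        branches.append((table, " AND ".join(parts)))
        earlier.append(pred)
    return branches

def claiming_branches(flags):
    """Indices of branches that would emit a row with the given per-table stale flags."""
    return [i for i, stale in enumerate(flags) if stale and not any(flags[:i])]

branches = recompute_branch_predicates(
    [("A", "stale_A"), ("B", "stale_B"), ("C", "stale_C")]
)

# Every combination of stale flags is claimed by exactly one recompute branch,
# or by none when the row is fresh everywhere (it then comes from storage).
disjoint = all(
    len(claiming_branches(flags)) == (1 if any(flags) else 0)
    for flags in product([False, True], repeat=3)
)
```

For three tables this produces `(stale_C) AND NOT (stale_A) AND NOT (stale_B)` as the last branch, and `disjoint` confirms that no row can be emitted by two recompute branches.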

Falls back to full recompute for outer joins, non-deterministic functions, or unmappable predicates.

| Operator | Behavior |
| --- | --- |
| INNER JOIN | Predicates propagate through column equivalences to enable partition pruning on joined tables |
| OUTER JOIN | Stitching disabled entirely; falls back to full recompute (null-padding semantics break predicate mapping) |
| UNION | Fresh tables optimized away to prevent duplicates |
| EXCEPT / INTERSECT | Predicate propagation disabled inside these operators, because they require complete data from non-stale tables to compute correct results |
| Aggregation | Supported; aggregation operates correctly on filtered stale partitions |
| Non-deterministic functions | Stitching disabled; RANDOM(), NOW(), UUID() etc. would produce inconsistent results between storage and recompute branches |

Initial support for Iceberg identity partitions will be added. Future work can enable predicate stitching for non-identity partitions.

Context

Large materialized views are expensive to fully recompute. When only a few partitions change, stitching avoids reprocessing terabytes of unchanged data while still returning correct results.
