Description
Expected Behavior or Use Case
A disjunctive predicate is a predicate formed by OR'ing multiple conditions together. When a materialized view is partially stale (some base table partitions have changed since the last refresh), the system should combine fresh data from storage with recomputed stale data via UNION, rather than forcing a choice between fully stale data and full recomputation. The connector can report which predicate disjuncts are stale, and these predicates can be applied to the fresh data in storage if they are over columns that have a mapping between the base table and the view.
Presto Component, Service, or Connector
presto-main-base: MaterializedViewRewrite optimizer rule
presto-iceberg: Staleness predicate detection via partition tracking
Possible Implementation
Add a USE_STITCHING mode controlled by the materialized_view_data_consistency session property. When enabled:
- Query storage for fresh rows: WHERE NOT stale_predicate
- Recompute stale rows from base tables: WHERE stale_predicate
- Combine via UNION ALL
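The three steps above can be checked end to end on a toy dataset. The following is a minimal sketch, not the proposed optimizer rule: it uses Python's built-in `sqlite3` as a stand-in for Presto, and the table names (`orders`, `mv_storage`), the data, and the stale predicate are all hypothetical. It only illustrates that the stitched query returns the same rows as a full recompute when the stale predicate covers every changed partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_date TEXT, amount INTEGER);     -- hypothetical base table
    CREATE TABLE mv_storage (order_date TEXT, total INTEGER);  -- hypothetical view storage
    INSERT INTO orders VALUES ('2024-01-01', 10), ('2024-01-01', 5), ('2024-01-02', 7);
    -- "Refresh": storage matches the base table at this point.
    INSERT INTO mv_storage
        SELECT order_date, SUM(amount) FROM orders GROUP BY order_date;
    -- Base table changes after the refresh: one partition modified, one added.
    INSERT INTO orders VALUES ('2024-01-02', 3), ('2024-01-03', 4);
""")

# Assumed to be what the connector reports as stale (a disjunction of partitions).
STALE_PREDICATE = "order_date IN ('2024-01-02', '2024-01-03')"

stitched_sql = f"""
    SELECT order_date, total FROM mv_storage
    WHERE NOT ({STALE_PREDICATE})              -- fresh rows from storage
    UNION ALL
    SELECT order_date, SUM(amount) AS total FROM orders
    WHERE {STALE_PREDICATE}                    -- stale rows recomputed from base
    GROUP BY order_date
"""
full_recompute_sql = (
    "SELECT order_date, SUM(amount) AS total FROM orders GROUP BY order_date"
)

stitched = sorted(conn.execute(stitched_sql).fetchall())
full = sorted(conn.execute(full_recompute_sql).fetchall())
print(stitched == full)
```

In this sketch the storage branch contributes only the untouched `2024-01-01` partition, while the recompute branch scans just the two changed partitions of the base table.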
Column Equivalences: Map stale predicates through JOIN equalities (e.g., orders.order_date = customers.reg_date) so staleness in one table propagates to equivalent columns.
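One way to picture the propagation is to group columns into equivalence classes built from the JOIN equalities, then restate the stale predicate on every column in the same class. The sketch below assumes a simple union-find over column names and an `IN`-list predicate; the function names and the string-based predicate form are hypothetical and are not taken from the Presto code base.

```python
from collections import defaultdict


def equivalence_classes(join_equalities):
    """Group columns connected by join equalities, using a small union-find."""
    parent = {}

    def find(col):
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]  # path halving
            col = parent[col]
        return col

    for left, right in join_equalities:
        parent[find(left)] = find(right)

    classes = defaultdict(set)
    for col in list(parent):
        classes[find(col)].add(col)
    return list(classes.values())


def propagate(stale_column, stale_values, join_equalities):
    """Restate a stale IN-predicate on every column equivalent to stale_column."""
    values = ", ".join(f"'{v}'" for v in stale_values)
    for cls in equivalence_classes(join_equalities):
        if stale_column in cls:
            return {col: f"{col} IN ({values})" for col in sorted(cls)}
    # No join equality mentions the column, so nothing propagates.
    return {stale_column: f"{stale_column} IN ({values})"}


predicates = propagate(
    "orders.order_date",
    ["2024-01-02"],
    [("orders.order_date", "customers.reg_date")],
)
```

Here `predicates` holds one predicate per equivalent column, so a filter on `orders.order_date` can also prune partitions of `customers` through `customers.reg_date`.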
Duplicate Prevention: When multiple tables are stale, each recompute branch excludes the rows already covered by the branches of previously processed tables:
- Branch A: WHERE stale_A
- Branch B: WHERE stale_B AND NOT stale_A
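The branch pattern generalizes to any number of stale tables: each branch keeps its own stale predicate and negates the predicates of all earlier branches, which makes the branches pairwise disjoint. A minimal sketch, assuming predicates are plain SQL strings and that tables are processed in the given order (the function name is hypothetical):

```python
def recompute_branch_predicates(stale_predicates):
    """Return (table, WHERE clause) pairs whose row sets do not overlap.

    stale_predicates: ordered list of (table_name, predicate_sql) pairs.
    """
    branches = []
    earlier = []
    for table, predicate in stale_predicates:
        # Own stale predicate, minus everything an earlier branch already covers.
        conjuncts = [f"({predicate})"] + [f"NOT ({p})" for p in earlier]
        branches.append((table, " AND ".join(conjuncts)))
        earlier.append(predicate)
    return branches


branches = recompute_branch_predicates(
    [("A", "stale_A"), ("B", "stale_B"), ("C", "stale_C")]
)
for table, where in branches:
    print(f"Branch {table}: WHERE {where}")
```

A row matching both `stale_A` and `stale_B` is produced only by Branch A, so the final UNION ALL does not emit it twice.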
Falls back to full recompute for outer joins, non-deterministic functions, or unmappable predicates.
| Operator | Behavior |
|---|---|
| INNER JOIN | Predicates propagate through column equivalences to enable partition pruning on joined tables |
| OUTER JOIN | Stitching disabled entirely—falls back to full recompute (null-padding semantics break predicate mapping) |
| UNION | Fresh tables optimized away to prevent duplicates |
| EXCEPT / INTERSECT | Predicate propagation disabled inside these operators—they require complete data from non-stale tables to compute correct results |
| Aggregation | Supported—aggregation operates correctly on filtered stale partitions |
| Non-deterministic functions | Stitching disabled—RANDOM(), NOW(), UUID() etc. would produce inconsistent results between storage and recompute branches |
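The fallback rules in the table can be summarized as a single eligibility check. The sketch below is illustrative only: the `ViewQueryTraits` fields and the strategy labels are hypothetical names, and a real implementation would derive these traits by inspecting the view's query plan.

```python
from dataclasses import dataclass


@dataclass
class ViewQueryTraits:
    """Hypothetical summary of the plan features that matter for stitching."""
    has_outer_join: bool = False
    has_non_deterministic_function: bool = False
    has_unmappable_predicate: bool = False


def choose_strategy(traits):
    """Fall back to a full recompute whenever any disqualifying trait is present."""
    if (
        traits.has_outer_join
        or traits.has_non_deterministic_function
        or traits.has_unmappable_predicate
    ):
        return "FULL_RECOMPUTE"
    return "STITCH"


inner_join_aggregate = ViewQueryTraits()
left_join_view = ViewQueryTraits(has_outer_join=True)
print(choose_strategy(inner_join_aggregate), choose_strategy(left_join_view))
```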
Initial support will be added for Iceberg identity partitions. Future work can enable predicate stitching for non-identity partitions.
Context
Large materialized views are expensive to fully recompute. When only a few partitions change, stitching avoids reprocessing terabytes of unchanged data while still returning correct results.