-
Notifications
You must be signed in to change notification settings - Fork 2k
[Kernel-Spark][DSv2] Add CCv2 routing to SparkTable and SnapshotManagerFactory #5678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[Kernel-Spark][DSv2] Add CCv2 routing to SparkTable and SnapshotManagerFactory #5678
Conversation
998c638 to
c0df33e
Compare
d59108e to
6538c8b
Compare
kernel-spark/src/main/java/io/delta/kernel/spark/snapshot/SnapshotManagerFactory.java
Outdated
Show resolved
Hide resolved
## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta/pull/5655/files/07f3b5b1a0e9f9eb1794ac4b7b7ff151cc9a3b10..0b5886852bbe9871ab2c6ee5893da940ef69edcc) to review incremental changes. - [catalogtableutils-ccv2](#5477) [[Files changed](https://github.com/delta-io/delta/pull/5477/files)] - [stack/ccv2-uc-utils](#5605) [[Files changed](https://github.com/delta-io/delta/pull/5605/files/9ca17935c6cb19b8c94168853e18d9965130f9f1..07f3b5b1a0e9f9eb1794ac4b7b7ff151cc9a3b10)] - [**stack/snapshotmanager-wireframe**](#5655) [[Files changed](https://github.com/delta-io/delta/pull/5655/files/07f3b5b1a0e9f9eb1794ac4b7b7ff151cc9a3b10..0b5886852bbe9871ab2c6ee5893da940ef69edcc)] - [stack/snapshotmanager-implementation](#5677) [[Files changed](https://github.com/delta-io/delta/pull/5677/files/0b5886852bbe9871ab2c6ee5893da940ef69edcc..a7b46bbc8d16040a10cc61fa65876af5b9e6ed0b)] - [stack/snapshotmanager-factory-wireup](#5678) [[Files changed](https://github.com/delta-io/delta/pull/5678/files/a7b46bbc8d16040a10cc61fa65876af5b9e6ed0b..6538c8b460050666e918f8853f67ab70ccea2d92)] --------- <!-- Thanks for sending a pull request! Here are some tips for you: 1. If this is your first time, please read our contributor guidelines: https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md 2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'. 3. Be sure to keep the PR description updated to reflect all changes. 4. Please write your PR title to summarize what this PR proposes. 5. If possible, provide a concise example to reproduce the issue for a faster review. 6. If applicable, include the corresponding issue number in the PR title and link it in the body. --> #### Which Delta project/connector is this regarding? <!-- Please add the component selected below to the beginning of the pull request title For example: [Spark] Title of my pull request --> - [ ] Spark - [ ] Standalone - [ ] Flink - [ ] Kernel - [ ] Other (fill in here) ## Description <!-- - Describe what this PR changes. - Describe why we need the change. If this PR resolves an issue be sure to include "Resolves #XXX" to correctly link and close the issue upon merge. --> This PR: - Adds `kernelUnityCatalog` as a dependency for `sparkV2` to wire in UC-managed functionality. - Introduces `UCManagedSnapshotManager` skeleton for UC-managed tables that will delegate snapshot/ commit operations to UCCatalogManagedClient (methods currently TODO). (Orange is the previous PR, green is what we've introduced in this PR. Routing is deferred) ```mermaid flowchart TB A["Spark table CatalogTable"] -- decide snapshot manager --> B{"Select snapshot manager"} B -- path based --> P["PathBasedSnapshotManager loadLatestSnapshot loadSnapshotAt"] C["UCCatalogManagedClient"] -- build UC REST client --> D["UC REST client"] D -- GET ratified commits and table metadata --> U["Unity Catalog server"] X["UCUtils extractTableInfo<br>(tableId, path, uri, token)"] -- needs UC table details --> M["UCManagedSnapshotManager<br>(UCCatalogManagedClient(UCClient))"] B -- UC managed --> X M --> C C:::added X:::extraction M:::added classDef extraction fill:#ffb347,stroke:#d97706,color:#000 classDef added fill:#b7e1a1,stroke:#2d6a4f,color:#000 ``` ## How was this patch tested? <!-- If tests were added, say they were added here. Please make sure to test the changes thoroughly including negative and positive cases if possible. If the changes were tested in any way other than unit tests, please clarify how you tested step by step (ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future). If the changes were not tested, please explain why. --> Locally and CI. ## Does this PR introduce _any_ user-facing changes? No. <!-- If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible. If possible, please also clarify if this is a user-facing change compared to the released Delta Lake versions or within the unreleased branches such as master. If no, write 'No'. --> --------- Signed-off-by: Timothy Wang <[email protected]> Signed-off-by: TimothyW553 <[email protected]>
6538c8b to
8993eb0
Compare
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
| package io.delta.kernel.spark.snapshot; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW are we sure this should be under io.delta.kernel? Not under io.delta.spark?
cc @raveeram-db and @tdas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this making no sense. cc @huan233usc we MUST change this.
in the same vein, even the name kernel-spark does not make sense at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, let's file an issue to update this in separate PR Xin? Thanks for pointing this out @scottsand-db !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tracking here: #5728 and will be addressed in a separate PR
kernel-spark/src/main/java/io/delta/kernel/spark/snapshot/SnapshotManagerFactory.java
Show resolved
Hide resolved
fb59e86 to
f1ad709
Compare
| * </ul> | ||
| */ | ||
| @Experimental | ||
| public final class SnapshotManagerFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it produces DeltaSnapshotManager, then it should be a DeltaSnapshotManagedFactory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: everything here is Delta -- probably don't need to prefix things with it?
Signed-off-by: Timothy Wang <[email protected]>
Signed-off-by: Timothy Wang <[email protected]>
Signed-off-by: Timothy Wang <[email protected]>
…s, and make names more descriptive
f1ad709 to
7fd42af
Compare
7fd42af to
a986581
Compare
a986581 to
b3bcb24
Compare
b3bcb24 to
2410591
Compare
…ing UCCatalogManagedClient (#5677) ## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta/pull/5677/files/b2ba4e4cb4d7c0dfaa4f0542e12ef27798c2f43c..29b67038e4ef85fd7dfde7bd2f493b1420976ebe) to review incremental changes. - [catalogtableutils-ccv2](#5477) [[Files changed](https://github.com/delta-io/delta/pull/5477/files)] - [**stack/snapshotmanager-implementation**](#5677) [[Files changed](https://github.com/delta-io/delta/pull/5677/files/b2ba4e4cb4d7c0dfaa4f0542e12ef27798c2f43c..29b67038e4ef85fd7dfde7bd2f493b1420976ebe)] - [stack/snapshotmanager-factory-wireup](#5678) [[Files changed](https://github.com/delta-io/delta/pull/5678/files/da30e83d7911bd4a97cf73cc2c9c0fdd38dc8a39..cbb134e12af1b3d8b02d2e8a5f6632327f7f8f51)] - [stack/e2e-ccv2-tests](#5734) [[Files changed](https://github.com/delta-io/delta/pull/5734/files/cbb134e12af1b3d8b02d2e8a5f6632327f7f8f51..19b8915e6f051ba875932f89494f2e27954cdcb9)] --------- <!-- Thanks for sending a pull request! Here are some tips for you: 1. If this is your first time, please read our contributor guidelines: https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md 2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'. 3. Be sure to keep the PR description updated to reflect all changes. 4. Please write your PR title to summarize what this PR proposes. 5. If possible, provide a concise example to reproduce the issue for a faster review. 6. If applicable, include the corresponding issue number in the PR title and link it in the body. --> #### Which Delta project/connector is this regarding? <!-- Please add the component selected below to the beginning of the pull request title For example: [Spark] Title of my pull request --> - [ ] Spark - [ ] Standalone - [ ] Flink - [ ] Kernel - [ ] Other (fill in here) ## Description This PR implements the wireframe introduced in the previous PR of the stack. For this specific `UCManagedTableSnapshotManager`, it delegates snapshot loading operations to the `UCCatalogCommitClient`. `getActiveCommitAtTime(t)` loads the latest snapshot, gets its commits, and uses those along with `DeltaHistoryManager.getActiveCommitAtTimestamp(t, catalogCommits, ...)` to get the appropriate commit at time `t`. <!-- - Describe what this PR changes. - Describe why we need the change. If this PR resolves an issue be sure to include "Resolves #XXX" to correctly link and close the issue upon merge. --> ## How was this patch tested? Locally and CI. <!-- If tests were added, say they were added here. Please make sure to test the changes thoroughly including negative and positive cases if possible. If the changes were tested in any way other than unit tests, please clarify how you tested step by step (ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future). If the changes were not tested, please explain why. --> ## Does this PR introduce _any_ user-facing changes? No. <!-- If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible. If possible, please also clarify if this is a user-facing change compared to the released Delta Lake versions or within the unreleased branches such as master. If no, write 'No'. --> --------- Signed-off-by: Timothy Wang <[email protected]>
🥞 Stacked PR
Use this link to review incremental changes.
Which Delta project/connector is this regarding?
Description
This PR introduces a factory that returns the appropriate snapshot manager for a
SparkTable. This factory checks theCatalogTableto determine if the snapshot manager should be a PathBasedSnapshotManager, UCManagedTableSnapshotManager, or another CatalogedManagedSnapshotManager in the future.How was this patch tested?
Locally and CI.
Does this PR introduce any user-facing changes?
No.