[Spark] Hybrid Delta connector combining V1(default) and V2 connectors #5726
base: master
Conversation
```scala
case Some(catalogTable) =>
  DeltaSourceUtils.isDeltaDataSourceName(s.sourceName) ||
  catalogTable.provider.exists(DeltaSourceUtils.isDeltaDataSourceName) ||
  CatalogTableUtils.isUnityCatalogManagedTable(catalogTable)
```
Shall we skip checking `CatalogTableUtils.isUnityCatalogManagedTable(catalogTable)` here?
Is the scope of the change to handle only UC-owned tables?
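To make the predicate under discussion concrete, here is a minimal self-contained sketch. The stub `CatalogTable` type and the `isUcManaged` property key are hypothetical stand-ins for Spark's `CatalogTable` and the real `CatalogTableUtils.isUnityCatalogManagedTable` check; only the shape of the boolean logic mirrors the diff.

```scala
// Stub standing in for Spark's CatalogTable (hypothetical, for illustration only).
case class CatalogTable(provider: Option[String], properties: Map[String, String])

def isDeltaDataSourceName(name: String): Boolean =
  name.equalsIgnoreCase("delta")

// Assumed UC marker: a table property flag; the real check lives in CatalogTableUtils.
def isUnityCatalogManagedTable(t: CatalogTable): Boolean =
  t.properties.get("isUcManaged").contains("true")

// A table qualifies if the source name or provider says Delta,
// OR (the clause the review questions) it is a UC-managed table.
def isDeltaStreamingRelation(sourceName: String, table: CatalogTable): Boolean =
  isDeltaDataSourceName(sourceName) ||
    table.provider.exists(isDeltaDataSourceName) ||
    isUnityCatalogManagedTable(table)
```

Dropping the last disjunct would narrow the rule to tables that are explicitly Delta by name or provider, which is the trade-off the comment raises.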
```scala
// scalastyle:off caselocale
mode.toUpperCase match {
// scalastyle:on caselocale
  case "STRICT" =>
```
I think we can just return the V2 table in DeltaCatalog in STRICT mode; the new analyzer rule only applies in AUTO mode.
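A self-contained sketch of the mode split this comment suggests. The mode names mirror the PR's `spark.databricks.delta.v2.enableMode` values; the dispatch functions are illustrative, not the actual DeltaCatalog implementation.

```scala
// Enable-mode values from the PR; the dispatch below is a hypothetical sketch.
sealed trait V2EnableMode
case object NONE extends V2EnableMode
case object AUTO extends V2EnableMode
case object STRICT extends V2EnableMode

def parseMode(mode: String): V2EnableMode =
  mode.toUpperCase match {
    case "STRICT" => STRICT
    case "AUTO"   => AUTO
    case _        => NONE
  }

// Per the comment: in STRICT mode the catalog returns the V2 table directly,
// so the analyzer rewrite rule only needs to fire in AUTO mode.
def catalogReturnsV2(mode: V2EnableMode): Boolean = mode == STRICT
def analyzerRuleApplies(mode: V2EnableMode): Boolean = mode == AUTO
```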
```scala
case s: StreamingRelation if shouldApplyV2Streaming(s) =>
  // catalogTable is guaranteed to be defined because shouldApplyV2Streaming checks it
  // via isDeltaStreamingRelation, but we use pattern matching for safety
  s.dataSource.catalogTable match {
```
If `shouldApplyV2Streaming` returns true, `s.dataSource.catalogTable` should be defined.
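The trade-off both sides describe can be sketched with stand-in types: the guard implies the `Option` is defined, but matching on it defensively avoids a `.get` that would throw if the invariant were ever broken. These stub types are hypothetical, not Spark's.

```scala
// Hypothetical stand-ins for the relevant slices of Spark's classes.
case class CatalogTable(name: String)
case class DataSource(catalogTable: Option[CatalogTable])

// Defensive pattern matching, as in the diff: the None branch should be
// unreachable when the guard has passed, but it fails safely rather than throwing.
def describe(ds: DataSource): String =
  ds.catalogTable match {
    case Some(t) => s"rewriting to V2 for ${t.name}"
    case None    => "unexpected: guard passed but catalogTable is empty"
  }
```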
```scala
  ScalaUtils.toJavaMap(catalogTable.properties))

// Add a marker to indicate this schema comes from V2 streaming (SparkTable)
// This allows DeltaDataSource.sourceSchema to distinguish between:
```
`DeltaDataSource.sourceSchema` is called before this rule runs.
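For context, the marker mechanism the diff describes amounts to a flag in the source options that lets the schema load be skipped. This is a minimal sketch under assumptions: the marker key name is invented here, and the review comment notes the real ordering (sourceSchema running before the rule) may defeat it.

```scala
// Hypothetical marker key; the PR uses "a dedicated marker option" but its
// actual name is not shown in this excerpt.
val V2SchemaMarkerKey = "__delta_v2_streaming_schema"

// Skip the redundant schema load only when the options say the schema
// already came from the V2 streaming path (SparkTable).
def shouldLoadSchema(options: Map[String, String]): Boolean =
  !options.get(V2SchemaMarkerKey).contains("true")
```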
```scala
 * UC-managed markers through Spark's normal catalog resolution paths, without manually patching
 * CatalogTable objects in tests.
 */
class UCTableInjectingSessionCatalogInner extends DelegatingCatalogExtension {
```
Why extend `DelegatingCatalogExtension`? Customizing the `loadTable` method of `UCTableInjectingSessionCatalog` seems sufficient.
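The alternative the reviewer suggests can be sketched with simplified interfaces: a wrapper that delegates to the underlying catalog and overrides only table loading to inject the UC-managed marker. The trait and property names below are stand-ins, not Spark's `TableCatalog` API.

```scala
// Simplified stand-in for a catalog interface (hypothetical).
trait SimpleCatalog {
  def loadTable(name: String): Map[String, String] // table properties
}

class BaseCatalog extends SimpleCatalog {
  def loadTable(name: String): Map[String, String] = Map("provider" -> "delta")
}

// Reviewer's suggestion in miniature: delegate everything, customize only
// loadTable to stamp the UC-managed marker onto the returned table.
class UCInjectingCatalog(delegate: SimpleCatalog) extends SimpleCatalog {
  override def loadTable(name: String): Map[String, String] =
    delegate.loadTable(name) + ("isUcManaged" -> "true")
}
```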
Which Delta project/connector is this regarding?
Spark
Description
This PR improves the Delta V2 streaming POC by:
- Adding `AUTO` as a valid `spark.databricks.delta.v2.enableMode` value, keeping the production default as `NONE` while enabling `AUTO` by default in SQL tests.
- Adding `ApplyV2Streaming` (extracted into its own rule) to rewrite V1 `StreamingRelation` to V2 `StreamingRelationV2` backed by Kernel `SparkTable` when enabled.
- Making `DeltaDataSource.sourceSchema` avoid redundant schema loading when the schema originated from the V2 streaming path, using a dedicated marker option.
How was this patch tested?
- `./build/sbt "spark/testOnly org.apache.spark.sql.delta.V2StreamingConversionSuite"`
- `initialOffset` with initial snapshot is not supported yet (a TODO is left in the test to flip it to a passing test once supported).
Does this PR introduce any user-facing changes?
No.