-
Notifications
You must be signed in to change notification settings - Fork 197
Add base representation for storage files #653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 18 out of 33 changed files in this pull request and generated no comments.
Files not reviewed (15)
- xtable-api/src/test/java/org/apache/xtable/model/storage/TestDataFilesDiff.java: Evaluated as low risk
- xtable-api/src/main/java/org/apache/xtable/model/stat/Range.java: Evaluated as low risk
- xtable-core/src/main/java/org/apache/xtable/delta/DeltaPartitionExtractor.java: Evaluated as low risk
- xtable-core/src/main/java/org/apache/xtable/hudi/HudiDataFileExtractor.java: Evaluated as low risk
- xtable-api/src/main/java/org/apache/xtable/model/storage/FilesDiff.java: Evaluated as low risk
- xtable-core/src/main/java/org/apache/xtable/delta/DeltaDataFileUpdatesExtractor.java: Evaluated as low risk
- xtable-api/src/main/java/org/apache/xtable/spi/sync/ConversionTarget.java: Evaluated as low risk
- xtable-api/src/main/java/org/apache/xtable/model/storage/PartitionFileGroup.java: Evaluated as low risk
- xtable-api/src/main/java/org/apache/xtable/model/TableChange.java: Evaluated as low risk
- xtable-api/src/test/java/org/apache/xtable/spi/extractor/TestExtractFromSource.java: Evaluated as low risk
- xtable-core/src/main/java/org/apache/xtable/hudi/HudiConversionTarget.java: Evaluated as low risk
- xtable-core/src/main/java/org/apache/xtable/delta/DeltaConversionTarget.java: Evaluated as low risk
- xtable-core/src/main/java/org/apache/xtable/hudi/HudiFileStatsExtractor.java: Evaluated as low risk
- xtable-api/src/test/java/org/apache/xtable/spi/sync/TestTableFormatSync.java: Evaluated as low risk
- xtable-core/src/main/java/org/apache/xtable/hudi/BaseFileUpdatesExtractor.java: Evaluated as low risk
xtable-api/src/main/java/org/apache/xtable/model/storage/FileType.java
Outdated
Show resolved
Hide resolved
xtable-api/src/main/java/org/apache/xtable/model/storage/InternalBaseFile.java
Outdated
Show resolved
Hide resolved
| * @return the set of files that are added | ||
| */ | ||
| public static <P> FilesDiff<InternalDataFile, P> findNewAndRemovedFiles( | ||
| public static <P> FilesDiff<? extends InternalBaseFile, P> findNewAndRemovedFiles( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be less verbose to define some interface for InternalFile that we can use instead of this wildcard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The verbosity of generics doesn't seem like a problem to me; verbosity is something unavoidable in Java. Also, I don't think that would be the good use of an interface and it wouldn't be consistent. That said, there were places where the base class was sufficient, so I used it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't it be a good use of an interface? Currently we are always casting to a specific instance type instead of leveraging the shared implementation of the base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't it be a good use of an interface?
InternalFile is a data class. If the data class is simply a container for data and doesn't require any specific behavior or implementation, then using an interface is uncommon
Currently we are always casting to a specific instance
I wouldn't use call it always, but anyway.
Well currently none of the sources and target are handling any other kind of file type. There are pending PRs for that and new ones will be needed in the future.
instead of leveraging the shared implementation of the base class.
That will not happen even with interface. Each target and source will handle each type of file differently.
For now, I’m going to abandon this PR as I don’t see a clear path to resolving this issue. If anyone has suggestions or ideas on how to tackle this, I’m all ears!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this is not a reason to close the PR. I was just trying to understand where you were coming from with your approach so I could try to see where things were going in general with how we can handle the different types of files in the targets.
xtable-api/src/main/java/org/apache/xtable/model/storage/InternalDataFile.java
Outdated
Show resolved
Hide resolved
xtable-api/src/main/java/org/apache/xtable/model/storage/InternalDataFile.java
Outdated
Show resolved
Hide resolved
bd3f38f to
6a21d14
Compare
|
@the-other-tim-brown what do you think of name |
6a21d14 to
66a2b86
Compare
the-other-tim-brown
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor nitpick but otherwise LGTM!
xtable-api/src/main/java/org/apache/xtable/model/stat/Range.java
Outdated
Show resolved
Hide resolved
66a2b86 to
866ba84
Compare
Fixes #652
The main goal of this pull request is to establish a common structure for storage files and introduces an abstraction for all types of storage files associated to a table. This should facilitate future extensions and ensure stability of APIs. The abstraction will serve as the base for various file types such as data files, delete files, etc.
Different table formats support various storage files, and as these evolve, more file types are expected to be added. The abstraction provided by the base class is helpful for maintaining the stability of core XTable APIs while supporting extensibility. For instance if a new storage file for statistics is introduced, XTable code for detecting file changes between commits need not change.
While the APIs need not change, handling of new types will be needed in the source and target implementations. For e.g., a target may not support deletion type files. By default the target format should throw an error if a deletion file is detected and not handled. This may not be true for in all cases.
Additionally, minor refactoring has been performed to rename certain elements for better clarity and consistency.
This PR does not add any new functionality of change in behavior. Tests have been updated.