Not able to delete existing table & schema from Pinot UI #17294
base: master
25a10ab to 07b4430
Codecov Report: ❌ Patch coverage is …

Coverage diff (`master` vs. #17294):

| | master | #17294 | +/- |
|---|---|---|---|
| Coverage | 63.19% | 63.27% | +0.07% |
| Complexity | 1432 | 1474 | +42 |
| Files | 3131 | 3135 | +4 |
| Lines | 185838 | 186479 | +641 |
| Branches | 28397 | 28496 | +99 |
| Hits | 117443 | 117995 | +552 |
| Misses | 59333 | 59367 | +34 |
| Partials | 9062 | 9117 | +55 |

Flags with carried forward coverage won't be shown.
Pull request overview
This PR enhances batch deletion in Pinot's filesystem implementations by addressing two key issues from PR #15473: it adds optimized batch deletion support for the Hadoop filesystem and implements safety checks for non-existent files during deletion.
- Adds an optimized `deleteBatch()` method to `HadoopPinotFS` that handles directories and files efficiently
- Implements existence checks before deletion in both `BasePinotFS` and `HadoopPinotFS` to prevent errors when files don't exist
- Fixes a typo in an error message ("direactory" → "directory")
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pinot-spi/src/main/java/org/apache/pinot/spi/filesystem/BasePinotFS.java | Adds existence check before deletion to prevent FileNotFoundException |
| pinot-plugins/pinot-file-system/pinot-hdfs/src/main/java/org/apache/pinot/plugin/filesystem/HadoopPinotFS.java | Implements optimized batch deletion for Hadoop filesystem with recursive directory handling and corrects spelling error |

    for (Path path : pathsToDelete) {
      try {
        if (!_hadoopFS.delete(path, true)) {
          LOGGER.warn("Failed to delete path: {}", path);
          result = false;
        }
      } catch (IOException e) {
        LOGGER.warn("Error deleting path: {}", path, e);
        result = false;
      }
    }

Copilot (AI), Dec 1, 2025
The batch deletion implementation still deletes files one by one in a loop, negating the performance benefit of batch operations. Consider using Hadoop's bulk delete APIs if available, or at least parallelize the deletion operations using ExecutorService to improve performance for large batches.
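A minimal sketch of the parallel option suggested above, assuming the per-path delete is safe to call concurrently. `ParallelDeleteSketch`, `PathDeleter`, and `deleteAll` are hypothetical names, not Pinot or Hadoop APIs; in `HadoopPinotFS` the deleter would presumably wrap `_hadoopFS.delete(path, true)`. It keeps the loop's result semantics from the PR: any `false` return or exception makes the batch return `false`, but every path is still attempted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelDeleteSketch {
  /** Hypothetical single-path delete; in HadoopPinotFS this might wrap _hadoopFS.delete(path, true). */
  @FunctionalInterface
  interface PathDeleter<P> {
    boolean delete(P path) throws Exception;
  }

  /** Deletes all paths on a fixed pool; returns true only if every delete returned true without throwing. */
  static <P> boolean deleteAll(List<P> paths, PathDeleter<P> deleter, int parallelism) {
    ExecutorService executor = Executors.newFixedThreadPool(parallelism);
    try {
      // Submit one task per path so slow remote deletes overlap instead of running back to back.
      List<Future<Boolean>> futures = new ArrayList<>();
      for (P path : paths) {
        Callable<Boolean> task = () -> deleter.delete(path);
        futures.add(executor.submit(task));
      }
      // Collect every result; a false return or an exception marks the batch as failed,
      // but the remaining results are still awaited (same semantics as the sequential loop).
      boolean result = true;
      for (Future<Boolean> future : futures) {
        try {
          if (!future.get()) {
            result = false;
          }
        } catch (ExecutionException e) {
          result = false;
        }
      }
      return result;
    } catch (InterruptedException e) {
      // Preserve the interrupt for the caller and report the batch as incomplete.
      Thread.currentThread().interrupt();
      return false;
    } finally {
      executor.shutdown();
    }
  }
}
```

The pool size bounds how many delete RPCs are in flight against the NameNode at once, so a small fixed value is a conservative starting point. Whether a native bulk-delete API is available instead depends on the Hadoop version and the underlying store.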

      LOGGER.warn("Directory {} is not empty and forceDelete is false, skipping", segmentUri);
      result = false;
      continue;
    }

Copilot (AI), Dec 1, 2025
When forceDelete is false and the directory is empty, the code skips collectFilesRecursively() but should still add the empty directory to pathsToDelete for deletion. Currently, empty directories are not deleted when forceDelete is false.

        }
      }
      // Directory is empty, add it for deletion
      pathsToDelete.add(path);
      continue;

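The rule Copilot describes for empty directories can be isolated as a small decision function. This is a sketch under stated assumptions: `DirectoryDeletePlanner` and `plan` are hypothetical; the file system is modeled as an in-memory map from each directory to its children (a path absent from the map is treated as an ordinary file); and a directory scheduled for deletion is assumed to be removed by a recursive delete rather than by collecting its files one by one as the PR does.

```java
import java.util.List;
import java.util.Map;

final class DirectoryDeletePlanner {
  /**
   * Decides whether a path should be scheduled for deletion.
   * Hypothetical model: dirChildren maps each directory to its children; paths not in the map are ordinary files.
   * Returns false only when a non-empty directory is skipped because forceDelete is false.
   */
  static boolean plan(String path, boolean forceDelete,
      Map<String, List<String>> dirChildren, List<String> pathsToDelete) {
    List<String> children = dirChildren.get(path);
    boolean nonEmptyDirectory = children != null && !children.isEmpty();
    if (nonEmptyDirectory && !forceDelete) {
      // Refuse to remove contents the caller did not explicitly allow us to delete.
      return false;
    }
    // Ordinary file, empty directory (regardless of forceDelete), or forceDelete=true:
    // schedule the path itself; a recursive delete then removes any contents.
    pathsToDelete.add(path);
    return true;
  }
}
```

The key difference from the flagged code is that the empty-directory case falls through to `pathsToDelete.add(...)` whether or not `forceDelete` is set.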
Issues being solved: #17292
...t-file-system/pinot-hdfs/src/main/java/org/apache/pinot/plugin/filesystem/HadoopPinotFS.java
Sure
On Thu, 4 Dec 2025 at 2:41 PM, Abhishek Bafna wrote:

In `pinot-plugins/pinot-file-system/pinot-hdfs/src/main/java/org/apache/pinot/plugin/filesystem/HadoopPinotFS.java`:

    @@ -88,6 +88,83 @@ public boolean delete(URI segmentUri, boolean forceDelete)
         return _hadoopFS.delete(new Path(segmentUri), true);
       }
    +  @Override

Please add test cases for all the public methods.
07b4430 to 4e5004e
@abhishekbafna @Jackie-Jiang please review. Thanks!!!
Jackie-Jiang left a comment
Thanks for the fix. We need to define the contract for `delete()` and `deleteBatch()` and make all `PinotFS` implementations follow the same contract.

    boolean result = true;
    for (URI segmentUri : segmentUris) {
      // Check if file exists before attempting deletion to avoid FileNotFoundException
      if (!exists(segmentUri)) {

This makes the contract for deleteBatch() and delete() different.
We need to decide the behavior for the following scenarios:
- For `delete()`:
  - File exists
    - File is dir
      - Deleted
      - Not deleted
    - File is ordinary file
      - Deleted
      - Not deleted
  - File doesn't exist
- For `deleteBatch()`:
  - All files deleted
  - Some files deleted
  - No files deleted
@swaminathanmanish @KKcorps @abhishekbafna Can you please also comment here?
Going by the idempotency property, both a successful deletion and a missing file should return true; anything else should return false.
For the batch, if all the files are deleted or already removed, return true; otherwise return false.
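One way to encode the idempotent contract proposed above, as a sketch only: `IdempotentDeleteContract`, `Fs`, `exists`, and `deleteRaw` are hypothetical stand-ins rather than the `PinotFS` interface, and exception handling is simplified to "any exception means failure". `delete` treats a missing path as success, and `deleteBatch` attempts every path and returns `true` only if all of them end up deleted or absent.

```java
import java.util.List;

final class IdempotentDeleteContract {
  /** Hypothetical minimal file-system operations needed to express the contract. */
  interface Fs<P> {
    boolean exists(P path) throws Exception;

    /** Attempts the physical delete; false means the store reported that nothing was deleted. */
    boolean deleteRaw(P path) throws Exception;
  }

  /** Returns true if the path was deleted or was already absent; false otherwise. */
  static <P> boolean delete(Fs<P> fs, P path) {
    try {
      if (!fs.exists(path)) {
        // Already in the desired end state, so a missing path counts as success.
        return true;
      }
      // A false return may mean another caller removed the path concurrently, so re-check.
      return fs.deleteRaw(path) || !fs.exists(path);
    } catch (Exception e) {
      return false;
    }
  }

  /** Attempts every path and returns true only if all of them end up deleted or absent. */
  static <P> boolean deleteBatch(Fs<P> fs, List<P> paths) {
    boolean result = true;
    for (P path : paths) {
      if (!delete(fs, path)) {
        // Keep going so one failure does not leave the remaining paths behind.
        result = false;
      }
    }
    return result;
  }
}
```

The re-check after a `false` return covers a race in which another caller removes the path between `exists` and the delete call; without it, when two cleanups race on the same segment, the one that loses would report failure even though the path is gone.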
@Jackie-Jiang did we conclude on it?
@abhishekbafna @Jackie-Jiang @xiangfu0 please review and share any comments.
7cff2a4
Issue in PR #15473
What the PR Did (The Good Part)
The PR added a new `deleteBatch()` method to delete multiple files at once instead of one by one. This can be much faster, especially for cloud storage like S3, because it cuts the number of API round trips.

Example: Instead of deleting 100 files one at a time (100 API calls), you can delete them all together (1 API call).
The Problems We Found (What Was Missing)
Problem 1: Hadoop Filesystem Was Left Out
`deleteBatch()` for S3 (Amazon's storage)

Analogy: It's like upgrading all cars to electric except the delivery trucks; they still run on old technology.
Problem 2: No Safety Check for Missing Files
`deleteBatch()` in `BasePinotFS` tries to delete files without checking if they exist first