Conversation

@Akanksha-kedia
Contributor

7cff2a4

Issue in PR #15473

What the PR Did (The Good Part)

The PR added a new deleteBatch() method to delete multiple files at once instead of one by one. This is much faster, especially for cloud storage like S3.

Example: Instead of deleting 100 files one at a time (100 API calls), you can delete them all together (1 API call).
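The pattern can be sketched as follows. This is an illustrative toy, not the real PinotFS interface; the `FileStore` name and simplified signatures are assumptions for the example. It shows why a default per-file loop exists and where an S3-style override would replace it with one bulk request.

```java
import java.net.URI;
import java.util.List;

// Illustrative sketch, not the real PinotFS interface: the default
// deleteBatch() falls back to one delete() call per file, while an
// S3-style implementation can override it with a single bulk request.
interface FileStore {
  boolean delete(URI uri) throws Exception;

  default boolean deleteBatch(List<URI> uris) throws Exception {
    boolean result = true;
    for (URI uri : uris) {
      result &= delete(uri); // N round trips unless overridden
    }
    return result;
  }
}

public class DeleteBatchDemo {
  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    FileStore perFileStore = uri -> {
      calls[0]++;
      return true;
    };
    perFileStore.deleteBatch(List.of(URI.create("s3://bucket/a"), URI.create("s3://bucket/b")));
    if (calls[0] != 2) {
      throw new AssertionError("expected one delete() call per file");
    }
    System.out.println("default deleteBatch issued " + calls[0] + " per-file calls");
  }
}
```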

The Problems We Found (What Was Missing)

Problem 1: Hadoop Filesystem Was Left Out

  • The PR added optimized deleteBatch() for S3 (Amazon's storage)
  • But Hadoop filesystem (HDFS) was forgotten - it still deletes files one by one
  • This means Hadoop users don't get the performance improvement

Analogy: It's like upgrading all cars to electric except the delivery trucks - they still run on old technology.

Problem 2: No Safety Check for Missing Files

  • The default deleteBatch() in BasePinotFS tries to delete files without checking if they exist first
  • If a file is already deleted or doesn't exist, it throws an error and stops
  • This can break the deletion process
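A minimal sketch of the missing safety check, using `java.nio.file` as a stand-in for the actual PinotFS API (the class name and use of `Files.deleteIfExists` are this example's assumptions, not Pinot code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class SafeBatchDelete {
  // Skip files that are already gone instead of failing the whole batch.
  public static boolean deleteBatch(List<Path> paths) {
    boolean result = true;
    for (Path path : paths) {
      try {
        // deleteIfExists() returns false rather than throwing when the
        // file is already absent, so a missing file is not an error.
        Files.deleteIfExists(path);
      } catch (IOException e) {
        result = false; // a real I/O failure still fails the batch
      }
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    Path existing = Files.createTempFile("pinot-segment", ".tmp");
    Path missing = existing.resolveSibling("no-such-segment.tmp");
    boolean ok = deleteBatch(List.of(existing, missing));
    if (!ok || Files.exists(existing)) {
      throw new AssertionError("batch delete should tolerate missing files");
    }
    System.out.println("deleteBatch succeeded despite a missing file");
  }
}
```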

@codecov-commenter

codecov-commenter commented Dec 1, 2025

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.27%. Comparing base (48fcf00) to head (4e5004e).
⚠️ Report is 33 commits behind head on master.

Files with missing lines | Patch % | Lines
...a/org/apache/pinot/spi/filesystem/BasePinotFS.java | 0.00% | 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17294      +/-   ##
============================================
+ Coverage     63.19%   63.27%   +0.07%     
- Complexity     1432     1474      +42     
============================================
  Files          3131     3135       +4     
  Lines        185838   186479     +641     
  Branches      28397    28496      +99     
============================================
+ Hits         117443   117995     +552     
- Misses        59333    59367      +34     
- Partials       9062     9117      +55     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.22% <0.00%> (+7.67%) ⬆️
java-21 63.25% <0.00%> (+0.08%) ⬆️
temurin 63.27% <0.00%> (+0.07%) ⬆️
unittests 63.27% <0.00%> (+0.07%) ⬆️
unittests1 55.68% <0.00%> (+0.07%) ⬆️
unittests2 33.91% <0.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown.

Contributor

Copilot AI left a comment


Pull request overview

This PR enhances batch deletion capabilities in Pinot's filesystem implementations by addressing two key issues from PR #15473: adding optimized batch deletion support for Hadoop filesystem and implementing safety checks for non-existent files during deletion operations.

  • Adds an optimized deleteBatch() method to HadoopPinotFS that handles directories and files efficiently
  • Implements existence checks before deletion in both BasePinotFS and HadoopPinotFS to prevent errors when files don't exist
  • Fixes a typo in error message ("direactory" → "directory")

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
pinot-spi/src/main/java/org/apache/pinot/spi/filesystem/BasePinotFS.java | Adds existence check before deletion to prevent FileNotFoundException
pinot-plugins/pinot-file-system/pinot-hdfs/src/main/java/org/apache/pinot/plugin/filesystem/HadoopPinotFS.java | Implements optimized batch deletion for Hadoop filesystem with recursive directory handling and corrects spelling error

Comment on lines +136 to +146
for (Path path : pathsToDelete) {
  try {
    if (!_hadoopFS.delete(path, true)) {
      LOGGER.warn("Failed to delete path: {}", path);
      result = false;
    }
  } catch (IOException e) {
    LOGGER.warn("Error deleting path: {}", path, e);
    result = false;
  }
}

Copilot AI Dec 1, 2025


The batch deletion implementation still deletes files one by one in a loop, negating the performance benefit of batch operations. Consider using Hadoop's bulk delete APIs if available, or at least parallelize the deletion operations using ExecutorService to improve performance for large batches.
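The ExecutorService suggestion can be sketched like this. It is a hedged example, not the PR's code: `deleteAll`, the fixed pool size of 8, and the `Predicate` parameter (standing in for a call like `_hadoopFS.delete(path, true)`) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

public class ParallelDelete {
  // Fan per-path deletions out over a thread pool so large batches avoid
  // paying N sequential round trips to the filesystem.
  public static boolean deleteAll(List<String> paths, Predicate<String> deleteOne)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(8);
    try {
      List<Future<Boolean>> futures = new ArrayList<>();
      for (String path : paths) {
        futures.add(pool.submit(() -> deleteOne.test(path)));
      }
      boolean result = true;
      for (Future<Boolean> future : futures) {
        try {
          result &= future.get();
        } catch (ExecutionException e) {
          result = false; // a throwing delete marks the batch as failed
        }
      }
      return result;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    boolean allOk = deleteAll(List.of("/a", "/b", "/c"), p -> true);
    boolean oneBad = deleteAll(List.of("/a", "/b"), p -> p.equals("/a"));
    if (!allOk || oneBad) {
      throw new AssertionError("unexpected batch results");
    }
    System.out.println("parallel deleteBatch aggregated results correctly");
  }
}
```

This preserves the original all-results aggregation (any single failure makes the batch return false) while overlapping the per-path latency.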

LOGGER.warn("Directory {} is not empty and forceDelete is false, skipping", segmentUri);
result = false;
continue;
}

Copilot AI Dec 1, 2025


When forceDelete is false and the directory is empty, the code skips collectFilesRecursively() but should still add the empty directory to pathsToDelete for deletion. Currently, empty directories are not deleted when forceDelete is false.

Suggested change
}
}
// Directory is empty, add it for deletion
pathsToDelete.add(path);
continue;

@Akanksha-kedia
Contributor Author

#17292 issues being solved

@Akanksha-kedia
Contributor Author

Akanksha-kedia commented Dec 4, 2025 via email

@Akanksha-kedia
Contributor Author

@abhishekbafna @Jackie-Jiang please review. Thanks!!!

Contributor

@Jackie-Jiang Jackie-Jiang left a comment


Thanks for the fix. We need to define the contract for delete() and deleteBatch() and make all PinotFS implementations follow the same contract.

boolean result = true;
for (URI segmentUri : segmentUris) {
  // Check if file exists before attempting deletion to avoid FileNotFoundException
  if (!exists(segmentUri)) {
Contributor


This makes the contract for deleteBatch() and delete() different.

We need to decide the behavior for the following scenarios:

  • For delete()
    • File exists
      • File is dir
        • Deleted
        • Not deleted
      • File is ordinary file
        • Deleted
        • Not deleted
    • File doesn't exist
  • For deleteBatch()
    • All files deleted
    • Some files deleted
    • No files deleted

@swaminathanmanish @KKcorps @abhishekbafna Can you please also comment here?

Collaborator


Going by the idempotent property, both a successful deletion and a missing file should return true; otherwise false.

For the batch, return true if all the files are deleted or already absent, otherwise false.
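That proposed contract can be encoded in a toy in-memory store (a sketch, not Pinot code; the class, its `Set`-backed state, and the method names are assumptions for illustration): delete() is idempotent, so a missing file still counts as removed, and deleteBatch() is true iff every entry ends up absent.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IdempotentContract {
  private final Set<String> _files = new HashSet<>();

  public void create(String path) {
    _files.add(path);
  }

  // Idempotent delete: removing a missing file still counts as success.
  public boolean delete(String path) {
    _files.remove(path);
    return true; // only a real I/O error (not modeled here) would be false
  }

  public boolean deleteBatch(Collection<String> paths) {
    boolean result = true;
    for (String path : paths) {
      result &= delete(path);
    }
    return result;
  }

  public static void main(String[] args) {
    IdempotentContract fs = new IdempotentContract();
    fs.create("segment-0");
    // "segment-1" never existed; under this contract the batch still succeeds.
    if (!fs.deleteBatch(List.of("segment-0", "segment-1"))) {
      throw new AssertionError("missing files should not fail the batch");
    }
    System.out.println("idempotent contract: batch returns true");
  }
}
```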

@Akanksha-kedia
Contributor Author

@Jackie-Jiang did we conclude on it?

@Akanksha-kedia
Contributor Author

@abhishekbafna @Jackie-Jiang @xiangfu0 please review and any comments
