
Conversation

v01dstar (Member) commented on Aug 10, 2024

This PR, together with #323, implements Titan's new GC solution: punch hole.

Titan currently implements two types of GC:

  • GC based on stats (triggered when the discardable ratio of a blob file reaches a certain threshold)
  • GC during RocksDB's compactions (a.k.a. level merge)

The first approach introduces write amplification, since it must update references inside RocksDB to reflect the blobs' new locations; because of this, the threshold has to be set relatively high to limit the impact.
The second approach adds no extra write amplification, but it can waste space: large portions of a blob file may already have been relocated, yet the file has not met the threshold to be entirely replaced.

Punch hole is a file system API that takes a file descriptor, a start offset, and a length, and removes the data from the file at that location. To the application, that part of the file reads as zeros. To the file system, any underlying storage blocks that are fully covered by the removed range can be reclaimed.

This project uses the punch hole API to delete blobs in place during GC, avoiding the need to update references inside RocksDB.
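
For concreteness, here is a minimal sketch of the underlying system call (Linux-specific; the PunchHole wrapper and its error handling are illustrative, not Titan's actual code):

// Requires a Linux file system with hole-punching support (e.g. ext4, XFS).
#include <fcntl.h>   // fallocate, FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE
#include <cstdio>    // perror

// Punch a hole over [offset, offset + length). Afterwards, reads of that
// range return zeros, the file's reported size is unchanged, and the file
// system may reclaim any storage blocks fully covered by the range.
bool PunchHole(int fd, off_t offset, off_t length) {
  // FALLOC_FL_KEEP_SIZE must accompany FALLOC_FL_PUNCH_HOLE, so st_size
  // stays the same while the covered blocks are deallocated.
  if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                offset, length) != 0) {
    std::perror("fallocate(FALLOC_FL_PUNCH_HOLE)");
    return false;
  }
  return true;
}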

// The effective size of current file. This is different from `file_size_`, as
// `file_size_` is the original size of the file, and does not consider space
// reclaimed by punch hole GC.
// We can't use file system's `st_blocks` to get the logical size, because
Member

I think it's okay to use the file size as effective_file_size after restart. The size doesn't have to be that precise. It may produce a false positive for triggering punch hole GC, but it would be updated to the accurate number after the GC scan.
Then we can get rid of updating the manifest for effective_file_size.
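
A rough sketch of that suggestion (field and method names here are assumptions for illustration, not the actual BlobFileMeta API):

#include <cstdint>

// After restart, initialize the effective size to the on-disk size. This can
// over-estimate live data and trigger punch hole GC spuriously, but the next
// GC scan corrects it, so the manifest need not persist effective_file_size.
struct BlobFileMetaSketch {
  uint64_t file_size = 0;            // original on-disk file size
  uint64_t effective_file_size = 0;  // file_size minus hole-punched ranges

  void OnRestart() { effective_file_size = file_size; }
  void OnGcScanFinished(uint64_t live_bytes) { effective_file_size = live_bytes; }
};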

Member Author

Done

@hbisheng (Member) left a comment

It would be great to have a PR description to help future readers. The description could provide some background and highlight key aspects of the implementation, such as the introduction of the PunchHoleGCJob class and how the job is scheduled (when its snapshot becomes the oldest).

v01dstar force-pushed the punch-hole branch 2 times, most recently from 891fb24 to 1f0ecf1 on August 28, 2024 at 22:04
// record by adjusting iterate_offset_, otherwise (not a hole-punch record),
// we should break the loop and return the record, iterate_offset_ is
// already adjusted inside GetBlobRecord() in this case.
if (live || !status().ok()) return;
Member

Just trying to learn: in what cases will status() not be ok? Do we need to set valid_ to false in that case?

Member Author

For example, IO errors. In this case, we should not continue the loop.

As for setting valid_ to false, I don't think it is necessary, since valid() (the function, not the variable) already considers status_. So I didn't change the behavior (the current code does not set valid_ either when encountering IO errors).
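
A minimal sketch of the pattern being described (illustrative names, not the actual Titan iterator):

#include <string>

// Stand-in for rocksdb::Status, for illustration only.
struct Status {
  bool ok() const { return message.empty(); }
  std::string message;
};

// Because Valid() folds in status_, an IO error recorded in status_ already
// makes the iterator invalid; there is no need to also set valid_ = false.
class IteratorSketch {
 public:
  bool Valid() const { return valid_ && status_.ok(); }
  const Status& status() const { return status_; }

 private:
  bool valid_ = false;
  Status status_;
};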

// If not, we can just skip the next round of purging obsolete files.
db_->ReleaseSnapshot(snapshot);
{
MutexLock l(&mutex_);
Member Author

It is not wise to acquire a mutex here, since this is a hot path; almost all read requests go through it (Titan creates a ManagedSnapshot implicitly). I will refactor this with atomic operations, but in a separate PR.
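
A rough sketch of that direction (illustrative names; not the actual refactor):

#include <atomic>
#include <cstdint>

// Replace a mutex-protected counter on the snapshot hot path with a
// lock-free atomic, so creating or releasing a snapshot does not contend
// on mutex_.
class SnapshotCounterSketch {
 public:
  void OnSnapshotCreated() { pending_.fetch_add(1, std::memory_order_relaxed); }
  void OnSnapshotReleased() { pending_.fetch_sub(1, std::memory_order_acq_rel); }
  uint64_t Pending() const { return pending_.load(std::memory_order_acquire); }

 private:
  std::atomic<uint64_t> pending_{0};
};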

Signed-off-by: v01dstar <[email protected]>