Background
There are some long-standing data corruption issues, which suggest that there may be bug(s) in freelist management.
- after restart, bbolt db failed to get all reachable pages #778
- Branch page items link to already released pages #402
- Panic happens when opening a boltdb #705
- ...
I have been dissatisfied with the bbolt freelist management for a long time. It is hard to understand and tightly coupled with the rest of bbolt. I have been thinking about refactoring it to improve understandability & testability.
Refactor
The high level idea is to
- simplify the implementation to improve understandability;
- introduce an interface and decouple it from the bbolt TXN workflow to improve testability.
What we have done and are going to do:
- move array related freelist functions into own file #777
- Move method freePages into freelist.go #783
- introduce a freelist interface #775
- No need to handle freelist as a special case when freeing a page #788
- Panicking when a write txn tries to free a page which was allocated by itself #792
- Simplify & update the logic related to the pending released free pages, refer to doc
We also need to continue to refactor & simplify the interface from the user's (bbolt's) perspective. In other words, we should have a clear understanding of how the interface will & should be used by bbolt. The motivation is to improve testability.
The freelist management is the most sensitive & important part. So let's do it step by step.
Test
Any unit tests are welcome.
But more importantly, we should add a dedicated randomized test case that simulates multiple concurrent read TXNs and a single write TXN.
- We need to record all the requests or operations sent to the freelist module;
- We need to set up a list of expected behaviors or invariant properties, and verify that none of these invariants is broken during the test.