-
Notifications
You must be signed in to change notification settings - Fork 18
Lease Deallocation Fix #49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughAdds cores and memory fields to LeaseResponse, propagates them through client and resource manager paths, introduces lease_id awareness to executors, adjusts lease allocation to support memory = -1 (use full free memory), refactors client initialization to cache lease response, and updates executor disable flow to close leases earlier using stored lease_id. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant ResourceManager
participant ExecutorNode
participant ExecutorManager
participant ProcessExecutor
Client->>ResourceManager: open_lease(request{cores, memory[-1 allowed]})
ResourceManager->>ExecutorNode: find leaseable executor
alt memory == -1
ResourceManager->>ResourceManager: use executor free memory
end
ResourceManager-->>Client: LeaseResponse{address, port, lease_id, cores, memory}
Client->>ExecutorManager: spawn executor with LeaseResponse
ExecutorManager->>ProcessExecutor: construct(cores, t_alloc, pid, lease_id)
ProcessExecutor-->>ExecutorManager: running
Client->>ExecutorManager: disable()
ExecutorManager->>ResourceManager: close_lease(lease_id, alloc_time, exec_time, hot_poll_time)
ResourceManager-->>ExecutorManager: ack
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (2)
server/executor_manager/executor_process.hpp (1)
35-41: Initialize _lease_id in ActiveExecutor to a safe default.Set _lease_id to -1 in ActiveExecutor’s ctor to avoid accidental use-before-set in other derived classes (e.g., DockerExecutor).
struct ActiveExecutor { ... - ActiveExecutor(int cores): + ActiveExecutor(int cores): connections(new rdmalib::Connection*[cores]), connections_len(0), - cores(cores) + cores(cores) {} + // Also: consider adding _lease_id{-1} either here or via in-class initializer.Alternatively, add an in-class initializer: int32_t _lease_id{-1};
server/resource_manager/db.cpp (1)
78-94: Bug: memory sentinel (-1) handling leaks across candidates; also ensure address is null-terminated.When memory == -1, you overwrite memory with the first candidate’s free memory. If that candidate can’t lease, subsequent candidates inherit this value, leading to under-/over-allocation. Compute a per-candidate memory_to_lease instead. Also, strncpy doesn’t guarantee null-termination.
Apply this refactor:
- // Lease entire free memory when receiving reserved value -1 - if (memory == -1) { - memory = shared_ptr->_free_memory; - } - - if(!shared_ptr->lease(numcores, memory)) { + // Determine memory to lease per candidate (do not mutate the original request) + int memory_to_lease = (memory == -1) ? shared_ptr->_free_memory : memory; + + if(!shared_ptr->lease(numcores, memory_to_lease)) { ++it; SPDLOG_DEBUG("Node {} cannot be used, not enough resources!", shared_ptr->node); continue; } @@ - strncpy(lease.address, shared_ptr->address.c_str(), Executor::ADDRESS_LENGTH); - lease.cores = numcores; - lease.memory = memory; + strncpy(lease.address, shared_ptr->address.c_str(), Executor::ADDRESS_LENGTH); + lease.address[Executor::ADDRESS_LENGTH - 1] = '\0'; + lease.cores = numcores; + lease.memory = memory_to_lease; @@ - std::forward_as_tuple(numcores, memory, is_total, std::move(ptr)) + std::forward_as_tuple(numcores, memory_to_lease, is_total, std::move(ptr))This fixes the cross-candidate leakage and prevents potential UB on the client when consuming address.
🧹 Nitpick comments (4)
server/resource_manager/executor.cpp (1)
51-53: Early guard is fine; ensure upstream sentinel handling remains in db.cpp.The early return keeps lease() robust and prevents negative/zero requests. Since memory == -1 is normalized in db.cpp before calling lease(), this won’t block “full free memory” leases. Looks good.
If you prefer tighter code, the later < checks already cover the free* <= 0 cases implicitly; consider removing the free* <= 0 part to avoid redundancy.
server/executor_manager/executor_process.hpp (1)
53-53: Prefer initializing _lease_id in the initializer list.Mirror the new parameter in the init-list in the .cpp to avoid default-init then assignment.
server/executor_manager/executor_process.cpp (1)
33-42: Initialize _lease_id in the initializer list, not in the body.Small cleanup to avoid double-initialization and keep style consistent.
-ProcessExecutor::ProcessExecutor(int cores, ProcessExecutor::time_t alloc_begin, pid_t pid, int32_t lease_id): - ActiveExecutor(cores), - _pid(pid) +ProcessExecutor::ProcessExecutor(int cores, ProcessExecutor::time_t alloc_begin, pid_t pid, int32_t lease_id): + ActiveExecutor(cores), + _pid(pid), + _lease_id(lease_id) { _allocation_begin = alloc_begin; // FIXME: remove after connection _allocation_finished = _allocation_begin; - - _lease_id = lease_id; }server/executor_manager/client.cpp (1)
99-112: Correct lease deallocation using executor->_lease_id; minor resilience suggestions.Propagating lease_id and closing it here fixes prior deallocation mismatch. Sequence and logging are sound. Two small suggestions:
- After successful close_lease, consider setting executor->_lease_id = -1 (defensive), though you reset executor immediately after.
- If close_lease fails (as logged inside the call), you might want a retry/backoff in future work.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
rfaas/include/rfaas/allocation.hpp(2 hunks)rfaas/include/rfaas/client.hpp(1 hunks)server/executor_manager/client.cpp(1 hunks)server/executor_manager/executor_process.cpp(2 hunks)server/executor_manager/executor_process.hpp(2 hunks)server/executor_manager/manager.cpp(3 hunks)server/resource_manager/db.cpp(2 hunks)server/resource_manager/executor.cpp(1 hunks)server/resource_manager/manager.cpp(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (7)
server/resource_manager/executor.cpp (2)
rfaas/include/rfaas/client.hpp (2)
cores(33-67)cores(33-33)server/resource_manager/executor.hpp (1)
cores(56-56)
server/executor_manager/client.cpp (4)
server/executor_manager/manager.hpp (2)
lease_id(76-90)lease_id(76-76)rfaas/include/rfaas/executor.hpp (3)
executor(76-76)executor(77-77)executor(79-79)rfaas/lib/executor.cpp (3)
executor(40-66)executor(68-71)executor(73-93)server/executor_manager/client.hpp (2)
res_mgr_connection(39-39)_id(42-45)
server/resource_manager/manager.cpp (3)
rfaas/include/rfaas/client.hpp (1)
client(18-21)server/resource_manager/manager.hpp (1)
client(108-108)server/executor_manager/manager.hpp (1)
client(169-169)
server/executor_manager/executor_process.cpp (2)
server/executor_manager/executor_process.hpp (1)
ProcessExecutor(53-53)server/executor_manager/manager.hpp (2)
lease_id(76-90)lease_id(76-76)
server/executor_manager/executor_process.hpp (1)
server/executor_manager/manager.hpp (2)
lease_id(76-90)lease_id(76-76)
server/resource_manager/db.cpp (2)
server/resource_manager/executor.cpp (2)
lease(49-67)lease(49-49)server/resource_manager/db.hpp (1)
numcores(61-61)
server/executor_manager/manager.cpp (3)
server/resource_manager/manager.hpp (1)
client(108-108)server/resource_manager/executor.cpp (2)
lease(49-67)lease(49-49)server/resource_manager/executor.hpp (1)
lease(58-58)
🔇 Additional comments (7)
rfaas/include/rfaas/allocation.hpp (2)
11-11: Doc fix matches behavior.The “<= 0” deallocation/disconnect comment aligns with server logic. No further action.
28-30: Adding fields to wire struct: verify layout/stability and null-termination for address.Since LeaseResponse is RDMA-copied and reinterpreted, adding fields changes the wire size. All producers/consumers in this PR appear updated, but please:
- Add a static_assert on sizeof(LeaseResponse) and consider explicit packing/alignment if cross-compiler/arch compatibility is needed.
- Ensure the producer side always null-terminates address (strncpy doesn’t guarantee it when the source fits exactly). See db.cpp suggestions to set address[15] = '\0'.
I can help add compile-time checks (static_assert and offsetof) to guard the layout if desired.
server/executor_manager/executor_process.cpp (1)
224-225: Correct propagation of lease_id to the process.Passing lease.id into ProcessExecutor ensures proper lease ownership for deallocation. LGTM.
server/resource_manager/manager.cpp (1)
325-327: Aligns executor message with actual allocation.Using response().cores/memory ensures the executor receives the definitive allocation (esp. when memory = -1 was expanded). Good change.
server/executor_manager/manager.cpp (3)
314-315: LGTM! Enhanced logging improves lease observability.The addition of lease ID and cores information to the executor creation log will help with debugging and monitoring lease allocations. The format string and variable access are correct.
342-342: LGTM! Improved type safety and variable naming.The change from
int itouint32_t client_idimproves type consistency with the client ID type and makes the variable name more descriptive.
363-365: LGTM! Consistent use of the renamed variable.The logging and accounting data access correctly use the renamed
client_idvariable, maintaining consistency throughout the method.
| const LeaseResponse& lease_response = _resource_mgr.response(response_id); | ||
|
|
||
| return rfaas::executor{ | ||
| std::string{_resource_mgr.response(response_id).address}, | ||
| _resource_mgr.response(response_id).port, | ||
| cores, | ||
| memory, | ||
| _resource_mgr.response(response_id).lease_id, | ||
| std::string{lease_response.address}, | ||
| lease_response.port, | ||
| lease_response.cores, | ||
| lease_response.memory, | ||
| lease_response.lease_id, | ||
| dev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential UB: unsafe std::string construction from fixed-size char buffer.
lease_response.address is filled with strncpy on the server; it’s not guaranteed to be null-terminated. Constructing std::string from char* can read past bounds.
Use a bounded constructor:
+#include <cstring> // for strnlen
...
- return rfaas::executor{
- std::string{lease_response.address},
+ return rfaas::executor{
+ std::string{
+ lease_response.address,
+ strnlen(lease_response.address, sizeof(lease_response.address))
+ },
lease_response.port,
lease_response.cores,
lease_response.memory,
lease_response.lease_id,
dev
};Also ensure the producer side explicitly null-terminates address (see db.cpp review).
🤖 Prompt for AI Agents
In rfaas/include/rfaas/client.hpp around lines 57 to 65, constructing
std::string from lease_response.address (a fixed-size char buffer filled with
strncpy) is unsafe because it may not be null-terminated; change the
construction to use the bounded std::string constructor with an explicit length
(e.g., std::string(lease_response.address, strnlen(lease_response.address,
ADDRESS_MAX))) or pass the known buffer size trimmed by strnlen, and also ensure
the producer side (server/db.cpp) explicitly null-terminates address after
strncpy to prevent non-terminated buffers.
Fixing the incorrect deallocation of client leases by propagating the lease ID to the process executor.
Summary by CodeRabbit