Skip to content

Conversation

@LeandroTreu
Copy link
Collaborator

@LeandroTreu LeandroTreu commented Aug 11, 2025

Fixing the incorrect deallocation of client leases by propagating the lease ID to the process executor.

Summary by CodeRabbit

  • New Features
    • Lease responses now include allocated cores and memory.
    • Support requesting full available memory by specifying -1 in memory requests.
  • Bug Fixes
    • More reliable lease cleanup when executors stop, reducing resource leaks.
    • Added validation to prevent leases with non-positive cores or memory.
  • Improvements
    • Consistent logging now includes lease ID and cores for better traceability.
  • Refactor
    • Streamlined data flow to reuse lease response details and updated ID handling for executor checks.

@LeandroTreu LeandroTreu requested a review from mcopik August 11, 2025 15:18
@coderabbitai
Copy link

coderabbitai bot commented Aug 11, 2025

Walkthrough

Adds cores and memory fields to LeaseResponse, propagates them through client and resource manager paths, introduces lease_id awareness to executors, adjusts lease allocation to support memory = -1 (use full free memory), refactors client initialization to cache lease response, and updates executor disable flow to close leases earlier using stored lease_id.

Changes

Cohort / File(s) Summary
API structs
rfaas/include/rfaas/allocation.hpp
LeaseResponse gains int16_t cores and int32_t memory; comment tweak in LeaseRequest threshold description.
Client API usage
rfaas/include/rfaas/client.hpp
Caches LeaseResponse locally and uses its address/port/cores/memory/lease_id to init executor; removes repeated resource manager lookups.
Executor process and lifecycle
server/executor_manager/executor_process.hpp, server/executor_manager/executor_process.cpp, server/executor_manager/client.cpp
Adds lease_id to ProcessExecutor (constructor and member _lease_id); spawn passes lease.id; Client::disable now closes lease early using executor->_lease_id and timing data, removes later duplicate close.
Executor manager behavior/logging
server/executor_manager/manager.cpp
Logs now include lease id and cores; index variable renamed/type-adjusted to uint32_t client_id; cleans up commented code.
Resource manager leasing
server/resource_manager/db.cpp, server/resource_manager/executor.cpp
open_lease handles memory == -1 by using full free memory; populates LeaseResponse.cores and .memory; Executor::lease adds early validation for non-positive requests/free resources.
Resource manager client handling
server/resource_manager/manager.cpp
Send buffer now filled from client.response()[0].cores/memory; removes deferred removals batching, handles disconnects immediately.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ResourceManager
  participant ExecutorNode
  participant ExecutorManager
  participant ProcessExecutor

  Client->>ResourceManager: open_lease(request{cores, memory[-1 allowed]})
  ResourceManager->>ExecutorNode: find leaseable executor
  alt memory == -1
    ResourceManager->>ResourceManager: use executor free memory
  end
  ResourceManager-->>Client: LeaseResponse{address, port, lease_id, cores, memory}

  Client->>ExecutorManager: spawn executor with LeaseResponse
  ExecutorManager->>ProcessExecutor: construct(cores, t_alloc, pid, lease_id)
  ProcessExecutor-->>ExecutorManager: running

  Client->>ExecutorManager: disable()
  ExecutorManager->>ResourceManager: close_lease(lease_id, alloc_time, exec_time, hot_poll_time)
  ResourceManager-->>ExecutorManager: ack
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

A rabbit taps keys with a caffeinated grin,
Leasing some cores, letting memory in.
A lease_id tucked in the process’s coat,
Close it up early—add a tidy note.
From -1 to full, the buffers now sing—
Hop-hop, tidy paths for every thing. 🐇💾⚙️

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/lease-deallocation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (2)
server/executor_manager/executor_process.hpp (1)

35-41: Initialize _lease_id in ActiveExecutor to a safe default.

Set _lease_id to -1 in ActiveExecutor’s ctor to avoid accidental use-before-set in other derived classes (e.g., DockerExecutor).

 struct ActiveExecutor {
   ...
-  ActiveExecutor(int cores):
+  ActiveExecutor(int cores):
     connections(new rdmalib::Connection*[cores]),
     connections_len(0),
-    cores(cores)
+    cores(cores)
   {}
+  // Also: consider adding _lease_id{-1} either here or via in-class initializer.

Alternatively, add an in-class initializer: int32_t _lease_id{-1};

server/resource_manager/db.cpp (1)

78-94: Bug: memory sentinel (-1) handling leaks across candidates; also ensure address is null-terminated.

When memory == -1, you overwrite memory with the first candidate’s free memory. If that candidate can’t lease, subsequent candidates inherit this value, leading to under-/over-allocation. Compute a per-candidate memory_to_lease instead. Also, strncpy doesn’t guarantee null-termination.

Apply this refactor:

-      // Lease entire free memory when receiving reserved value -1
-      if (memory == -1) {
-        memory = shared_ptr->_free_memory;
-      }
-
-      if(!shared_ptr->lease(numcores, memory)) {
+      // Determine memory to lease per candidate (do not mutate the original request)
+      int memory_to_lease = (memory == -1) ? shared_ptr->_free_memory : memory;
+
+      if(!shared_ptr->lease(numcores, memory_to_lease)) {
         ++it;
         SPDLOG_DEBUG("Node {} cannot be used, not enough resources!", shared_ptr->node);
         continue;
       }
@@
-      strncpy(lease.address, shared_ptr->address.c_str(), Executor::ADDRESS_LENGTH);
-      lease.cores = numcores;
-      lease.memory = memory;
+      strncpy(lease.address, shared_ptr->address.c_str(), Executor::ADDRESS_LENGTH);
+      lease.address[Executor::ADDRESS_LENGTH - 1] = '\0';
+      lease.cores = numcores;
+      lease.memory = memory_to_lease;
@@
-        std::forward_as_tuple(numcores, memory, is_total, std::move(ptr))
+        std::forward_as_tuple(numcores, memory_to_lease, is_total, std::move(ptr))

This fixes the cross-candidate leakage and prevents potential UB on the client when consuming address.

🧹 Nitpick comments (4)
server/resource_manager/executor.cpp (1)

51-53: Early guard is fine; ensure upstream sentinel handling remains in db.cpp.

The early return keeps lease() robust and prevents negative/zero requests. Since memory == -1 is normalized in db.cpp before calling lease(), this won’t block “full free memory” leases. Looks good.

If you prefer tighter code, the later < checks already cover the free* <= 0 cases implicitly; consider removing the free* <= 0 part to avoid redundancy.

server/executor_manager/executor_process.hpp (1)

53-53: Prefer initializing _lease_id in the initializer list.

Mirror the new parameter in the init-list in the .cpp to avoid default-init then assignment.

server/executor_manager/executor_process.cpp (1)

33-42: Initialize _lease_id in the initializer list, not in the body.

Small cleanup to avoid double-initialization and keep style consistent.

-ProcessExecutor::ProcessExecutor(int cores, ProcessExecutor::time_t alloc_begin, pid_t pid, int32_t lease_id):
-  ActiveExecutor(cores),
-  _pid(pid)
+ProcessExecutor::ProcessExecutor(int cores, ProcessExecutor::time_t alloc_begin, pid_t pid, int32_t lease_id):
+  ActiveExecutor(cores),
+  _pid(pid),
+  _lease_id(lease_id)
 {
   _allocation_begin = alloc_begin;
   // FIXME: remove after connection
   _allocation_finished = _allocation_begin;
-
-  _lease_id = lease_id;
 }
server/executor_manager/client.cpp (1)

99-112: Correct lease deallocation using executor->_lease_id; minor resilience suggestions.

Propagating lease_id and closing it here fixes prior deallocation mismatch. Sequence and logging are sound. Two small suggestions:

  • After successful close_lease, consider setting executor->_lease_id = -1 (defensive), though you reset executor immediately after.
  • If close_lease fails (as logged inside the call), you might want a retry/backoff in future work.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d57ce0 and fd72348.

📒 Files selected for processing (9)
  • rfaas/include/rfaas/allocation.hpp (2 hunks)
  • rfaas/include/rfaas/client.hpp (1 hunks)
  • server/executor_manager/client.cpp (1 hunks)
  • server/executor_manager/executor_process.cpp (2 hunks)
  • server/executor_manager/executor_process.hpp (2 hunks)
  • server/executor_manager/manager.cpp (3 hunks)
  • server/resource_manager/db.cpp (2 hunks)
  • server/resource_manager/executor.cpp (1 hunks)
  • server/resource_manager/manager.cpp (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (7)
server/resource_manager/executor.cpp (2)
rfaas/include/rfaas/client.hpp (2)
  • cores (33-67)
  • cores (33-33)
server/resource_manager/executor.hpp (1)
  • cores (56-56)
server/executor_manager/client.cpp (4)
server/executor_manager/manager.hpp (2)
  • lease_id (76-90)
  • lease_id (76-76)
rfaas/include/rfaas/executor.hpp (3)
  • executor (76-76)
  • executor (77-77)
  • executor (79-79)
rfaas/lib/executor.cpp (3)
  • executor (40-66)
  • executor (68-71)
  • executor (73-93)
server/executor_manager/client.hpp (2)
  • res_mgr_connection (39-39)
  • _id (42-45)
server/resource_manager/manager.cpp (3)
rfaas/include/rfaas/client.hpp (1)
  • client (18-21)
server/resource_manager/manager.hpp (1)
  • client (108-108)
server/executor_manager/manager.hpp (1)
  • client (169-169)
server/executor_manager/executor_process.cpp (2)
server/executor_manager/executor_process.hpp (1)
  • ProcessExecutor (53-53)
server/executor_manager/manager.hpp (2)
  • lease_id (76-90)
  • lease_id (76-76)
server/executor_manager/executor_process.hpp (1)
server/executor_manager/manager.hpp (2)
  • lease_id (76-90)
  • lease_id (76-76)
server/resource_manager/db.cpp (2)
server/resource_manager/executor.cpp (2)
  • lease (49-67)
  • lease (49-49)
server/resource_manager/db.hpp (1)
  • numcores (61-61)
server/executor_manager/manager.cpp (3)
server/resource_manager/manager.hpp (1)
  • client (108-108)
server/resource_manager/executor.cpp (2)
  • lease (49-67)
  • lease (49-49)
server/resource_manager/executor.hpp (1)
  • lease (58-58)
🔇 Additional comments (7)
rfaas/include/rfaas/allocation.hpp (2)

11-11: Doc fix matches behavior.

The “<= 0” deallocation/disconnect comment aligns with server logic. No further action.


28-30: Adding fields to wire struct: verify layout/stability and null-termination for address.

Since LeaseResponse is RDMA-copied and reinterpreted, adding fields changes the wire size. All producers/consumers in this PR appear updated, but please:

  • Add a static_assert on sizeof(LeaseResponse) and consider explicit packing/alignment if cross-compiler/arch compatibility is needed.
  • Ensure the producer side always null-terminates address (strncpy doesn’t guarantee it when the source fits exactly). See db.cpp suggestions to set address[15] = '\0'.

I can help add compile-time checks (static_assert and offsetof) to guard the layout if desired.

server/executor_manager/executor_process.cpp (1)

224-225: Correct propagation of lease_id to the process.

Passing lease.id into ProcessExecutor ensures proper lease ownership for deallocation. LGTM.

server/resource_manager/manager.cpp (1)

325-327: Aligns executor message with actual allocation.

Using response().cores/memory ensures the executor receives the definitive allocation (esp. when memory = -1 was expanded). Good change.

server/executor_manager/manager.cpp (3)

314-315: LGTM! Enhanced logging improves lease observability.

The addition of lease ID and cores information to the executor creation log will help with debugging and monitoring lease allocations. The format string and variable access are correct.


342-342: LGTM! Improved type safety and variable naming.

The change from int i to uint32_t client_id improves type consistency with the client ID type and makes the variable name more descriptive.


363-365: LGTM! Consistent use of the renamed variable.

The logging and accounting data access correctly use the renamed client_id variable, maintaining consistency throughout the method.

Comment on lines +57 to 65
const LeaseResponse& lease_response = _resource_mgr.response(response_id);

return rfaas::executor{
std::string{_resource_mgr.response(response_id).address},
_resource_mgr.response(response_id).port,
cores,
memory,
_resource_mgr.response(response_id).lease_id,
std::string{lease_response.address},
lease_response.port,
lease_response.cores,
lease_response.memory,
lease_response.lease_id,
dev
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Potential UB: unsafe std::string construction from fixed-size char buffer.

lease_response.address is filled with strncpy on the server; it’s not guaranteed to be null-terminated. Constructing std::string from char* can read past bounds.

Use a bounded constructor:

+#include <cstring> // for strnlen
 ...
-      return rfaas::executor{
-        std::string{lease_response.address},
+      return rfaas::executor{
+        std::string{
+          lease_response.address,
+          strnlen(lease_response.address, sizeof(lease_response.address))
+        },
         lease_response.port,
         lease_response.cores,
         lease_response.memory,
         lease_response.lease_id,
         dev
       };

Also ensure the producer side explicitly null-terminates address (see db.cpp review).

🤖 Prompt for AI Agents
In rfaas/include/rfaas/client.hpp around lines 57 to 65, constructing
std::string from lease_response.address (a fixed-size char buffer filled with
strncpy) is unsafe because it may not be null-terminated; change the
construction to use the bounded std::string constructor with an explicit length
(e.g., std::string(lease_response.address, strnlen(lease_response.address,
ADDRESS_MAX))) or pass the known buffer size trimmed by strnlen, and also ensure
the producer side (server/db.cpp) explicitly null-terminates address after
strncpy to prevent non-terminated buffers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant