
Conversation

jeejeelee
Collaborator

@jeejeelee commented Sep 22, 2025

Purpose

Gemini's review below describes the change in detail; in short, this PR slightly reduces the loading time for LoRA weights.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
gemini-code-assist
Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request optimizes LoRA weight loading by changing the memory layout of LoRA tensors to avoid costly transpose operations. The changes are consistently applied across all relevant layers and utility functions. The new convention for lora_a is (rank, input_dim) and for lora_b is (output_dim, rank), which matches how they are often stored in checkpoints, thus removing the need for transposition during loading. The slicing and copying logic has been updated accordingly. The changes are correct and contribute to better performance. I have no high or critical severity comments.
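
As a rough illustration of the convention described above, here is a minimal, hypothetical PyTorch sketch (not code from this PR; shapes and variable names are made up) showing why matching the checkpoint layout removes the transpose during loading:

```python
import torch

# Illustrative shapes only.
rank, input_dim, output_dim = 16, 4096, 11008

# PEFT-style checkpoints commonly store the adapters as:
#   lora_A.weight -> (rank, input_dim)
#   lora_B.weight -> (output_dim, rank)
ckpt_lora_a = torch.randn(rank, input_dim)
ckpt_lora_b = torch.randn(output_dim, rank)

# Previous convention (sketch): in-memory tensors laid out as
# (input_dim, rank) / (rank, output_dim), so every load pays for a
# transpose plus a contiguous copy.
old_lora_a = ckpt_lora_a.T.contiguous()   # (input_dim, rank)
old_lora_b = ckpt_lora_b.T.contiguous()   # (rank, output_dim)

# New convention (as described in the review): the in-memory layout
# matches the checkpoint, so the weights can be copied straight into
# the preallocated LoRA buffers without transposing.
new_lora_a = ckpt_lora_a                  # (rank, input_dim)
new_lora_b = ckpt_lora_b                  # (output_dim, rank)
```

Since an adapter can contain many such matrices across layers, skipping the transpose and extra copy on each one is where the small load-time saving comes from.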

Signed-off-by: Jee Jee Li <[email protected]>
@jeejeelee force-pushed the optimize-lora-loading branch from 0b49a76 to 4fc3209 on September 22, 2025 16:16
@jeejeelee added the ready label (ONLY add when PR is ready to merge / full CI is needed) Sep 22, 2025
Signed-off-by: Jee Jee Li <[email protected]>
@jeejeelee force-pushed the optimize-lora-loading branch from 82bc9e6 to 8f1b7b7 on September 23, 2025 02:14
@jeejeelee force-pushed the optimize-lora-loading branch from 096aaa9 to af92bb2 on September 23, 2025 06:41
Signed-off-by: Jee Jee Li <[email protected]>
Isotr0py
Member

@Isotr0py left a comment


LGTM!

@Isotr0py merged commit 273690a into vllm-project:main Sep 23, 2025
45 checks passed
@jeejeelee deleted the optimize-lora-loading branch September 23, 2025 12:37
namanlalitnyu pushed a commit to namanlalitnyu/vllm that referenced this pull request Sep 24, 2025