
Conversation

@Potabk Potabk (Collaborator) commented Sep 25, 2025

What this PR does / why we need it?

Enable multi-threaded weight loading to speed up the test runs.
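
For context, each changed test passes the loader config shown in the review below; here is a minimal sketch, assuming the VllmRunner helper from tests/e2e/conftest.py and its generate_greedy method (the model name and test name are purely illustrative):

# Minimal sketch of the change applied to the E2E tests in this PR.
# Assumptions: VllmRunner comes from tests/e2e/conftest.py and exposes
# generate_greedy(); the model below is a placeholder, not necessarily
# the one the real tests use.
from tests.e2e.conftest import VllmRunner

def test_generation_with_multithread_weight_load():
    with VllmRunner(
        "Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model
        model_loader_extra_config={
            "enable_multithread_load": True,  # turn on multi-threaded weight loading
            "num_threads": 8,                 # thread count used across the changed tests
        },
    ) as vllm_model:
        # The actual assertions stay the same as before this PR.
        vllm_model.generate_greedy(["Hello, my name is"], max_tokens=16)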

Does this PR introduce any user-facing change?

How was this patch tested?


👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description, so that reviewers and future developers can understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enables multi-threaded weight loading in numerous E2E tests to speed up CI execution. While this is a beneficial performance improvement, the implementation introduces significant code duplication by hardcoding the model_loader_extra_config dictionary in many places. I've added a review comment with a suggestion to refactor this into a shared, centralized constant to improve code maintainability and reduce redundancy.

Comment on lines +30 to +33
model_loader_extra_config={
    "enable_multithread_load": True,
    "num_threads": 8
},

Severity: high

This model_loader_extra_config dictionary is duplicated across at least 14 locations in 7 different files within this pull request. This widespread duplication creates a significant maintainability issue. For instance, changing the number of threads would require manually updating every occurrence.

To address this, I strongly recommend centralizing this configuration. You could define it as a constant in a shared module, such as tests/e2e/conftest.py, and then import it wherever it's needed.

Example of centralization:

# In a shared file like tests/e2e/conftest.py
import os

# Use half of the available CPUs, with a fallback for CI environments.
# This makes tests more portable.
NUM_LOADER_THREADS = (os.cpu_count() or 16) // 2

MULTI_THREAD_LOAD_CONFIG = {
    "enable_multithread_load": True,
    "num_threads": NUM_LOADER_THREADS,
}

Then, in each test, you would use:

# In your test file
from tests.e2e.conftest import VllmRunner, MULTI_THREAD_LOAD_CONFIG

# ...
with VllmRunner(
    # ...
    model_loader_extra_config=MULTI_THREAD_LOAD_CONFIG,
    # ...
) as vllm_model:
    ...

This approach not only resolves the duplication but also makes the thread count more dynamic and suitable for different environments by using os.cpu_count().

@Potabk Potabk added the ready (read for review) and ready-for-test (start test by label for PR) labels Sep 25, 2025
@Potabk Potabk (Collaborator, Author) commented Sep 26, 2025

This mechanism has little effect, so I'm closing this PR.

@Potabk Potabk closed this Sep 26, 2025