
Conversation

ZJY0516
Contributor

@ZJY0516 ZJY0516 commented Sep 25, 2025

Purpose

FIX #25612
CC @ProExpertProg

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: zjy0516 <[email protected]>

mergify bot commented Sep 25, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ZJY0516.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 25, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an environment variable VLLM_DEBUG_DUMP_PATH to allow users to specify a directory for dumping debug information, which can be helpful for debugging and analysis. The changes involve adding the environment variable definition in vllm/envs.py and utilizing it in vllm/config/__init__.py to set the debug_dump_path in the compilation configuration.
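As a rough illustration of the mechanism described above (a minimal sketch, not the actual diff — the helper name is hypothetical): an environment variable is read once and, when set, populates the compilation config's dump directory.

```python
import os

def resolve_debug_dump_path(configured, env=None):
    """Let VLLM_DEBUG_DUMP_PATH override a configured dump directory.

    `configured` is whatever the user set on the compilation config;
    the environment variable, when present, takes precedence.
    """
    env = os.environ if env is None else env
    env_path = env.get("VLLM_DEBUG_DUMP_PATH")
    return env_path if env_path is not None else configured
```

Passing `env` explicitly keeps the sketch testable without mutating the real process environment.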

@mergify mergify bot removed the needs-rebase label Sep 25, 2025
Signed-off-by: zjy0516 <[email protected]>
@ZJY0516 ZJY0516 requested a review from zou3519 as a code owner September 25, 2025 08:40
Collaborator

@ProExpertProg ProExpertProg left a comment

Thanks for the work. Please append the rank to the path in config init, document the overriding behavior in the config docstring, and change the type of the property to Path.

self.scheduler_config.disable_hybrid_kv_cache_manager = True

if envs.VLLM_DEBUG_DUMP_PATH is not None:
    self.compilation_config.debug_dump_path = \
Collaborator

Here we should warn if the compilation config property is set already (because it means we're overriding)
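The suggested warning could look roughly like this (an illustrative sketch under the assumption that the override happens in config init; the function name is made up):

```python
import logging

logger = logging.getLogger("vllm.config")

def set_debug_dump_path(compilation_config, env_path):
    # Apply the env-var override, warning when it clobbers a value the
    # user already set explicitly on the compilation config.
    if env_path is None:
        return
    if compilation_config.debug_dump_path:
        logger.warning(
            "debug_dump_path %r is overridden by VLLM_DEBUG_DUMP_PATH=%r",
            compilation_config.debug_dump_path, env_path)
    compilation_config.debug_dump_path = env_path
```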

Collaborator

Also, after the override, we should append rank to the path here so that we don't have to append it everywhere else.

Contributor Author

We cannot get the rank here because the process group has not been initialized yet.

Collaborator

In that case, could you add a helper function VllmConfig.compile_debug_dump_path() that computes the path using the rank?

Contributor Author

I don't think it's appropriate to put compile_debug_dump_path() in VllmConfig. What do you think?

Collaborator

I think it's better than recomputing it in this many places. You can also add a method to CompilationConfig that accepts vllm_config as a param if that seems better to you

Member

Does Pydantic not auto cast str to Path on initialisation here? The cast to Path might be redundant

Contributor Author

Does Pydantic not auto cast str to Path on initialisation here? The cast to Path might be redundant

You're right. Thanks for the reminder.

Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: Jiangyun Zhu <[email protected]>
Signed-off-by: zjy0516 <[email protected]>
@ZJY0516
Contributor Author

ZJY0516 commented Sep 25, 2025

Currently, debug_dump_path is typed and initialized as a str. I'm hesitant to convert it to a pathlib.Path within __post_init__

@ProExpertProg
Collaborator

Currently, debug_dump_path is typed and initialized as a str. I'm hesitant to convert it to a pathlib.Path within __post_init__

Why not an Optional[pathlib.Path]?

@ProExpertProg ProExpertProg moved this from In progress to In review in torch.compile integration Sep 25, 2025
@ZJY0516
Contributor Author

ZJY0516 commented Sep 26, 2025

I think this is ready. PTAL @ProExpertProg

Collaborator

@ProExpertProg ProExpertProg left a comment

I am still more of a fan of VllmConfig.compile_debug_dump_path() - that way we can simplify this code tremendously which is the whole goal. There is no place where we only have access to CompilationConfig and not VllmConfig so I think we should go with that approach

path = os.path.join(compilation_config.debug_dump_path,
                    f"rank_{vllm_config.parallel_config.rank}")
rank = torch.distributed.get_rank()
path = compilation_config.compile_debug_dump_path(rank)
Collaborator

If we're passing rank anyway, why not use ParallelConfig.rank?

Contributor Author

Because ParallelConfig.rank is always zero at that point.

Collaborator

But this is successfully used in main, I think ParallelConfig.rank is initialized by this point

Contributor Author

If we use ParallelConfig.rank like main, dp mode will always get 0

vllm serve /data/datasets/models-hf/Qwen3-4B/ -dp 2 -O.debug_dump_path "./compile_dump"

ls compile_dump/
rank_0/

vllm serve /data/datasets/models-hf/Qwen3-4B/ -tp 2 -O.debug_dump_path "./compile_dump"

ls compile_dump/
rank_0/  rank_1/

Collaborator

I see, nice find. In that case, could we just append the dp rank separately?

Contributor Author

You mean like rank_0_dp_0 ?

Collaborator

Yep! But only if dp world size is > 1
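The naming agreed on above could be sketched like this (an illustrative helper, not the merged code; the function name is made up):

```python
from pathlib import Path

def rank_subdir(base: str, rank: int, dp_rank: int, dp_size: int) -> Path:
    # Append the dp rank only when data parallelism is in use
    # (dp world size > 1), matching the "rank_0_dp_0" naming above.
    name = f"rank_{rank}"
    if dp_size > 1:
        name = f"{name}_dp_{dp_rank}"
    return Path(base) / name
```

With tp only, directories stay `rank_0`, `rank_1`, …; with dp enabled they become `rank_0_dp_0`, `rank_0_dp_1`, and so on.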

logger.warning_once("Op '%s' %s, %s with '%s' has no effect",
                    op_name, missing_str, enable_str, op)

def compile_debug_dump_path(self, rank: int) -> Path:
Collaborator

This should return Optional[Path] and return None if the path is not set rather than assert so that it can be used in places we have to check whether a path exists or not

Contributor Author

Actually, this only gets called if debug_dump_path is not None.

Collaborator

Yeah but that makes the code more complex

Contributor Author

Current code consistently checks if debug_dump_path is set (not None) before proceeding. We skip the operation if it's None.

if not debug_dump_path:
    return
debug_dump_path = compile_debug_dump_path(rank)
debug_dump_path.mkdir(parents=True, exist_ok=True)

if compilation_config.level == CompilationLevel.PIECEWISE and \
        compilation_config.debug_dump_path:
    import depyf
    rank = torch.distributed.get_rank()
    path = vllm_config.compile_debug_dump_path(rank)
    path.mkdir(parents=True, exist_ok=True)

Returning Optional[Path] contradicts this pattern. If we did, the type checker would rightly flag that mkdir is not a valid method on a None object wherever we use the path without a prior None check.

Collaborator

Yes, but this means that we first check debug_dump_path and, only if it is not None, call compile_debug_dump_path. That means users need to know about both. Instead, users should just call compile_debug_dump_path and THEN do the None check, and never interact with debug_dump_path directly.
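The accessor pattern being argued for could be sketched as follows (a toy stand-in for the real VllmConfig, under the assumption that the rank is passed in by the caller):

```python
from pathlib import Path
from typing import Optional

class VllmConfig:
    """Toy stand-in illustrating the Optional[Path] accessor pattern."""

    def __init__(self, debug_dump_path: Optional[Path] = None):
        self.debug_dump_path = debug_dump_path

    def compile_debug_dump_path(self, rank: int) -> Optional[Path]:
        # Single entry point: callers do one None check on the return
        # value and never read debug_dump_path directly.
        if self.debug_dump_path is None:
            return None
        return self.debug_dump_path / f"rank_{rank}"

# Caller pattern: one call, one None check.
path = VllmConfig(Path("compile_dump")).compile_debug_dump_path(1)
if path is not None:
    pass  # e.g. path.mkdir(parents=True, exist_ok=True)
```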

@ZJY0516
Contributor Author

ZJY0516 commented Sep 26, 2025

I am still more of a fan of VllmConfig.compile_debug_dump_path() - that way we can simplify this code tremendously which is the whole goal. There is no place where we only have access to CompilationConfig and not VllmConfig so I think we should go with that approach

I'm okay with this change, but I'm concerned that adding too many functions directly to VllmConfig could lead to a bloated class that's difficult to maintain.

@ProExpertProg
Collaborator

ProExpertProg commented Sep 26, 2025

I'm okay with this change, but I'm concerned that adding too many functions directly to VllmConfig could lead to a bloated class that's difficult to maintain.

If that happens we can always move the function out of VllmConfig and pass VllmConfig as a parameter. For now, I think it makes sense because it's basically a "computed property".

@ZJY0516
Contributor Author

ZJY0516 commented Sep 27, 2025

I'm okay with this change, but I'm concerned that adding too many functions directly to VllmConfig could lead to a bloated class that's difficult to maintain.

If that happens we can always move the function out of VllmConfig and pass VllmConfig as a parameter. For now, I think it makes sense because it's basically a "computed property".

I have moved it to VllmConfig

Collaborator

@ProExpertProg ProExpertProg left a comment

A few minor notes, thanks for this cleanup!

ZJY0516 and others added 4 commits September 27, 2025 22:11
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: Jiangyun Zhu <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: Jiangyun Zhu <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: Jiangyun Zhu <[email protected]>
Signed-off-by: zjy0516 <[email protected]>
@ProExpertProg ProExpertProg enabled auto-merge (squash) September 27, 2025 14:24
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 27, 2025
@ProExpertProg ProExpertProg merged commit c0ec818 into vllm-project:main Sep 27, 2025
42 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in torch.compile integration Sep 27, 2025
Labels
ready ONLY add when PR is ready to merge/full CI is needed torch.compile
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[Feature][torch.compile]: Add VLLM_DEBUG_DUMP_PATH environment variable
3 participants