
Conversation

@skyloevil (Contributor) commented Sep 22, 2025

Purpose

Add a tuned configuration for E=128, N=384 on NVIDIA_H100_PCIe with a [128,128] block shape, so the fused MoE kernel can reuse the benchmarked settings from vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H100_PCIe,block_shape=[128,128].json.
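
For readers new to these files: each tuned config maps a batch size (M) to the Triton launch parameters that the tuning sweep found fastest for that M. A minimal sketch of the layout is shown below; the parameter values are placeholders for illustration, not the benchmarked values this PR adds.

```python
import json

# Sketch of a fused-MoE tuned config (placeholder values, not the ones in this PR).
# Keys are batch sizes (M); values are the Triton launch parameters selected by
# benchmarks/kernels/benchmark_moe.py --tune for that batch size.
example_config = {
    "1": {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 128,
          "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 3},
    "64": {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 128,
           "GROUP_SIZE_M": 8, "num_warps": 8, "num_stages": 4},
}
print(json.dumps(example_config, indent=4))
```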

Test Plan

python benchmarks/kernels/benchmark_moe.py --tune --save-dir vllm/model_executor/layers/fused_moe/configs --model Qwen/Qwen3-30B-A3B-FP8 --tp-size 2

Signed-off-by: zitian.zhao <[email protected]>
@skyloevil requested a review from mgoin as a code owner on September 22, 2025 at 15:33
@gemini-code-assist (bot) left a comment

Code Review

This pull request adds a tuned configuration file for the fused MoE kernel on NVIDIA H100 GPUs. While the configuration values themselves seem reasonable, there is a critical issue with the filename. It appears to be missing the dtype=fp8_w8a8 specifier, which is necessary for vLLM to load this configuration for FP8 models at runtime. Without the correct filename, the system will fall back to default parameters, negating the performance benefits of this tuning.

Inline review comment on the new config file E=128,N=384,device_name=NVIDIA_H100_PCIe,block_shape=[128,128].json (diff hunk @@ -0,0 +1,146 @@):

Severity: critical

The filename for this configuration is missing the dtype specifier. According to the pull request description, this configuration is for an FP8 model. At runtime, vLLM constructs the configuration filename including a dtype part (e.g., dtype=fp8_w8a8 for FP8 models). Without this in the filename, this configuration file will not be found, and the fused MoE kernel will fall back to default, likely suboptimal, parameters.

To fix this, the file should be renamed to:
E=128,N=384,device_name=NVIDIA_H100_PCIe,dtype=fp8_w8a8,block_shape=[128,128].json

This likely occurred because the benchmark script was run without the --dtype fp8_w8a8 flag, which should be used when tuning for FP8 models.
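
To illustrate why the dtype part matters, here is a simplified sketch of how a filename of this form can be assembled from the expert count, intermediate size, device name, optional dtype, and block shape. The helper name moe_config_file_name and its signature are hypothetical and only mirror the naming pattern described in this comment; they are not the actual vLLM API.

```python
from typing import Optional

def moe_config_file_name(E: int, N: int, device_name: str,
                         dtype: Optional[str] = None,
                         block_shape: Optional[list[int]] = None) -> str:
    """Hypothetical helper mirroring the filename pattern described above."""
    dtype_part = f",dtype={dtype}" if dtype else ""
    block_part = ("" if not block_shape
                  else ",block_shape=[" + ",".join(map(str, block_shape)) + "]")
    return f"E={E},N={N},device_name={device_name}{dtype_part}{block_part}.json"

# Name under which the file added in this PR would be found (no dtype part):
print(moe_config_file_name(128, 384, "NVIDIA_H100_PCIe", block_shape=[128, 128]))
# E=128,N=384,device_name=NVIDIA_H100_PCIe,block_shape=[128,128].json

# Name an FP8 model would look for, per the suggested rename:
print(moe_config_file_name(128, 384, "NVIDIA_H100_PCIe",
                           dtype="fp8_w8a8", block_shape=[128, 128]))
# E=128,N=384,device_name=NVIDIA_H100_PCIe,dtype=fp8_w8a8,block_shape=[128,128].json
```

If the missing flag was indeed the cause, re-running the test-plan command with --dtype fp8_w8a8 added should regenerate the file under a name that includes the dtype part.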

@samanamp (Contributor) commented

Would you share the before and after tuning for the kernels and model e2e execution? AFAIK for H100 the defaults are usually performant.

@skyloevil (Contributor, Author) commented

> Would you share the before and after tuning for the kernels and model e2e execution? AFAIK for H100 the defaults are usually performant.

OK, we'll share them later.
