Commit fa2fb74

[Qwen3-Coder-480B-A35B] update max-model-len for fp8

Signed-off-by: Abirdcfly <[email protected]>
1 parent 807c5e7

File tree: 1 file changed (+3, −1 lines)

Qwen/Qwen3-Coder-480B-A35B.md (3 additions, 1 deletion)
@@ -28,6 +28,7 @@ vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct \
 
 ```bash
 VLLM_USE_DEEP_GEMM=1 vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
+    --max-model-len 131072 \
     --enable-expert-parallel \
     --data-parallel-size 8 \
     --enable-auto-tool-choice \
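This hunk adds `--max-model-len 131072` to the FP8 serving command. For reference, the resulting invocation would look roughly like this (a sketch: only the flags visible in this hunk are shown, and the flags that follow `--enable-auto-tool-choice` in the original file are elided here as in the diff):

```bash
# Sketch of the post-commit FP8 serving command (flags beyond this hunk elided).
VLLM_USE_DEEP_GEMM=1 vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
    --max-model-len 131072 \
    --enable-expert-parallel \
    --data-parallel-size 8 \
    --enable-auto-tool-choice \
    # ...remaining flags from the original file...
```

Note that 131072 is exactly half of the model's native 262144-token context window quoted in the tips below, which is what lets the FP8 deployment fit on a single H20 node.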
@@ -91,9 +92,10 @@ P99 ITL (ms): 69.38
 ## Using Tips
 
 ### BF16 Models
-- **Context Length Limitation**: A single H20 node cannot serve the original context length (262144). You can reduce the `max-model-len` to work within memory constraints.
+- **Context Length Limitation**: A single H20 node cannot serve the original context length (262144). You can reduce the `max-model-len` or increase `gpu-memory-utilization` to work within memory constraints.
 
 ### FP8 Models
+- **Context Length Limitation**: A single H20 node cannot serve the original context length (262144). You can reduce the `max-model-len` or increase `gpu-memory-utilization` to work within memory constraints.
 - **DeepGEMM Usage**: To use [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM), set `VLLM_USE_DEEP_GEMM=1`. Follow the [setup instructions](https://github.com/vllm-project/vllm/blob/main/benchmarks/kernels/deepgemm/README.md#setup) to install it.
 - **Tensor Parallelism Issue**: When using `tensor-parallel-size 8`, the following failures are expected. Switch to data-parallel mode using `--data-parallel-size`.
 - **Additional Resources**: Refer to the [Data Parallel Deployment documentation](https://docs.vllm.ai/en/latest/serving/data_parallel_deployment.html) for more parallelism groups.
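The context-length tip names two knobs: shrink `max-model-len` or raise `gpu-memory-utilization` (vLLM's fraction of each GPU's memory reserved for the engine, 0.90 by default). A hedged example combining both with the FP8 command from this commit; the `0.95` value is an illustrative assumption, not taken from the doc, and any flags elided by the diff above are omitted:

```bash
# Sketch: reduced context window plus a larger GPU memory fraction.
# 0.95 is an illustrative value (vLLM's default is 0.90), not from the doc.
VLLM_USE_DEEP_GEMM=1 vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
    --max-model-len 131072 \
    --gpu-memory-utilization 0.95 \
    --enable-expert-parallel \
    --data-parallel-size 8 \
    --enable-auto-tool-choice
```

Raising `gpu-memory-utilization` leaves more room for the KV cache at a given context length, but too high a value risks out-of-memory errors from other processes sharing the GPU.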
