Merged (953 commits)
14c1432
[BugFix] Fix async scheduling CPU tensor race take 2 (#25279)
njhill Sep 19, 2025
3da17c2
[Bugfix] Remove VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE #2969 (#25090)
Lucaskabela Sep 20, 2025
a36c675
Don't skip special tokens with hermes-style tool calling (#25281)
maxdebayser Sep 20, 2025
c7e7136
test: Remove vestigial skip for prompt embeds tests after landing v1 …
qthequartermasterman Sep 20, 2025
b8a287a
[docs] Prompt Embedding feature support (#25288)
qthequartermasterman Sep 20, 2025
8945b00
[torch.compile] CUDAGraph Inductor partition integration (#24281)
BoyuanFeng Sep 20, 2025
a25ade5
[BugFix] Ensure appropriate guards in destructors (#25284)
njhill Sep 20, 2025
535d800
[Misc] Support more collective_rpc return types (#25294)
njhill Sep 20, 2025
c308501
Improve weight loading for encoder models in Transformers backend (#2…
hmellor Sep 20, 2025
3642909
[BUGFIX] GPTQ quantization compatibility for Qwen3 Next MOE models (A…
JartX Sep 20, 2025
b7f186b
[BugFix] Exclude self when checking for port collision (#25286)
njhill Sep 20, 2025
6c5f82e
[BUG FIX][NON-CUDA]quick fix to avoid call cudagraph_unsafe in attent…
xuechendi Sep 20, 2025
f91480b
[Bugfix] fix tool call arguments is empty (#25223)
chaunceyjiang Sep 20, 2025
c60e613
[Optimization] Avoid repeated model architecture conversion for pooli…
DarkLight1337 Sep 20, 2025
9607d5e
[Hybrid Allocator] Support full attention with different hidden size …
heheda12345 Sep 20, 2025
be874c0
[Bugfix] Fix Qwen3-VL-MoE weight loading for EP (#25300)
ywang96 Sep 20, 2025
3d9a1d2
[V1] Support `LLM.apply_model` (#18465)
DarkLight1337 Sep 20, 2025
e08a3a3
[CI Failure] Disable FlashInfer RoPE to unblock CI (#25299)
mgoin Sep 20, 2025
032d661
[Docs] Fix warnings in mkdocs build (continued) (#25042)
wwl2755 Sep 20, 2025
bf8b26c
Generate _ModelInfo properties file when loading to improve loading …
manoelmarques Sep 20, 2025
3c713a9
[Model] Cleanup InternViT's data parallel implementation (#25306)
Isotr0py Sep 20, 2025
d88918e
[Core] Enable sharded state loader for V1 engine and enhance test cov…
lirong-lirong Sep 20, 2025
bef180f
[V0 Deprecation] Enable the remaining multimodal tests in V1 (#25307)
DarkLight1337 Sep 20, 2025
367a480
[Docs] Fix warnings in vllm/profiler and vllm/transformers_utils (#25…
windsonsea Sep 20, 2025
52c2a8d
[V0 Deprecation] Remove LLMEngine (#25033)
WoosukKwon Sep 21, 2025
86647d1
[V0 Deprecation] Remove V0 Output Processor (#25320)
WoosukKwon Sep 21, 2025
572ddf8
[Chore] Remove unused sampler in models (#25324)
WoosukKwon Sep 21, 2025
72dd159
[CI] Skip tests failing on main (#25326)
WoosukKwon Sep 21, 2025
c99db8c
[V0 Deprecation] Remove V0 core (#25321)
WoosukKwon Sep 21, 2025
62b38dc
[Doc] improve test-pipeline.yaml documentation (#25305)
hl475 Sep 21, 2025
1cd885b
[V0 Deprecation] Remove V0 model runner base & simplify worker base (…
WoosukKwon Sep 21, 2025
035fd2b
[Multi Modal][Performance] Fused Q,K's apply_rope in more models (#25…
wwl2755 Sep 21, 2025
12dbd83
[V0 Deprecation] Remove from_seq_group methods (#25330)
WoosukKwon Sep 21, 2025
7ed82d1
[V0 Deprecation] Remove V0 MP executor (#25329)
WoosukKwon Sep 21, 2025
cf56cf7
[V1] Add sliding window support to Flex Attention backend (#24089)
Isotr0py Sep 21, 2025
30d0891
[MM][Perf] Minor Optimization on Qwen3-VL `fast_pos_embed_interpolate…
ywang96 Sep 21, 2025
9aea737
[Bugfix] Typos in error message for missing model config file (#25339)
simondanielsson Sep 21, 2025
65a5910
[Optimization] Cache chat template result when processor fails to be …
DarkLight1337 Sep 21, 2025
26e673f
[V0 Deprecation] Remove V0 Sequence class & Sampler (#25332)
WoosukKwon Sep 21, 2025
0ff8ebb
[V0 Deprecation] Remove async_output_proc, preemption mode, delay fac…
WoosukKwon Sep 21, 2025
c438b29
feat: Enable engine-level arguments with speculators models (#25250)
rahul-tuli Sep 21, 2025
1c3ffdb
[V0 Deprecation] Remove V0 sampling metadata (#25345)
WoosukKwon Sep 21, 2025
af7dfb0
[Perf] Further optimization for Qwen3-VL `fast_pos_embed_interpolate`…
Isotr0py Sep 21, 2025
bc6e542
Remove V0 attention backends (#25351)
WoosukKwon Sep 21, 2025
04d3752
[Bugfix][V0 Deprecation][CI] use async mock and await for async metho…
KKSK-DON Sep 21, 2025
5aeb925
Multimodal - audio tests (#25285)
debroy-rh Sep 21, 2025
7b57a43
[Model] Support Dots OCR (#24645)
ywang96 Sep 22, 2025
793be8d
[Docs] GSM8K Accuracy Evaluation doc update (#25360)
david6666666 Sep 22, 2025
0eecb31
[Bugfix] Fix hermes tool parser handling of non-string argument types…
david6666666 Sep 22, 2025
6d0b827
[V0 Deprecation] Remove V0-only methods in multi-modal registry (#25362)
DarkLight1337 Sep 22, 2025
f92d952
[V0 Deprecation] Remove `MultiModalPlaceholderMap` (#25366)
DarkLight1337 Sep 22, 2025
21467f9
Enable Eagle3 speculative decoding for GPT-OSS model (#25246)
eldarkurtic Sep 22, 2025
a66d131
[TPU][Bugfix][CI] Fix broken tests/build dependency (#25255)
NickLucche Sep 22, 2025
4cf71cc
[TPU] Deprecate `xm.mark_step` in favor of `torch_xla.sync` (#25254)
NickLucche Sep 22, 2025
b6f01bd
refactor: abstract graph mode support into platform interface (#25161)
yiz-liu Sep 22, 2025
417a164
[Misc] Remove unused encoder-decoder error strings (#25374)
DarkLight1337 Sep 22, 2025
64c824c
Make pickle import check fast (#25379)
hmellor Sep 22, 2025
3d2c56b
Make `mypy` behave like a proper pre-commit hook (#25313)
hmellor Sep 22, 2025
ac24388
[Kernel] MI-300X triton moe configs (#23445)
Sara-KS Sep 22, 2025
c10101a
[Bugfix] Fix several issues with p2p xPyD in GET type (#23993)
Csrayz Sep 22, 2025
175811e
[V1][Attention] Split triton_attn in triton-only and rocm specific ba…
bringlein Sep 22, 2025
06a4133
[EPLB] Reduce EPLB Inference Overhead (#24573)
abmfy Sep 22, 2025
cfbee3d
[CLI env var] Add VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH in en…
Daisy-Ma-coder Sep 22, 2025
1d7f95b
[Compiler] Disable Inductor standalone compile by default (#25391)
ElizaWszola Sep 22, 2025
239ef0c
[CI Failure] Fix fp8 kv cache on <SM90 (#25396)
mgoin Sep 22, 2025
922979b
[DP] support torchrun external launcher with Data Parallelism (#24899)
luccafong Sep 22, 2025
8d0ee5a
[misc] Remove RFC review hours reference (#25416)
simon-mo Sep 22, 2025
d5e0fca
[torch.compile] Cleanup compilation tests and custom passes, add debu…
ProExpertProg Sep 22, 2025
8db2939
[KV offload][5/N] Add `CPUOffloadingSpec` (#24251)
orozery Sep 22, 2025
f552d5e
[CI/Build] Skip Qwen3-VL initialization tests until models are actual…
DarkLight1337 Sep 22, 2025
8bed179
[TPU] update torch_xla dependency for PyPI compatibility (#25278)
jcyang43 Sep 22, 2025
45d7d85
[Frontend] Responses API MCP tools for built in tools and to pass thr…
alecsolder Sep 22, 2025
d588cd2
[Bugfix] fix custom op test (#25429)
ProExpertProg Sep 23, 2025
f31ff87
[Core] Drop overly aggressive whisper assertion (#25408)
russellb Sep 23, 2025
0901970
[Bugfix] Fix missing `clear_connector_metadata` (#25397)
NickLucche Sep 23, 2025
ac0048c
[BugFix] [DP/EP] Fix slow execution when BS <= DP (#25407)
MatthewBonanni Sep 23, 2025
0b7bed9
[Performance] Remove input pads in cutlass_mla and optimize v_proj ou…
alexm-redhat Sep 23, 2025
9949aa2
[Perf] Apply torch.compile for `per_block_cast_to_fp8` (#24611)
yewentao256 Sep 23, 2025
6fa78d8
[V0 deprecation] Remove platform v1 controling interface (#25410)
Isotr0py Sep 23, 2025
c625f90
[V0 deprecation] Remove `_set_default_args_v0` function (#25409)
Isotr0py Sep 23, 2025
4741239
[Bug] Fix Long Context OOM Issue (#25290)
yewentao256 Sep 23, 2025
fc97733
[feat] Support MRoPE + YaRN (#25384)
JJJYmmm Sep 23, 2025
f225ea7
[XPU] Fix `compile_size` is `None` case. (#25433)
jikunshang Sep 23, 2025
eea1783
[benchmarks]allow skip ready check for bench serve (#25420)
luccafong Sep 23, 2025
78237e4
[Bugfix] Remove contiguous output req for context parallel MLA (#25414)
mgoin Sep 23, 2025
fafbe11
[Docs] Fix griffe warnings in vllm/lora/ops (#25369)
windsonsea Sep 23, 2025
e8db44f
[DP/EP][GPTOSS] Use triton matmul-ogs kernels for GPTOSS DP/EP (#24588)
varun-sundar-rabindranath Sep 23, 2025
5774b0a
[NIXL][OOT platform] support nixl_connector with oot platform and oth…
xuechendi Sep 23, 2025
c98be0a
[Model] Enable DP for ViT in Qwen2-VL (#25445)
DarkLight1337 Sep 23, 2025
ba8d216
Handle triton kernel import exception (#25319)
minosfuture Sep 23, 2025
9383cd6
[Frontend] Add a new xml-based tool parser for qwen3-coder (#25028)
Zhikaiiii Sep 23, 2025
babad6e
[Misc] Move DP for ViT code inside model executor dir (#25459)
DarkLight1337 Sep 23, 2025
4322c55
[Test]: Hermes tool parser stream output error in Qwen3 case (#25203)
ahartel Sep 23, 2025
231c2c6
[Bugfix] Fix idefics3 `tie_word_embeddings` (#25454)
Isotr0py Sep 23, 2025
273690a
[Core] Optimize LoRA weight loading (#25403)
jeejeelee Sep 23, 2025
0d9fe26
[docs] Benchmark Serving Incorrect Arg (#25474)
vllmellm Sep 23, 2025
b6a136b
[CI/Build] Fix disabled v1 attention backend selection test (#25471)
Isotr0py Sep 23, 2025
61d1b35
[BugFix] Register expert_map as named buffer for wake_up and sleep (#…
wuxibin89 Sep 23, 2025
f05a4f0
[P/D] Support NIXL connector to disconnect during a clean shutdown (#…
chaunceyjiang Sep 23, 2025
da5e7e4
[Docs] NixlConnector quickstart guide (#24249)
panpan0000 Sep 23, 2025
4c966e4
[XPU] Fix MOE DP accuracy issue on XPU (#25465)
faaany Sep 23, 2025
2c58742
[UX] Change kv-cache-memory log level to debug (#25479)
mgoin Sep 23, 2025
a903669
[V1] Remove V0 code paths for Hybrid models (#25400)
tdoublep Sep 23, 2025
cc1dc7e
[Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support…
LucasWilkinson Sep 23, 2025
875d6de
Add backward compatibility for `GuidedDecodingParams` (#25422)
hmellor Sep 23, 2025
f11e3c5
[Kernels] Support blocked fp8 quantization for compressed tensors MoE…
bnellnm Sep 23, 2025
2357480
[BugFix] Fix UB in per_token_group_quant.cu (#24913)
rivos-shreeasish Sep 23, 2025
846197f
[Log] Optimize kv cache memory log from Bytes to GiB (#25204)
yewentao256 Sep 23, 2025
527821d
Use macro guard CUDA functions for back compatibility in grouped_topk…
minosfuture Sep 23, 2025
100b630
[V1][Kernel] Add triton implementation for `reshape_and_cache_flash` …
bringlein Sep 23, 2025
24e8222
[Misc] Reduce initialization time of auto_tune (#23682)
wdhongtw Sep 23, 2025
867ecdd
[Spec Decode][CI] Add e2e test for `examples/spec_decode.py` and prev…
ekagra-ranjan Sep 23, 2025
5abb117
[Core] Ensure LoRA linear respect the base_layer's tp_size and tp_ran…
jeejeelee Sep 23, 2025
a3a7828
[ROCm] Add skinny gemm bias support for dtypes fp16,bf16,fp8 (#24988)
amd-hhashemi Sep 23, 2025
8c1c81a
[core] add nccl symmetric memory for all reduce (#24532)
Amir-19 Sep 23, 2025
6340025
[Performance] Move apply_w8a8_block_fp8_linear to an op class (#24666)
ElizaWszola Sep 23, 2025
24fab45
[Perf] Change default CUDAGraphMode from PIECEWISE to FULL_AND_PIECEW…
mgoin Sep 23, 2025
d5944d5
[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue…
jiahanc Sep 23, 2025
a8ffc4f
[Bugfix] Lower gpt-oss max cudagraph size to 992 to be compatible wit…
mgoin Sep 23, 2025
8bdd8b5
Enable symmetric memory all reduce by default only enabling for TP (#…
ilmarkov Sep 23, 2025
8b8a8af
[CI] Fix Pre-commit Issue (#25497)
yewentao256 Sep 23, 2025
c828d1b
[Bugfix] gpt-oss container tool output bug (#25485)
alecsolder Sep 23, 2025
08275ec
[Build] Update Xgrammar to 0.1.25 (#25467)
chaunceyjiang Sep 23, 2025
690f948
[Bugfix] Fix for the import error from #24588 (#25481)
gshtras Sep 23, 2025
ae00292
[CI/Build] Fix and re-enable v1 PP test on CI (#25496)
Isotr0py Sep 23, 2025
4f8c4b8
[Core] Use KVCacheBlock as much as possible instead of dict[block_id,…
Jialin Sep 23, 2025
969b4da
[V0 Deprecation] Remove placeholder attn (#25510)
tdoublep Sep 23, 2025
eca7be9
Add VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE & VLLM_ENABLE_INDUCTOR_COORDINA…
rouchenzi Sep 23, 2025
4f2954f
Fix triton_reshape_and_cache_flash.py triton import (#25522)
mgoin Sep 23, 2025
95bc60e
[gpt-oss][bugfix] remove logic to require resp_ in ResponseAPI (#25428)
qandrew Sep 23, 2025
7361ab3
Remove redundant mutates_args and dispatch_key for direct_register_cu…
mgoin Sep 23, 2025
abad204
[BugFix] Fix OOM in vLLM replicas by ensuring consistent NCCL memory …
kouroshHakha Sep 23, 2025
c85d75c
Add `VLLM_NVTX_SCOPES_FOR_PROFILING=1` to enable `nvtx.annotate` scop…
coreylowman Sep 23, 2025
5e25b12
[Kernel] [Mamba] Remove BLOCK_H=1 from list of tuneable configuration…
tdoublep Sep 23, 2025
bde2a1a
[ROCm] Small functional changes for gptoss (#25201)
jpvillam-amd Sep 23, 2025
e0b24ea
[Perf] Increase default max splits for FA3 full cudagraphs (#25495)
LucasWilkinson Sep 23, 2025
1210e4d
[Bugfix] [B200] cutlass_mla - ensure kv_split == 1 for batch size > 1…
alexm-redhat Sep 23, 2025
dc464a3
[BugFix] AssertionError: Do not capture num_reqs > max_num_reqs for u…
LucasWilkinson Sep 24, 2025
7ad5e50
Improve output when failing json.loads() on structured output test (#…
dougbtv Sep 24, 2025
0d235b8
Add CUTLASS FP8 MOE benchmark scripts and kernel config (#25302)
chenxi-yang Sep 24, 2025
88d7bdb
[Bug] Fix AttributeError: 'FusedMoE' object has no attribute 'w13_wei…
yewentao256 Sep 24, 2025
c8bde93
[BUG] Allows for RunAI Streamer and Torch.compile cache to be used to…
ahao-anyscale Sep 24, 2025
be0bb56
[Model] Support SeedOss Reason Parser (#24263)
LuYanFCP Sep 24, 2025
d06b5a9
[V1][Metrics] Add per-request TPOT histogram (#24015)
baxingpiaochong Sep 24, 2025
1983609
[Bugfix] Use a separate FlashInfer workspace buffer for trtllm-gen (#…
benchislett Sep 24, 2025
de94289
[Core] Support weight_loader_v2 for `UnquantizedLinearMethod` (#23036)
kylesayrs Sep 24, 2025
bf68fd7
[Compile] Fix AMD Compile Error (#25518)
yewentao256 Sep 24, 2025
9df8da5
[BugFix] Fix MLA assert with CUTLASS MLA (#25478)
LucasWilkinson Sep 24, 2025
359d293
[fix]: add Arm 4bit fused moe support (#23809)
nikhil-arm Sep 24, 2025
77d9069
[KV sharing] Re-land Gemma3n model changes from #22628 (#24357)
sarckk Sep 24, 2025
c30b405
[Spec Decode] Enable FlashInfer Spec Decoding (#25196)
benchislett Sep 24, 2025
d747c2e
[Perf] Fix jit compiles at runtime of fla gated delta rule (#25432)
coreylowman Sep 24, 2025
5caaeb7
[Bugfix] [Frontend] Cleanup gpt-oss non-streaming chat tool calls (#2…
bbrowning Sep 24, 2025
190c45a
[TPU][Bugfix] fix the missing apply_model in tpu worker (#25526)
yaochengji Sep 24, 2025
fed8a9b
[Misc] Retry HF processing if "Already borrowed" error occurs (#25535)
DarkLight1337 Sep 24, 2025
1cbcfb9
[Bugfix][CPU] Skip unsupported custom op register on CPU (#25534)
bigPYJ1151 Sep 24, 2025
27ec3c7
[CI/Build] Fix v1 OOT registration test (#25547)
Isotr0py Sep 24, 2025
6488f34
[Misc] Move processing context to multimodal directory (#25548)
DarkLight1337 Sep 24, 2025
77a7fce
[CI/Build] add nightly prime-rl integration tests (#25207)
Jackmin801 Sep 24, 2025
2e19a84
[V0 Deprecation] Remove max_seq_len_to_capture (#25543)
WoosukKwon Sep 24, 2025
2338daf
[BugFix] Potential Fix for FA3 full-cudagraph IMA (#25490)
LucasWilkinson Sep 24, 2025
b67dece
[misc] update the warning message (#25566)
youkaichao Sep 24, 2025
42488da
[Bugfix] Fix dummy video number of frames calculation (#25553)
ywang96 Sep 24, 2025
58c360d
[Bug] fix import and unit test (#25558)
jmkuebler Sep 24, 2025
1642995
[Benchmark] Fix regression in structured output benchmark (#25500)
russellb Sep 24, 2025
b106890
[docs] fix nixl kv_connector_extra_config.backends key (#25565)
panpan0000 Sep 24, 2025
e18b714
[Bugfix] Fix DeepSeekV31ToolParser to correctly parse multiple tools …
taohui Sep 24, 2025
8938774
Move `DeviceConfig`, `ObservabilityConfig`, `SpeechToTextConfig` to t…
hmellor Sep 24, 2025
9313be5
[Misc] Improve type annotations for jsontree (#25577)
DarkLight1337 Sep 24, 2025
487745f
[ROCm][Bugfix] Only enable +rms_norm based on aiter if not explicitly…
gshtras Sep 24, 2025
302eb94
[ROCm][Build][Bugfix] Fix ROCm base docker whls installation order (#…
gshtras Sep 24, 2025
d83f3f7
Fixes and updates to bench_per_token_quant_fp8 (#25591)
mgoin Sep 24, 2025
2dda3e3
[Bugfix] add cache model when from object storage get model (#24764)
lengrongfu Sep 24, 2025
54e42b7
Support mnnvl all2allv from Flashinfer (#21003)
wenscarl Sep 24, 2025
f84a472
Suppress benign cuBLAS warning when capturing cudagraphs with DBO (#2…
SageMoore Sep 24, 2025
8c85305
[Docs] Enable `fail_on_warning` for the docs build in CI (#25580)
hmellor Sep 24, 2025
e6750d0
[V0 Deprecation] Remove unused classes in attention (#25541)
WoosukKwon Sep 24, 2025
fea8006
[Logging] Improve log for when DeepEP HT disables CUDA Graphs (#25531)
tlrmchlsmth Sep 24, 2025
6160ba4
feat: BF16 FlashInfer Fused Cutlass MOE for Hopper and Blackwell Expe…
djmmoss Sep 24, 2025
1f29141
[Refactor] Use DeepGEMM Col Major TMA Aligned Tensor (#25517)
yewentao256 Sep 24, 2025
e7f27ea
Improve `--help` for enhanced user experience (#24903)
hmellor Sep 24, 2025
5c1e496
[MISC] replace c10::optional with std::optional (#25602)
842974287 Sep 24, 2025
52d0cb8
[Model] Improve DotsOCRForCausalLM (#25466)
jeejeelee Sep 24, 2025
05c1948
[Kernel] Support DCP for Triton backend (#25132)
frank-wei Sep 25, 2025
4492e3a
[Bug] Dynamo Unsupported due to `BasevLLMParameter.torch_function` ca…
yewentao256 Sep 25, 2025
90b139c
Enable Fbgemm NVFP4 on Dense models (#25609)
samanamp Sep 25, 2025
845adb3
[Model] Add LongCat-Flash (#23991)
OftenDream Sep 25, 2025
c85be1f
optimize: eliminate duplicate split_enc_dec_inputs calls (#25573)
nicole-lihui Sep 25, 2025
a676e66
[Bugfix] fix apply_temperature to avoid nan in probs (#24734)
courage17340 Sep 25, 2025
755ed7b
[Misc] Simplify PoolerOutput and move to `v1/outputs` (#25629)
DarkLight1337 Sep 25, 2025
bc092ea
Map CwmForCausalLM to llama and LlamaForCausalLM (#25611)
jacobkahn Sep 25, 2025
af4ee63
typo: remove duplicate `is` (#25641)
nicole-lihui Sep 25, 2025
1260180
Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class…
tlrmchlsmth Sep 25, 2025
393de22
[fix] Update torch version in cpu-build.txt for AArch64/ppc64le and D…
fadara01 Sep 25, 2025
7be9ffc
[Misc] Fix Qwen3-VL `video_grid_thw` typing (#25646)
ywang96 Sep 25, 2025
3c2b2cc
[Bugfix] Add triton.language.tensor placeholder (#25649)
adobrzyn Sep 25, 2025
17b4c66
[Bugfix] Fix Qwen3-VL max_num_video_tokens calculation for video prof…
Isotr0py Sep 25, 2025
12c1287
[mypy] Further improve MM type annotations (#25654)
DarkLight1337 Sep 25, 2025
eaeca3c
[Bugfix] Parse SpeculativeConfig Error (#25142)
yyzxw Sep 25, 2025
7f570f1
[V0 deprecation] Remove unreachable model_config.supported_tasks (#25…
noooop Sep 25, 2025
70fbdb2
Add backward compatibility for `guided_...` API (#25615)
hmellor Sep 25, 2025
0bcc3a1
[CI/Build] Fix flaky entrypoints test (#25663)
DarkLight1337 Sep 25, 2025
d2af674
[XPU][Triton]add xpu config in triton_reshape_and_cache_flash (#25643)
jikunshang Sep 25, 2025
1e9a77e
[Hardware][RISC-V] Add riscv64 support for vLLM with scalar (#22112)
langc23 Sep 25, 2025
2f17117
[mypy] Fix wrong type annotations related to tuple (#25660)
DarkLight1337 Sep 25, 2025
6c340da
[misc] log info messages by default for hanging / busy / idle (#25627)
youkaichao Sep 25, 2025
69a8c8e
[torch.compile] Make Query Quantization Fusable (#24914)
jmkuebler Sep 25, 2025
eb32335
[CPU] update torch 2.8 and fix missing fields in TorchSDPAMetadata (#…
bigPYJ1151 Sep 25, 2025
532a6cf
[ux] Switch a warning to debug about a pytorch fallback (#23750)
russellb Sep 25, 2025
03858e6
[Bugfix] Fix InternS1 video processing after Transformers v4.56 (#25644)
Isotr0py Sep 25, 2025
0754ac4
[Misc] Remove cruft file in repo (#25678)
NickLucche Sep 25, 2025
2e5df88
[Logging] Remove TORCH_NCCL_AVOID_RECORD_STREAMS to squash a warning …
tlrmchlsmth Sep 25, 2025
e04a1b6
[BUGFIX] Fix crash in Eagle Speculative Decoding models when exceedin…
AlonKejzman Sep 25, 2025
916bd92
Revert "[Bug] Dynamo Unsupported due to `BasevLLMParameter.torch_func…
mgoin Sep 25, 2025
13cc7f5
[BugFix] Fix DBO hang (#25625)
LucasWilkinson Sep 25, 2025
b8d9e4a
[Model] Add optional parameter to reasoning parser constructor (#25554)
taohui Sep 25, 2025
0ea80c8
[Model] Define `merge_by_field_config` MM interface (#25676)
DarkLight1337 Sep 25, 2025
71b25b0
[V0 deprecation] Clean up V0 fallback in compilation config (#25675)
Isotr0py Sep 25, 2025
3468f17
[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend name…
MatthewBonanni Sep 25, 2025
0fa673a
[V0 deprecation] Clean up LoRA (#25686)
jeejeelee Sep 25, 2025
6b0fcbb
[Misc] Simplify `test_argsort_mm_positions` (#25690)
DarkLight1337 Sep 25, 2025
3d54bdc
[Optimization] Streamline `InputPreprocessor` (#25702)
DarkLight1337 Sep 25, 2025
89fa54e
[Optimization] Use a cheaper cache key in `get_model_architecture` (#…
DarkLight1337 Sep 25, 2025
e71b8e2
[Spec Decode] Add Batch Parallel Ngram. Upto 8x lower overhead. (#24986)
ekagra-ranjan Sep 25, 2025
8c435c9
[Core] Enable command line logging for LLMEngine (#25610)
zhuohan123 Sep 25, 2025
57329a8
[Model] rename NemotronH_Nano_VL -> NemotronH_Nano_VL_V2 (#25708)
tomeras91 Sep 25, 2025
081b559
Fix routing_bias dtype (#25711)
wenscarl Sep 25, 2025
9fe4c2b
[Refactor] Remove DeepGEMM OP Register (#25710)
yewentao256 Sep 26, 2025
8b77328
[Misc] Don't log shm dequeue delay warning on worker side (#25720)
njhill Sep 26, 2025
53a3084
Llamas 3.1 405B fp4 changes upstreaming from 355_wip (#25135)
maleksan85 Sep 26, 2025
13dd93c
[Core] Force PIECEWISE CUDAGraph mode for encoder-decoder (#25701)
russellb Sep 26, 2025
983056e
[Misc] Remove unnecessary memoryviews in shm_broadcast.py (#25721)
njhill Sep 26, 2025
392edee
EVS Support (Video tokens pruning) (#22980)
BloodAxe Sep 26, 2025
3edf87d
[CI/Build] fix doc build warning: Failed to get 'name: description' p…
yitingdc Sep 26, 2025
e84e073
fix: revert cast to cpu in `MsgpackEncoder._encode_tensor` to avoid h…
qthequartermasterman Sep 26, 2025
d48f4d6
perf: Avoid copying inputs_embeds tensors to GPU unless prompt_embeds…
qthequartermasterman Sep 26, 2025
52621c8
[Harware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI300…
xaguilar-amd Sep 26, 2025
6e30010
fix: print outputt offline_inference/base/chat.py example (#25744)
Iceber Sep 26, 2025
99b3a50
[Qwen3-Next][GDN] fixes cuda graph capturing bug in GDN metadata and …
sighingnow Sep 26, 2025
dd70437
Remove cuda hard-code in compute_causal_conv1d_metadata (#25555)
wxsIcey Sep 26, 2025
19f76ee
[misc] refactor speculative config (#25657)
yyzxw Sep 26, 2025
dfb9af2
[Bugfix] Fix Shared Expert/Zero expert code in FusedMoE.process_chunk…
SageMoore Sep 26, 2025
b03b1b9
Support LongCat-Flash-Chat tool call (#24083)
Xu-Wenqing Sep 26, 2025
633f943
[Doc] Update Batch-level DP docs (#25757)
DarkLight1337 Sep 26, 2025
2b6b1d7
[Model] Mamba2 varlen refactor (#21467)
cyang49 Sep 26, 2025
2827b3f
[CI] Fix test_shared_storage_connector_hashes (#25748)
chaunceyjiang Sep 26, 2025
fe6b19c
[Bugfix] Properly abort pooling request. (#25734)
noooop Sep 26, 2025
bc9d7b5
[CI/Build] Split up Distributed Tests (#25572)
DarkLight1337 Sep 26, 2025
db1e42f
[CI/Build] Fix some V1 tests not being run (#25569)
DarkLight1337 Sep 26, 2025
d4d9899
[Quantization] Add field to skip unquantized modules for GPTQ config …
Isotr0py Sep 26, 2025
8 changes: 4 additions & 4 deletions .buildkite/check-wheel-size.py

@@ -5,11 +5,11 @@
 import sys
 import zipfile
 
-# Read the VLLM_MAX_SIZE_MB environment variable, defaulting to 400 MiB
-# Note that we have 400 MiB quota, please use it wisely.
-# See https://github.com/pypi/support/issues/3792 .
+# Read the VLLM_MAX_SIZE_MB environment variable, defaulting to 450 MiB
+# Note that we have 800 MiB quota, please use it wisely.
+# See https://github.com/pypi/support/issues/6326 .
 # Please also sync the value with the one in Dockerfile.
-VLLM_MAX_SIZE_MB = int(os.environ.get("VLLM_MAX_SIZE_MB", 400))
+VLLM_MAX_SIZE_MB = int(os.environ.get("VLLM_MAX_SIZE_MB", 450))
 
 
 def print_top_10_largest_files(zip_file):
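The script this diff touches enforces a wheel-size ceiling in CI. A minimal standalone sketch of that kind of check, assuming the structure suggested by the diff (the `print_top_10_largest_files` name and the 450 MiB default are taken from the diff; everything else here is illustrative, not the actual vLLM script):

```python
import os
import zipfile

# Default mirrors the updated value in the diff; override via VLLM_MAX_SIZE_MB.
VLLM_MAX_SIZE_MB = int(os.environ.get("VLLM_MAX_SIZE_MB", 450))


def wheel_size_mb(wheel_path: str) -> float:
    """Return the on-disk size of the wheel in MiB."""
    return os.path.getsize(wheel_path) / (1024 * 1024)


def print_top_10_largest_files(wheel_path: str) -> None:
    """List the ten largest members of the wheel, largest first."""
    with zipfile.ZipFile(wheel_path) as zf:
        entries = sorted(zf.infolist(), key=lambda i: i.file_size, reverse=True)
        for info in entries[:10]:
            print(f"{info.file_size / (1024 * 1024):8.2f} MiB  {info.filename}")


def check_wheel(wheel_path: str) -> bool:
    """Return True if the wheel fits under the limit, else dump the offenders."""
    size = wheel_size_mb(wheel_path)
    if size > VLLM_MAX_SIZE_MB:
        print(f"FAIL: {size:.1f} MiB exceeds the {VLLM_MAX_SIZE_MB} MiB limit")
        print_top_10_largest_files(wheel_path)
        return False
    print(f"OK: {size:.1f} MiB within the {VLLM_MAX_SIZE_MB} MiB limit")
    return True
```

Wheels are zip archives, which is why a plain `zipfile.ZipFile` is enough to rank the members by size when the check fails.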
2 changes: 1 addition & 1 deletion .buildkite/nightly-benchmarks/nightly-descriptions.md

@@ -8,7 +8,7 @@ This benchmark aims to:
 
 Latest results: [results link](https://blog.vllm.ai/2024/09/05/perf-update.html), scroll to the end.
 
-Latest reproduction guilde: [github issue link](https://github.com/vllm-project/vllm/issues/8176)
+Latest reproduction guide: [github issue link](https://github.com/vllm-project/vllm/issues/8176)
 
 ## Setup
 
@@ -218,7 +218,7 @@ def split_json_by_tp_pp(
         "--xaxis",
         type=str,
         default="# of max concurrency.",
-        help="column name to use as X Axis in comparision graph",
+        help="column name to use as X Axis in comparison graph",
     )
     args = parser.parse_args()
 
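The corrected help string above belongs to an argparse flag in a benchmark-comparison script. A minimal sketch of how such a flag is defined and consumed, assuming nothing beyond the context lines in the diff (the flag name, default, and help text come from the diff; the parser description is illustrative):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the --xaxis option shown in the diff.
    parser = argparse.ArgumentParser(description="Plot benchmark comparisons")
    parser.add_argument(
        "--xaxis",
        type=str,
        default="# of max concurrency.",
        help="column name to use as X Axis in comparison graph",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.xaxis)  # falls back to "# of max concurrency." when not given
```

Passing an explicit list to `parse_args` (e.g. `["--xaxis", "batch size"]`) makes the parser easy to exercise in tests without touching `sys.argv`.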