I'm trying to set rope_scaling through its environment variable. The documentation lists this as supported, but the assignment in engine_args.py reads the environment variable as a plain string and never converts it into a dict. Any other engine argument of type dict will behave the same way.
Because a handler proxies requests to the serverless vLLM instances by default, simply overriding the command with vllm serve is not a sufficient workaround in this case.
# Example for Qwen/Qwen3-30B-A3B with extended context window
vllm serve Qwen/Qwen3-30B-A3B --rope-scaling '{"rope_type": "yarn", "factor": 4.0, "original_max_position_embeddings": 32768}'
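One possible fix, as a minimal sketch only: decode the environment variable as JSON whenever the target engine argument is a dict. I have not checked this against the actual code in engine_args.py, so the helper name `read_engine_arg`, the `DICT_ARGS` set, and the upper-case variable naming (`ROPE_SCALING`) are all assumptions made for illustration.

```python
import json
import os
from typing import Any, Mapping, Optional

# Hypothetical list of engine arguments whose values are dicts.
DICT_ARGS = {"rope_scaling"}


def read_engine_arg(name: str, env: Optional[Mapping[str, str]] = None) -> Any:
    """Read one engine argument from the environment (illustrative helper).

    Assumes the variable is the upper-cased argument name, e.g. ROPE_SCALING.
    Dict-typed arguments are decoded from JSON; everything else stays a string.
    """
    env = os.environ if env is None else env
    raw = env.get(name.upper())
    if raw is None:
        return None
    if name not in DICT_ARGS:
        return raw
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{name.upper()} must contain valid JSON: {exc}") from exc
    if not isinstance(value, dict):
        raise ValueError(f"{name.upper()} must decode to a JSON object")
    return value


if __name__ == "__main__":
    example_env = {
        "ROPE_SCALING": '{"rope_type": "yarn", "factor": 4.0, '
                        '"original_max_position_embeddings": 32768}'
    }
    print(read_engine_arg("rope_scaling", example_env))
```

With something like this in place, setting the variable to the same JSON string used on the command line above would give the engine a real dict instead of a string.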