README.md (5 additions, 3 deletions)
@@ -58,7 +58,9 @@ Deploy Blazing-fast LLMs powered by [vLLM](https://github.com/vllm-project/vllm)
### Option 1: Deploy Any Model Using Pre-Built Docker Image [Recommended]
> [!TIP]
- > This is the recommended way to deploy your model, as it does not require you to build a Docker image, upload heavy models to DockerHub and wait for workers to download them. Instead, use this option to deploy your model in a few clicks. For even more convenience, attach a network storage volume to your Endpoint, which will download the model once and share it across all workers.
+ > This is the quickest and easiest way to test your model, as it does not require you to build a Docker image, upload heavy models to DockerHub, and wait for workers to download them. You can use this option to deploy your model in a few clicks. For even more convenience, attach a network storage volume to your Endpoint, which will download the model once and share it across all workers.
+ >
+ > However, for actual deployment, it is recommended that you build an image with the model baked in, as described in Option 2; this ensures the fastest load speeds.
We now offer a pre-built Docker Image for the vLLM Worker that you can configure entirely with Environment Variables when creating the RunPod Serverless Endpoint:
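The full list of environment variables is outside this diff hunk, so the sketch below is illustrative only. `MODEL_NAME` follows the worker-vllm convention for selecting the Hugging Face model to serve; the token variable name and the example model are assumptions, so consult the variable reference in this README for the authoritative names:

```bash
# Hypothetical sketch of Endpoint environment variables (set in the RunPod
# console when creating the Serverless Endpoint, not in a shell).
# Names and values are illustrative assumptions, not the definitive list.
MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.2"  # Hugging Face model to serve
HF_TOKEN="<your_hugging_face_token>"             # assumed name; needed only for gated/private models
```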
@@ -70,8 +72,8 @@ Below is a summary of the available RunPod Worker images, categorized by image s
| CUDA Version | Stable Image Tag | Development Image Tag | Note |
|--------------|------------------|-----------------------|------|
- | 11.8.0 | `runpod/worker-vllm:0.3.1-cuda11.8.0` | `runpod/worker-vllm:dev-cuda11.8.0` | Available on all RunPod Workers without additional selection needed. |
- | 12.1.0 | `runpod/worker-vllm:0.3.1-cuda12.1.0` | `runpod/worker-vllm:dev-cuda12.1.0` | When creating an Endpoint, select CUDA Versions 12.2 and 12.1 in the filter. |
+ | 11.8.0 | `runpod/worker-vllm:0.3.2-cuda11.8.0` | `runpod/worker-vllm:dev-cuda11.8.0` | Available on all RunPod Workers without additional selection needed. |
+ | 12.1.0 | `runpod/worker-vllm:0.3.2-cuda12.1.0` | `runpod/worker-vllm:dev-cuda12.1.0` | When creating an Endpoint, select CUDA Versions 12.2 and 12.1 in the filter. |
This table provides a quick reference to the image tags to use based on the desired CUDA version and image stability (Stable or Development). Be sure to follow the selection note for CUDA 12.1.0 compatibility.
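As a quick sanity check, a tag from the table can be pulled locally before wiring it into an Endpoint (shown here with the stable CUDA 12.1.0 tag from this diff; any tag in the table works the same way):

```bash
# Pull the stable CUDA 12.1.0 image from Docker Hub to verify the tag resolves.
docker pull runpod/worker-vllm:0.3.2-cuda12.1.0
```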