
Commit f33353e

feat: add config details to batch inference demo

1 parent a86ac3f
1 file changed: +19 -0 lines changed

demo-notebooks/additional-demos/batch-inference/remote_offline_bi.ipynb

Lines changed: 19 additions & 0 deletions
@@ -49,6 +49,16 @@
 "- Configure some settings for GPU processing\n",
 "- Defines batch processing parameters (8 requests per batch, 2 GPU workers)\n",
 "\n",
+"#### Model Source Configuration\n",
+"\n",
+"The `model_source` parameter supports several loading methods:\n",
+"\n",
+"* **Hugging Face Hub** (default): Use a repository ID, e.g. `model_source=\"meta-llama/Llama-2-7b-chat-hf\"`\n",
+"* **Local Directory**: Use a file path, e.g. `model_source=\"/path/to/my/local/model\"`\n",
+"* **Other Sources**: ModelScope via environment variables (e.g. `VLLM_MODELSCOPE_DOWNLOADS_DIR`)\n",
+"\n",
+"For complete model support and options, see the [official vLLM documentation](https://docs.vllm.ai/en/latest/models/supported_models.html).\n",
+"\n",
 "```python\n",
 "import ray\n",
 "from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig\n",
@@ -60,7 +70,15 @@
 "        dtype=\"half\",\n",
 "        max_model_len=1024,\n",
 "    ),\n",
+"    # Batch size: larger batches increase throughput but reduce fault tolerance\n",
+"    #   - Small batches (4-8): better for fault tolerance and memory constraints\n",
+"    #   - Large batches (16-32): higher throughput, better GPU utilization\n",
+"    #   - Choose based on your Ray Cluster size and memory availability\n",
 "    batch_size=8,\n",
+"    # Concurrency: number of vLLM engine workers to spawn\n",
+"    #   - Set to match your total GPU count for maximum utilization\n",
+"    #   - Each worker is assigned to a GPU automatically by the Ray scheduler\n",
+"    #   - Can use all GPUs across head and worker nodes\n",
 "    concurrency=2,\n",
 ")\n",
 "```"
@@ -105,6 +123,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
+"#### Running the Pipeline\n",
 "Now we can run the batch inference pipeline on our data. It will:\n",
 "- In the background, download the model into memory, where vLLM serves it locally (on the Ray Cluster) for inference\n",
 "- Generate a sample Ray Dataset with 32 rows (0-31) to process\n",
