---
id: use-woodpecker.md
title: Use Woodpecker (Milvus v2.6.x)
related_key: Woodpecker
summary: Learn how to enable Woodpecker as the WAL in Milvus.
---
## Use Woodpecker (Milvus v2.6.x)

This guide explains how to enable and use Woodpecker as the Write-Ahead Log (WAL) in Milvus 2.6.x. Woodpecker is a cloud‑native WAL designed for object storage, offering high throughput, low operational overhead, and seamless scalability. For architecture and benchmark details, see [Woodpecker](woodpecker_architecture.md).

### Overview

- Starting from Milvus 2.6, Woodpecker is an optional WAL that serves as the logging service, providing ordered writes and recovery.
- As a message queue choice, it behaves similarly to Pulsar/Kafka and can be enabled via configuration.
- Two storage backends are supported: the local file system (`local`) and object storage (`minio`/S3-compatible).

### Quick start

To enable Woodpecker, set the MQ type to Woodpecker:

```yaml
mq:
  type: woodpecker
```

Note: Switching `mq.type` on a running cluster is an upgrade operation. Follow the upgrade procedure carefully and validate the change on a fresh cluster before applying it to production.

### Configuration

Below is the complete Woodpecker configuration block (edit `milvus.yaml` or override it in `user.yaml`):

```yaml
# Woodpecker configuration, used to manage Milvus logs of recent mutation operations, provide streaming log output, and support embedded sequential log reads and writes.
woodpecker:
  meta:
    type: etcd # The type of the metadata provider. Currently only etcd is supported.
    prefix: woodpecker # The key prefix of the metadata provider. Default is woodpecker.
  client:
    segmentAppend:
      queueSize: 10000 # The size of the queue for pending messages to be sent of each log.
      maxRetries: 3 # Maximum number of retries for segment append operations.
    segmentRollingPolicy:
      maxSize: 256M # Maximum size of a segment.
      maxInterval: 10m # Maximum interval between two segments. Default is 10 minutes.
      maxBlocks: 1000 # Maximum number of blocks in a segment.
    auditor:
      maxInterval: 10s # Maximum interval between two auditing operations. Default is 10 seconds.
  logstore:
    segmentSyncPolicy:
      maxInterval: 200ms # Maximum interval between two sync operations. Default is 200 milliseconds.
      maxIntervalForLocalStorage: 10ms # Maximum interval between two sync operations for the local storage backend. Default is 10 milliseconds.
      maxBytes: 256M # Maximum size of the write buffer in bytes.
      maxEntries: 10000 # Maximum number of entries in the write buffer.
      maxFlushRetries: 5 # Maximum number of retries for flush operations.
      retryInterval: 1000ms # Maximum interval between two retries. Default is 1000 milliseconds.
      maxFlushSize: 2M # Maximum size of a fragment in bytes to flush.
      maxFlushThreads: 32 # Maximum number of threads to flush data.
    segmentCompactionPolicy:
      maxSize: 2M # The maximum size of the merged files.
      maxParallelUploads: 4 # The maximum number of parallel upload threads for compaction.
      maxParallelReads: 8 # The maximum number of parallel read threads for compaction.
    segmentReadPolicy:
      maxBatchSize: 16M # Maximum size of a batch in bytes.
      maxFetchThreads: 32 # Maximum number of threads to fetch data.
  storage:
    type: minio # The type of the storage provider. Valid values: [minio, local]
    rootPath: /var/lib/milvus/woodpecker # The root path of the storage provider.
```

Key notes:

- `woodpecker.meta`
  - **type**: Currently only `etcd` is supported. Reuse the same etcd instance as Milvus to store the lightweight metadata.
  - **prefix**: The key prefix for metadata. Default: `woodpecker`.
- `woodpecker.client`
  - Controls segment append/rolling/auditing behavior on the client side to balance throughput and end‑to‑end latency.
- `woodpecker.logstore`
  - Controls sync/flush/compaction/read policies for log segments. These are the primary knobs for throughput/latency tuning.
- `woodpecker.storage`
  - **type**: `minio` for MinIO/S3‑compatible object storage (MinIO/S3/GCS/OSS, etc.); `local` for local or shared file systems.
  - **rootPath**: Root path for the storage backend (effective for `local`; with `minio`, paths are dictated by bucket/prefix). See the sketch below.

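Because Woodpecker shares Milvus's object storage settings when `type` is `minio`, the override can stay minimal. The following is a sketch of a `user.yaml`, under the assumption that bucket, credentials, and prefix come from the existing `minio` block of `milvus.yaml`:

```yaml
# user.yaml sketch: Woodpecker on the same MinIO/S3 bucket Milvus already uses.
# Object-store bucket and credentials are inherited from the `minio` configuration.
mq:
  type: woodpecker
woodpecker:
  storage:
    type: minio
```
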
### Deployment modes

Milvus supports both Standalone and Cluster modes. Woodpecker storage backend support matrix:

| Deployment        | `storage.type=local`      | `storage.type=minio` |
| ----------------- | ------------------------- | -------------------- |
| Milvus Standalone | Supported                 | Supported            |
| Milvus Cluster    | Limited (needs shared FS) | Supported            |

Notes:

- With `minio`, Woodpecker shares the same object storage as Milvus (MinIO/S3/GCS/OSS, etc.).
- With `local`, a single‑node local disk is only suitable for Standalone. If all pods can access a shared file system (e.g., NFS), Cluster mode can also use `local`, as sketched below.

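A hedged illustration of the shared‑FS case, assuming every Milvus pod mounts the same NFS export at the hypothetical path `/mnt/nfs`:

```yaml
# Sketch: Woodpecker on a shared file system for Cluster mode.
# /mnt/nfs/woodpecker is a hypothetical mount point that every pod must see.
mq:
  type: woodpecker
woodpecker:
  storage:
    type: local
    rootPath: /mnt/nfs/woodpecker
```
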
## Deployment guides

### Enable Woodpecker for a Milvus Cluster on Kubernetes (Milvus Operator, storage=minio)

After installing the [Milvus Operator](install_cluster-milvusoperator.md), start a Milvus cluster with Woodpecker enabled using the official sample:

```bash
kubectl apply -f https://raw.githubusercontent.com/zilliztech/milvus-operator/main/config/samples/milvus_cluster_woodpecker.yaml
```

This sample configures Woodpecker as the message queue and enables the Streaming Node. The first startup may take some time to pull images; wait until all pods are ready:

```bash
kubectl get pods
kubectl get milvus my-release -o yaml | grep -A2 status
```

When ready, you should see pods similar to:

```
NAME                                               READY   STATUS    RESTARTS   AGE
my-release-etcd-0                                  1/1     Running   0          17m
my-release-etcd-1                                  1/1     Running   0          17m
my-release-etcd-2                                  1/1     Running   0          17m
my-release-milvus-datanode-7f8f88499d-kc66r        1/1     Running   0          16m
my-release-milvus-mixcoord-7cd7998d-x59kg          1/1     Running   0          16m
my-release-milvus-proxy-5b56cf8446-pbnjm           1/1     Running   0          16m
my-release-milvus-querynode-0-558d9cdd57-sgbfx     1/1     Running   0          16m
my-release-milvus-streamingnode-58fbfdfdd8-vtxfd   1/1     Running   0          16m
my-release-minio-0                                 1/1     Running   0          17m
my-release-minio-1                                 1/1     Running   0          17m
my-release-minio-2                                 1/1     Running   0          17m
my-release-minio-3                                 1/1     Running   0          17m
```

Run the following command to uninstall the Milvus cluster:

```bash
kubectl delete milvus my-release
```

If you need to adjust Woodpecker parameters, follow the settings described in [message storage config](deploy_pulsar.md).

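If you prefer to author the custom resource yourself rather than applying the sample, a minimal sketch might look like the following. The `msgStreamType` value for Woodpecker is an assumption here; confirm field names and accepted values against the official sample above:

```yaml
apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
  name: my-release
spec:
  mode: cluster
  dependencies:
    msgStreamType: woodpecker # assumed value; verify against the official sample
  config:
    woodpecker:
      storage:
        type: minio
```
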
### Enable Woodpecker for a Milvus Cluster on Kubernetes (Helm Chart, storage=minio)

First add and update the Milvus Helm chart as described in [Run Milvus in Kubernetes with Helm](install_cluster-helm.md).

Then deploy with one of the following examples:

- Cluster deployment (recommended settings with Woodpecker and Streaming Node enabled):

```bash
helm install my-release zilliztech/milvus \
  --set image.all.tag=v2.6.0 \
  --set pulsarv3.enabled=false \
  --set woodpecker.enabled=true \
  --set streaming.enabled=true \
  --set indexNode.enabled=false
```

- Standalone deployment (Woodpecker enabled):

```bash
helm install my-release zilliztech/milvus \
  --set image.all.tag=v2.6.0 \
  --set cluster.enabled=false \
  --set pulsarv3.enabled=false \
  --set standalone.messageQueue=woodpecker \
  --set woodpecker.enabled=true \
  --set streaming.enabled=true
```

After deployment, follow the docs to port‑forward and connect. To adjust Woodpecker parameters, follow the settings described in [message storage config](deploy_pulsar.md).

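To pass Woodpecker parameters through Helm, one option is the chart's `extraConfigFiles` value, which is rendered into `user.yaml`. This is a sketch under the assumption that your chart version exposes that value (check its `values.yaml`):

```yaml
# values-override.yaml (sketch; assumes the chart supports extraConfigFiles)
extraConfigFiles:
  user.yaml: |+
    woodpecker:
      logstore:
        segmentSyncPolicy:
          maxFlushThreads: 64 # illustrative tuning value
```

Apply it with `helm upgrade my-release zilliztech/milvus -f values-override.yaml --reuse-values`.
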
### Enable Woodpecker for Milvus Standalone in Docker (storage=local)

Follow [Run Milvus in Docker](install_standalone-docker.md). Example:

```bash
mkdir milvus-wp && cd milvus-wp
curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh

# Create user.yaml to enable Woodpecker with the local file system
cat > user.yaml <<'EOF'
mq:
  type: woodpecker
woodpecker:
  storage:
    type: local
    rootPath: /var/lib/milvus/woodpecker
EOF

bash standalone_embed.sh start
```

To further change Woodpecker settings, update `user.yaml` and run `bash standalone_embed.sh restart`.

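To sanity-check that Woodpecker is writing to the local backend, you can list the configured `rootPath` inside the container. The container name below is the one created by `standalone_embed.sh`, and the directory is assumed to appear once data has been written:

```bash
docker exec milvus-standalone ls /var/lib/milvus/woodpecker
```
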
### Enable Woodpecker for Milvus Standalone with Docker Compose (storage=minio)

Follow [Run Milvus with Docker Compose](install_standalone-docker-compose.md). Example:

```bash
mkdir milvus-wp-compose && cd milvus-wp-compose
wget https://github.com/milvus-io/milvus/releases/download/v2.6.0/milvus-standalone-docker-compose.yml -O docker-compose.yml
# By default, the Docker Compose standalone uses Woodpecker
sudo docker compose up -d

# If you need to change Woodpecker parameters further, write an override:
docker exec -it milvus-standalone bash -lc 'cat > /milvus/configs/user.yaml <<EOF
mq:
  type: woodpecker
woodpecker:
  logstore:
    segmentSyncPolicy:
      maxFlushThreads: 16
  storage:
    type: minio
EOF'

# Restart the container to apply the changes
docker restart milvus-standalone
```

## Throughput tuning tips

Based on the benchmarks and backend limits in [Woodpecker](woodpecker_architecture.md), optimize end‑to‑end write throughput along the following axes (see the tuning sketch after this list):

- Storage side
  - **Object storage (minio/S3‑compatible)**: Increase concurrency and object size (avoid tiny objects). Watch network and bucket bandwidth limits. A single MinIO node on SSD often caps around 100 MB/s locally; a single EC2 instance writing to S3 can reach GB/s.
  - **Local/shared file systems (local)**: Prefer NVMe/fast disks. Ensure the file system handles small writes and fsync latency well.
- Woodpecker knobs
  - Increase `logstore.segmentSyncPolicy.maxFlushSize` and `maxFlushThreads` for larger flushes and higher parallelism.
  - Tune `maxInterval` according to media characteristics (trade latency for throughput with longer aggregation).
  - For object storage, consider increasing `segmentRollingPolicy.maxSize` to reduce segment switches.
- Client/application side
  - Use larger batch sizes and more concurrent writers/clients.
  - Control refresh/index build timing (batch up before triggering) to avoid frequent small writes.

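Putting the Woodpecker knobs together, a throughput‑oriented `user.yaml` sketch might look like this; the values are illustrative starting points, not recommendations:

```yaml
woodpecker:
  client:
    segmentRollingPolicy:
      maxSize: 512M # fewer segment switches on object storage (illustrative)
  logstore:
    segmentSyncPolicy:
      maxInterval: 400ms # longer aggregation window: more throughput, more latency
      maxFlushSize: 8M # larger flushes (illustrative)
      maxFlushThreads: 64 # more flush parallelism (illustrative)
```
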
Batch insert demo (replace `<Proxy Pod IP>` with your proxy address):

```python
import random
import time

from pymilvus import MilvusClient

# 1. Set up a Milvus client (19530 is the default Milvus port)
client = MilvusClient(
    uri="http://<Proxy Pod IP>:19530",
)

# 2. Create a collection
client.create_collection(
    collection_name="test_milvus_wp",
    dimension=512,
    metric_type="IP",
    shards_num=2,
)

# 3. Insert randomly generated vectors in large batches
colors = ["green", "blue", "yellow", "red", "black", "white", "purple", "pink", "orange", "brown", "grey"]

batch_size = 1000
batch_count = 2000
for j in range(batch_count):
    start_time = time.time()
    print(f"Inserting batch {j} ({j * batch_size} rows so far), startTime: {start_time}")
    data = []
    for i in range(batch_size):
        current_color = random.choice(colors)
        data.append({
            "id": j * batch_size + i,
            "vector": [random.uniform(-1, 1) for _ in range(512)],
            "color": current_color,
            "color_tag": f"{current_color}_{random.randint(1000, 9999)}",
        })
    client.insert(
        collection_name="test_milvus_wp",
        data=data,
    )
    print(f"Inserted batch {j}, endTime: {time.time()}, costTime: {time.time() - start_time:.3f}s")
```

## Latency

Woodpecker is a cloud-native WAL designed for object storage, with trade-offs between throughput, cost, and latency. The currently supported lightweight embedded mode prioritizes cost and throughput, since most scenarios only require data to be persisted within a certain time window rather than low latency for each individual write request. Woodpecker therefore batches writes, with default sync intervals of 10 ms for local file system backends and 200 ms for MinIO-like storage backends. On the slow path, the maximum write latency equals the sync interval plus the flush time.

Note that a flush is triggered not only by the time interval but also by buffer size (`maxFlushSize`), which defaults to 2 MB.

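Conversely, if write latency matters more than throughput, a sketch of shrinking the sync interval at the cost of smaller, more frequent flushes (illustrative values, not recommendations):

```yaml
woodpecker:
  logstore:
    segmentSyncPolicy:
      maxInterval: 50ms # lower worst-case latency on object storage backends
      maxFlushSize: 1M # smaller flushes follow naturally from shorter intervals (illustrative)
```
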
For details on architecture, deployment modes (MemoryBuffer / QuorumBuffer), and performance, see [Woodpecker Architecture](woodpecker_architecture.md).

For more parameter details, refer to the Woodpecker [GitHub repository](https://github.com/zilliztech/woodpecker).