Releases: kaito-project/kaito
Releases · kaito-project/kaito
v0.6.2
v0.6.1
v0.6.0
v0.6.0 - 2025-08-08
This release includes these major changes:
- Added support for DeepSeek-R1/DeepSeek-V3 models.
- Added /v1/chat/completions API for RAGEngine.
- Provided better UX for preferred nodes and cpu nodes.
- Updated documentation with new features, integrations.
- Added NVIDIA A10 GPU to the supported SKUs.
Changelog
Features 🌈
- 253d6aa feat: add /v1/chat/completions API for RAGEngine (#1277)
- dc46418 feat: add deepseek-r1/deepseek-v3 model (#1251)
- bfd271a feat: adding RAGEngine CRD shortName and ServiceReady status column (#1336)
- 45cbbf1 feat: support Preferred node in RAG (#1327)
- ee0c3d0 feat: add
make help
target to Makefile (#1248)
Bug Fixes 🐞
- e39f82f fix: pin phi2 to vllm v0 (#1369)
- 8f4fa75 fix: fix bug where fetch GPU count was failing and defaulting (#1338)
- 06f4cbd fix: resolve pydantic deprecation warnings (#1317)
- 78ef22d fix: image link error in scaler proposals (#1318)
- 2163923 fix: get gpu config from status if preferred nodes provided (#1308)
- 76e099e fix: avoid extra node creation on informercache delay (#1311)
Code Refactoring 💎
- d2ac059 refactor: adopt generator pattern in fine-tuning part (#1292)
- aea0a42 refactor: introduce manifest generator -- part 1 (#1284)
Continuous Integration 💜
- 461f75c ci: update release branch prefix to 'release-' (#1371)
- a42040d ci: Expand trivy scanning to other images (#1161)
Documentation 📘
- 5fe9a44 docs: add release management (#1360)
- ca4b6e2 docs: add chat/completions rag docs and split install/api docs (#1334)
- d50af9d docs: Document Model-As-OCI-Artifacts feature (#1359)
- 3ea17ff docs: fix ConfigMap creation sequence in example docs (#1348)
- fa41991 docs: add aikit to integrations (#1303)
- 1112673 docs: Kaito kubectl cli proposal (#1230)
- 5e750b8 docs: Update documentation to use chat completions API instead of deprecated completions API (#1340)
- 4a75426 docs: verify docs site with Algolia (#1328)
- 336d7be docs: add Headlamp-KAITO to documentation (#1314)
- 03d8665 docs: add search to website using Algolia (#1302)
- ff93426 docs: update installation docs to support different cloud providers (#1247)
- fca7291 docs: publish v0.5.1 docs (#1289)
Maintenance 🔧
- 4ad29f8 chore: bump base image to 0.0.5 (#1364)
- 91b38b1 chore: bump actions/setup-go from 5.4.0 to 5.5.0 (#1345)
- 5acd3c7 chore: rename Go files to use underscore file naming convention (#1361)
- 0389855 chore: fix references to yaml files and rename bugs (#1347)
- 152c0e8 chore: rename .yml to .yaml extension in GH actions for consistency (#1339)
- f00cad7 chore: bump actions/cache from 4.2.2 to 4.2.3 (#1344)
- db679e8 chore: Revert the model used in the rag e2e test back to Phi-3 (#1342)
- ea6216f chore: update node sku for e2e tests (#1341)
- af43eb5 chore: remove unnecessary sleep in test (#1332)
- fc4a847 chore: bump actions/setup-node from 4.3.0 to 4.4.0 (#1324)
- 8c7ac49 chore: reduce verbose logs and unnecessary reconcile (#1312)
- 725da54 chore: bump starlette from 0.40.0 to 0.47.2 in /presets/workspace/dependencies (#1290)
- 3f85a58 chore: bump step-security/harden-runner from 2.12.2 to 2.13.0 (#1287)
v0.5.1
v0.5.1 - 2025-07-21
Changelog
Features 🌈
Bug Fixes 🐞
- b71e7d3 fix: check nv plugin only for known gpu sku (#1275)
- 2ed2d23 fix: fix deploy-docs.yml (#1259)
- dc8576a fix: helm: StorageClass upgrades & rollbacks idempotent (#1242)
- 0a7b40e fix: correct kaito-rag-service mcr image registry
- 03037fe fix: set individual charts path for helm chart release
- ae6edb7 fix: specify helm charts target in pipeline (#1235)
Documentation 📘
- 7db5a6b docs: add versioned docs (#1281)
- d8fa8c5 docs: add proposal for Gateway API Inference Extension (#1274)
- db62cad docs: add comprehensive multi-node inference documentation (#1270)
- 48dd41e docs: Update README.md with meeting notes link (#1272)
- 8bd1ce6 docs: update README.md (#1258)
- 57b6cf1 docs: updage RAG Docs to remove rerank options (#1255)
- d40ab2c docs: update community meeting info (#1254)
- 5f3e80d docs: add keda scaler for kaito workloads proposal (#1237)
- cefc33c docs: update owners (#1250)
- 03332f9 docs: add tool calling and MCP examples (#1246)
- 0eeef81 docs: add explanation for max-model-len requirement (#1239)
Maintenance 🔧
- 62de3ec chore: skip flaky test in ragengine e2e
- e915956 chore: bump local-csi-driver to v0.2.3 (#1285)
- e78b0fc chore: bump on-headers and compression in /website (#1283)
- 589cd14 chore: use tool-enabled chat templates when possible (#1278)
- 57341e1 chore: bump requests from 2.32.3 to 2.32.4 in /presets/workspace/dependencies (#1174)
- 4309784 chore: bump llama-index from 0.12.38 to 0.12.41 in /presets/ragengine (#1253)
- 12a1ff4 chore: bump step-security/harden-runner from 2.11.0 to 2.12.2 (#1229)
- 3490ce7 chore: switch project license to Apache 2 (#1260)
- cb21ff3 chore: bump terraform provider versions and add kaito-ragengine chart deployment (#1240)
- 55607c8 chore: add ArtifactHub annotations to Chart.yaml (#1219)
- c276769 chore: bump k8s.io/* deps -> v0.33.2, controller-runtime -> v0.21.0 (#1227)
Performance Improvements 🚀
v0.5.0
v0.5.0 - 2025-07-03
This release includes these major changes:
- First official release of Retrieval Augmented Generation (RAG) support.
- Added vLLM-based distributed inference support, with more popular large models coming soon.
- Leveraged local-csi-driver plugin for NVMe disk support.
- Used OCI artifact to distribute preset model images, achieving 50% faster image pulls.
Changelog
Features 🌈
- 0cba258 feat: Update Workflows for RAG Release (#1216)
- 8fac718 feat: add local cache for model files (#1203)
- 715437e feat: Add RagEngine README.md (#1204)
- 6f62b72 feat: pull model artifacts lively (#1188)
- 010e136 feat: update model image to 0.2.0 oci artifacts (#1181)
- 21a00ac feat: adding CURL E2E tests for update/delete doc and delete index (#1182)
- f177226 feat: update rag engine image url to be build from env (#1175)
- 8c03221 feat: add doc_id to /query result nodes (#1170)
- 0b1fa38 feat: RAG metrics part 2 (#1164)
- 776f180 feat: Adding delete index api (#1165)
- 2ddb2a7 feat: support adapters for StatefulSet workloads (#1158)
- e6213e4 feat: add proposal for Llama-3.3-70B-Instruct model support (#1159)
- c2ebdb0 feat: support volume for tuning (#1118)
- 18e5c17 feat: RAG metrics part 1 (#988)
- 9976cda feat: bump base image to 0.0.3 (#1138)
- 1f30f0f feat: support multi-node distributed inference for vLLM (#949)
- e90e366 feat: adding metadata filtering on list documents api (#1114)
- 3415ac7 feat: Add CustomTransformer for Code Splitting support (#1131)
- 140a5f9 feat: update gpu-provisioner version to v0.3.5 for kaito (#1128)
Bug Fixes 🐞
- f9ab07c fix: specify helm charts target in pipeline
- 8f6f312 fix: adapt to vllm 0.9.0 (#1187)
- 25f99ab fix: require max-model-len for distributed inference (#1177)
- 0f52166 fix: Pin requirements to prevent versioning errors (#1156)
- 1c72a5a fix: pin the pytest-asyncio package (#1153)
- cbd0690 fix: customize gpu-provisioner deployment name for e2e test (#1144)
- 76497d1 fix: revert dependabot.yml changes (#1141)
Documentation 📘
- b138086 docs: adding Docs for RAGEngine (#1226)
- b1ae546 docs: fix broken img link (#1220)
- 2f5f7d8 docs: add proposal for supporting scale subresource api of workspace (#1184)
- 12dfacd docs: change machine to nodeclaim (#1193)
- 0d9fa90 docs: fix broken links for presets (#1196)
- d5648ee docs: add website and move docs (#1183)
- 7f2a036 docs: add proposal for Distributing LLM Model Files as OCI Artifacts (#1169)
Maintenance 🔧
- 9326178 chore: skip flaky test in ragengine e2e
- d0cb21f chore: bump local-csi-driver to 0.2.0 (#1224)
- e7485ba chore: converge Errors and update tests (#1194)
- 89c5850 chore: use FIPS compliance image for oras tool (#1217)
- 13540bc chore: bump vllm from 0.8.5 to 0.9.0 in /presets/workspace/dependencies (#1154)
- 268b128 chore: Create kaito_configmap_tuning_phi_3.yaml (#1146)
- db334a2 chore: bump codecov/codecov-action from 5.4.2 to 5.4.3 in the all-action group (#1134)
- dbcfbb8 chore: merge dependabot pr into groups (#1087)
- 1dee277 chore: bump actions/dependency-review-action from 4.6.0 to 4.7.0 (#1119)
Performance Improvements 🚀
Testing 💚
- fb55369 test: adapt preset test to oci artifacts (#1186)
- 18afc28 test: add e2e test case for multi-node distributed inference (#1163)
- 16f9a27 test: add support volume for tuning e2e (#1139)
- 6f60f59 test: Exclude gpu-provisioner check in case of running e2e on managed mode (#1155)
- c5d5a5f test: improve ut coverage to ragengine (#1157)
v0.4.6
v0.4.6 - 2025-05-14
This release includes these major changes:
- Added support for Llama 3 models.
- Added support for tool calling.
- Deprecate Llama 2 Models.
- Improved Faiss-based RAG engine with update/delete APIs.
- Fixed node plugin issues and corrected memory/storage settings for GPUs.
Changelog
Features 🌈
- 45ba61e feat: bump service image to 0.1.1 (#1117)
- a16fd08 feat: add update/delete document api's and handling for faiss (#1090)
- 382f476 feat: update GPU provisioner version and model OS disk storage (#1097)
- 50d321a feat: Ensure shm volume added (#1102)
- 5d431a8 feat: Cleanup Deprecated ImageAccess Field (#1100)
- 1b0a214 feat: Update pipeline to handle downloadAtRuntime models (#1095)
- 3668eb0 feat: strip managed fields for controller informer (#1086)
- 346e6a2 feat: Deprecate Llama 2 Models and Add Llama3 (#1091)
- 0512a5d feat: support preset model weight downloads (#1035)
- 6890fc0 feat: Ensure latest security patches applied on debian packages (#1045)
- 73693bb feat: parse supported_models.yaml in plugin pkg before starting manager (#1044)
- 7adf0d7 feat: Add more fine-tuning examples (#1046)
- bb4635c feat: add base image build target in Dockerfile (#1032)
- 2454918 feat: Update README.md (#1038)
- 0e6f2f1 feat: Update API version from v1alpha1 to v1beta1 (#1036)
Bug Fixes 🐞
- 80bb0dc fix: move nodeclaim check to right place (#1062)
- c6abfb1 fix: remove PYTORCH_CUDA_ALLOC_CONF environment variable (#1120)
- f6a1f15 fix: update arc GPUMem to 32 and add more examples for arc (#1111)
- 8ed0ec5 fix: Troubleshooting Target Modules Error (#1096)
- b8f501d fix: update GPUMemGB for Standard_ND96amsr_A100_v4 (#1098)
- ad90c48 fix: change ephemeral storage to resource storage in nodeclaim (#1094)
- f724024 fix: delete unused pod indexer (#1089)
- 0201daf fix: ensureNodePlugins does not work (#1070)
- 9798552 fix: guard nodeclass op by featuregate (#1079)
- 80268c0 fix: document id's on list document responses are the same as document id's on create document responses (#1080)
- 797ffdd fix: ragengine index document returns wrong ids (#1071)
- 34d14d3 fix: should not check preset model in sync loop (#1058)
- 76771c6 fix: only add finalizer when the object is not in deletion state (#1057)
- 1efa0a7 fix: Dots not allowed in workspace name
- a00d73e fix: add registry into release pipeline (#1026) (#1028)
Code Refactoring 💎
Documentation 📘
- 9bbc707 docs: add tool calling example (#1125)
- 782f4a7 docs: expand on container probes in 20250325-distributed-inference.md (#1105)
- 039437a docs: add proposal for distributed inference (#1092)
- 9edff8d docs: Update README for new release (#1033)
Maintenance 🔧
- ad79ab5 chore: update tool chat templates (#1116)
- eba517f chore: bump vllm from 0.8.2 to 0.8.5 in /presets/workspace/dependencies (#1081)
- b7b5b19 chore: remove torchrun and distributed inference references (#1108)
- 515ab26 chore: bump step-security/harden-runner from 2.11.1 to 2.12.0 (#1072)
- 9fe6ac3 chore: bump goreleaser/goreleaser-action from 6.2.1 to 6.3.0 (#965)
- 7eb1def chore: bump codecov/codecov-action from 5.4.0 to 5.4.2 (#1031)
- 10af1f4 chore: simplify the manifests generation code and add ut (#1048)
- 03dd1d7 chore: add license header check for Python files (#1075)
- 7da6e3b chore: bump golang.org/x/net from 0.36.0 to 0.38.0 (#1022)
- eff7a70 chore: avoid using goto by using retry.OnError (#1049)
- 9a83501 chore: update RAG doc for avoiding conflicts (#1055)
- cc84082 chore: doc for workspace controller metrics (#1029)
Testing 💚
v0.4.5
v0.4.5 - 2025-04-18
This release includes these major changes:
- Bump workspace API to v1beta1.
- Added workload count metrics for KAITO workspace.
- Added better support to prevent OOM.
- Added Phi-4 and Qwen 32B models.
- Reimplemented the pull and push mechanisms for images, enhancing reliability.
Changelog
Features 🌈
- 9cb9a9c feat: Add workload count metrics for kaito workspace (#1020)
- 53a5aa2 Revert "feat: Provide default chat templates for Falcon and Phi-2 fine-tuning" (#1017)
- 35d2e48 feat: Provide default chat templates for Falcon and Phi-2 fine-tuning (#1015)
- 937fa4a feat: add vllm adapter strength validation (#1009)
- 08cd100 feat: enable reasoning output for deepseek model (#1003)
- a33b8d5 feat: reserve enough vram buffer for vllm service (#990)
- 8f36cc8 feat: Enhance vLLM Configuration Validation (#979)
- a5e5382 feat: Add controller tags (#996)
- 81cd242 feat: check lora support availability (#997)
- 8730b2c feat: bump service image tag (#970)
- 85e0326 feat: Add Annotation Bypass GPU Mem check (#986)
- 1bd1766 feat: reimplement pull and push of images with skopeo and oras (#950)
- 7570d53 feat: Add Pytorch Expandable segments envvar (#980)
- 423f2c7 feat: Add Phi-4 and Qwen 32B models (#978)
- 4a0e4de Revert "feat: Update precommit hooks"
- 0094722 Revert "feat: Add github action"
- 359309f feat: Add github action
- 87b2437 feat: Update precommit hooks
- d4a2cdd feat: update gpu-provisioner version to v0.3.3 for kaito (#972)
- 416e330 feat: skip nodeclaim.Status.LastPodEventTime change event (#963)
- 0cc1ab4 feat: more rapid local development with Tilt (#952)
- e83f3b0 feat: onboard arc instance type (#948)
- 94e0ea9 feat: add probes to template sample (#941)
- 4806322 feat: Updated Custom Deployment Template and Minor Nits (#935)
- ad077e3 feat: update gpu-provisioner version to v0.3.2 for kaito (#917)
- 4aaab3d feat: Add load and persist endpoints tests to RAGEngine E2E (#913)
- 1815428 feat: port preset name validation to v1beta1 webhook (#914)
- fc6c008 feat: RAG remote service secret patch (#862)
- 241f876 feat: Offload Async Coroutine Execution to Separate Thread to Avoid Nested Event Loop Error (#906)
- 381f539 feat: Introduce workspace v1beta1 API (#904)
- 80365b0 feat: add featuregate for ensureNodeClass (#900)
- 0bbd6ef feat: Add overwrite to load index (#897)
- e84cd9b feat: Generalize Embedding Model Class & Add /load Endpoint for Index Management (#892)
- d02fb89 feat: Add Persist RWLock (#891)
- 08817b4 feat: Add Persist Index Endpoint (#889)
- 950b6ed feat: Add Async RWLock for Safe Concurrent Index Operations in Select Vector Stores (#888)
- 8737f52 feat: Make Inference Class Fully Async with HTTPX Requests (#880)
- 2a9b5c5 feat: Add E2E Endpoint Checks (#864)
- 6cb1bee feat: Paginate & Truncate List Documents, Add RAG FastAPI Docstrings, Optimize Multi-Document Retrieval with Async Gather (#847)
Bug Fixes 🐞
- 592c460 fix: allow backoff for tuning job (#1008)
- a8867fb fix: Pin disable chunked prefill on V100 Arch (#971)
- 35e210f fix: add dependencies for vllm runtime (#964)
- fec20c4 fix: Fine-Tuning DataCollator Parsing + BnB Support (#940)
- c8f2b09 fix: Add yq dependency to install instructions (#927)
- a8c25fb fix: Adding Preset name validation in validateCreateWithInference (#905)
- 45e6882 fix: Excluding Metadata to LLM during Response Synthesis & Updating default top_k (#896)
- c85116d fix: Huggingface Request Format (#895)
- 33bc9bb fix: LLMRerank edge cases, HTTPx timeout issues, and async client segfaults (#884)
- 5895b9b fix: skip probing for incompatible scenario (#879)
Documentation 📘
- f7422bf docs: OOM prevention mechanisms used in Kaito (#1016)
- b9fea47 docs: Enhance README with fine-tuning config details (#1018)
- 3338d55 docs: update vllm 0.8.2 metrics (#1021)
- 5f38db0 docs: add custom config usage guide (#987)
- ce53b63 docs: Update reference-image-deployment.yaml (#985)
- aafdb7e docs: add vllm metrics docs (#882)
Maintenance 🔧
- a0eeaf5 chore: update sku config (#1007)
- a2a7f9d chore: refactor interface.go (#1002)
- 95c6809 chore: bump actions/dependency-review-action from 4.5.0 to 4.6.0 (#993)
- 3c847a0 chore: bump azure/login from 2.2.0 to 2.3.0 (#992)
- 9118fba chore: bump step-security/harden-runner from 2.11.0 to 2.11.1 (#991)
- a5c75ea chore: RAG quickstart (#968)
- eb05b16 chore: Bump golang version to 1.24 (#958)
- b11dfcb chore: use run id, number, and attempt to create unique azure resource names (#954)
- d3036bb chore: bump actions/setup-go from 5.2.0 to 5.4.0 (#947)
- db6813f chore: bump github.com/samber/lo from 1.47.0 to 1.49.1 (#845)
- bb235ac chore: bump docker/login-action from 3.3.0 to 3.4.0 (#930)
- 7f6e9e0 chore: bump vllm from 0.7.2 to 0.8.1 in /presets/workspace/dependencies (#945)
- 5baa182 chore: bump golang.org/x/net from 0.33.0 to 0.36.0 (#925)
- 9220320 chore: Add code header (#944)
- 5ad0875 chore: add replacement for k8s.io/cri-client to v0.30.1 (#939)
- c442b6d chore: bump codecov/codecov-action from 5.3.1 to 5.4.0 (#910)
- a109a24 chore: bump step-security/harden-runner from 2.10.2 to 2.11.0 (#893)
- 79d1e3f chore: update priorityClassName (#909)
- a61f6dd chore: RAG pipeline leaking cluster patch (#875)
- c63d7cd chore: bump goreleaser/goreleaser-action from 6.1.0 to 6.2.1 (#887)
- fa14ab8 chore: bump step-security/harden-runner from 2.10.2 to 2.11.0 (#886)
- 6553330 chore: bump vllm to 0.7.2 (#890)
- d6cd2f2 chore: RAG E2E test index and query validation (#871)
- f5fb284 chore: RAG e2e test with Kaito vllm inference (#870)
- 8e66d77 chore: Curb the permissions for publishing data to third party dashboards. (#865)
- 4f04395 chore: bump codecov/codecov-action from 5.1.2 to 5.3.1 (#843)
- d90959b chore: Rag e2e test for remote inference (#854)
Testing 💚
v0.4.4
v0.4.3
v0.4.3 - 2025-01-30
Changelog
Features 🌈
- e333f2a feat: Add DeepSeek READMEs and Example (#851)
- 31bdf47 feat: Add DeepSeek Model for E2E (#850)
- 0ed89f2 feat: Add Deepseek Model (#848)
- 254dec6 feat: Add DeepSeek Model Plugin (#849)
- 979b739 feat: RAG API Server to use Async/Await (#835)
Bug Fixes 🐞
- 872d1a6 fix: use ghcr image for e2e test during release
- 8d3a1e1 fix: Add DeepSeek Qwen E2E (#852)
- f7ee0ec fix: Prevent blocking healthcheck during Inference (#837)
Code Refactoring 💎
Maintenance 🔧
v0.4.2
v0.4.2 - 2025-01-16
Changelog
Features 🌈
- 03bf1b3 feat: upgrade v1beta1.NodeClaim to v1.NodeClaim (#823)
- c731441 feat: upgrade golangci-lint to v1.63.4 (#821)
- 8ad5146 feat: remove machine and aws/karpenter-core dependency from kaito (#806)
- 5351ff8 feat: Add RAG LLMReranker (#784)
- 1567afa feat: Add Qwen Link
Bug Fixes 🐞
- c0f21fb fix: Ensure
model
provided in vLLM inference (#820) - dcc0b3b fix: Update CodeCov Badge
- b7d8dff fix: Fix Repo Filepaths (#809)
- 47143e5 fix: Update RAG ChromaDB UTs v0.6.1 (#803)
Continuous Integration 💜
Documentation 📘
Maintenance 🔧
- fd8cead chore: default to NC_A100_v4 series gpu (#825)
- d2b3ccd chore: terraform updates (#812)
- 6a98a01 chore: bump github.com/onsi/ginkgo/v2 from 2.22.1 to 2.22.2 (#800)
- 5f914ad chore: bump github.com/stretchr/testify from 1.9.0 to 1.10.0 (#787)