Releases: kaito-project/kaito

v0.6.2

11 Sep 03:40
5e41f46

v0.6.2 - 2025-09-11

Changelog

Bug Fixes 🐞

  • ac8baa5 fix: ensure a non-empty volumeMount is appended in puller containers (#1487)

v0.6.1

03 Sep 22:44
2711756

v0.6.1 - 2025-09-03

Changelog

Maintenance 🔧

  • 6c01aac chore: bump local-csi-driver v0.2.3 -> v0.2.4 -> v0.2.5 (#1468)

v0.6.0

08 Aug 07:51
461f75c

v0.6.0 - 2025-08-08

This release includes these major changes:

  • Added support for DeepSeek-R1/DeepSeek-V3 models.
  • Added a /v1/chat/completions API to RAGEngine.
  • Improved the UX for preferred nodes and CPU nodes.
  • Updated the documentation to cover new features and integrations.
  • Added NVIDIA A10 GPU to the supported SKUs.

Changelog

Features 🌈

  • 253d6aa feat: add /v1/chat/completions API for RAGEngine (#1277)
  • dc46418 feat: add deepseek-r1/deepseek-v3 model (#1251)
  • bfd271a feat: adding RAGEngine CRD shortName and ServiceReady status column (#1336)
  • 45cbbf1 feat: support Preferred node in RAG (#1327)
  • ee0c3d0 feat: add make help target to Makefile (#1248)

Bug Fixes 🐞

  • e39f82f fix: pin phi2 to vllm v0 (#1369)
  • 8f4fa75 fix: fix bug where fetch GPU count was failing and defaulting (#1338)
  • 06f4cbd fix: resolve pydantic deprecation warnings (#1317)
  • 78ef22d fix: image link error in scaler proposals (#1318)
  • 2163923 fix: get gpu config from status if preferred nodes provided (#1308)
  • 76e099e fix: avoid extra node creation on informercache delay (#1311)

Code Refactoring 💎

  • d2ac059 refactor: adopt generator pattern in fine-tuning part (#1292)
  • aea0a42 refactor: introduce manifest generator -- part 1 (#1284)

Continuous Integration 💜

  • 461f75c ci: update release branch prefix to 'release-' (#1371)
  • a42040d ci: Expand trivy scanning to other images (#1161)

Documentation 📘

  • 5fe9a44 docs: add release management (#1360)
  • ca4b6e2 docs: add chat/completions rag docs and split install/api docs (#1334)
  • d50af9d docs: Document Model-As-OCI-Artifacts feature (#1359)
  • 3ea17ff docs: fix ConfigMap creation sequence in example docs (#1348)
  • fa41991 docs: add aikit to integrations (#1303)
  • 1112673 docs: Kaito kubectl cli proposal (#1230)
  • 5e750b8 docs: Update documentation to use chat completions API instead of deprecated completions API (#1340)
  • 4a75426 docs: verify docs site with Algolia (#1328)
  • 336d7be docs: add Headlamp-KAITO to documentation (#1314)
  • 03d8665 docs: add search to website using Algolia (#1302)
  • ff93426 docs: update installation docs to support different cloud providers (#1247)
  • fca7291 docs: publish v0.5.1 docs (#1289)

Maintenance 🔧

  • 4ad29f8 chore: bump base image to 0.0.5 (#1364)
  • 91b38b1 chore: bump actions/setup-go from 5.4.0 to 5.5.0 (#1345)
  • 5acd3c7 chore: rename Go files to use underscore file naming convention (#1361)
  • 0389855 chore: fix references to yaml files and rename bugs (#1347)
  • 152c0e8 chore: rename .yml to .yaml extension in GH actions for consistency (#1339)
  • f00cad7 chore: bump actions/cache from 4.2.2 to 4.2.3 (#1344)
  • db679e8 chore: Revert the model used in the rag e2e test back to Phi-3 (#1342)
  • ea6216f chore: update node sku for e2e tests (#1341)
  • af43eb5 chore: remove unnecessary sleep in test (#1332)
  • fc4a847 chore: bump actions/setup-node from 4.3.0 to 4.4.0 (#1324)
  • 8c7ac49 chore: reduce verbose logs and unnecessary reconcile (#1312)
  • 725da54 chore: bump starlette from 0.40.0 to 0.47.2 in /presets/workspace/dependencies (#1290)
  • 3f85a58 chore: bump step-security/harden-runner from 2.12.2 to 2.13.0 (#1287)

v0.5.1

21 Jul 18:45
62de3ec

v0.5.1 - 2025-07-21

Changelog

Features 🌈

  • 10532a7 feat: expose kaito supported models via configmap (#1265)

Bug Fixes 🐞

  • b71e7d3 fix: check nv plugin only for known gpu sku (#1275)
  • 2ed2d23 fix: fix deploy-docs.yml (#1259)
  • dc8576a fix: helm: StorageClass upgrades & rollbacks idempotent (#1242)
  • 0a7b40e fix: correct kaito-rag-service mcr image registry
  • 03037fe fix: set individual charts path for helm chart release
  • ae6edb7 fix: specify helm charts target in pipeline (#1235)

Documentation 📘

Maintenance 🔧

  • 62de3ec chore: skip flaky test in ragengine e2e
  • e915956 chore: bump local-csi-driver to v0.2.3 (#1285)
  • e78b0fc chore: bump on-headers and compression in /website (#1283)
  • 589cd14 chore: use tool-enabled chat templates when possible (#1278)
  • 57341e1 chore: bump requests from 2.32.3 to 2.32.4 in /presets/workspace/dependencies (#1174)
  • 4309784 chore: bump llama-index from 0.12.38 to 0.12.41 in /presets/ragengine (#1253)
  • 12a1ff4 chore: bump step-security/harden-runner from 2.11.0 to 2.12.2 (#1229)
  • 3490ce7 chore: switch project license to Apache 2 (#1260)
  • cb21ff3 chore: bump terraform provider versions and add kaito-ragengine chart deployment (#1240)
  • 55607c8 chore: add ArtifactHub annotations to Chart.yaml (#1219)
  • c276769 chore: bump k8s.io/* deps -> v0.33.2, controller-runtime -> v0.21.0 (#1227)

Performance Improvements 🚀

  • beeaa7c perf: improve multi nodeclaim provision time (#1243)

v0.5.0

03 Jul 12:55
9326178

v0.5.0 - 2025-07-03

This release includes these major changes:

  • First official release of Retrieval Augmented Generation (RAG) support.
  • Added vLLM-based distributed inference support, with more popular large models coming soon.
  • Leveraged the local-csi-driver plugin for NVMe disk support.
  • Used OCI artifacts to distribute preset model images, making image pulls 50% faster.

Changelog

Features 🌈

Bug Fixes 🐞

  • f9ab07c fix: specify helm charts target in pipeline
  • 8f6f312 fix: adapt to vllm 0.9.0 (#1187)
  • 25f99ab fix: require max-model-len for distributed inference (#1177)
  • 0f52166 fix: Pin requirements to prevent versioning errors (#1156)
  • 1c72a5a fix: pin the pytest-asyncio package (#1153)
  • cbd0690 fix: customize gpu-provisioner deployment name for e2e test (#1144)
  • 76497d1 fix: revert dependabot.yml changes (#1141)

Documentation 📘

Maintenance 🔧

  • 9326178 chore: skip flaky test in ragengine e2e
  • d0cb21f chore: bump local-csi-driver to 0.2.0 (#1224)
  • e7485ba chore: converge Errors and update tests (#1194)
  • 89c5850 chore: use FIPS compliance image for oras tool (#1217)
  • 13540bc chore: bump vllm from 0.8.5 to 0.9.0 in /presets/workspace/dependencies (#1154)
  • 268b128 chore: Create kaito_configmap_tuning_phi_3.yaml (#1146)
  • db334a2 chore: bump codecov/codecov-action from 5.4.2 to 5.4.3 in the all-action group (#1134)
  • dbcfbb8 chore: merge dependabot pr into groups (#1087)
  • 1dee277 chore: bump actions/dependency-review-action from 4.6.0 to 4.7.0 (#1119)

Performance Improvements 🚀

Testing 💚

  • fb55369 test: adapt preset test to oci artifacts (#1186)
  • 18afc28 test: add e2e test case for multi-node distributed inference (#1163)
  • 16f9a27 test: add support volume for tuning e2e (#1139)
  • 6f60f59 test: Exclude gpu-provisioner check in case of running e2e on managed mode (#1155)
  • c5d5a5f test: improve ut coverage to ragengine (#1157)

v0.4.6

14 May 23:15
4e48fa9

v0.4.6 - 2025-05-14

This release includes these major changes:

  • Added support for Llama 3 models.
  • Added support for tool calling.
  • Deprecated Llama 2 models.
  • Improved the Faiss-based RAG engine with update/delete APIs.
  • Fixed node plugin issues and corrected memory/storage settings for GPUs.

Changelog

Features 🌈

  • 45ba61e feat: bump service image to 0.1.1 (#1117)
  • a16fd08 feat: add update/delete document api's and handling for faiss (#1090)
  • 382f476 feat: update GPU provisioner version and model OS disk storage (#1097)
  • 50d321a feat: Ensure shm volume added (#1102)
  • 5d431a8 feat: Cleanup Deprecated ImageAccess Field (#1100)
  • 1b0a214 feat: Update pipeline to handle downloadAtRuntime models (#1095)
  • 3668eb0 feat: strip managed fields for controller informer (#1086)
  • 346e6a2 feat: Deprecate Llama 2 Models and Add Llama3 (#1091)
  • 0512a5d feat: support preset model weight downloads (#1035)
  • 6890fc0 feat: Ensure latest security patches applied on debian packages (#1045)
  • 73693bb feat: parse supported_models.yaml in plugin pkg before starting manager (#1044)
  • 7adf0d7 feat: Add more fine-tuning examples (#1046)
  • bb4635c feat: add base image build target in Dockerfile (#1032)
  • 2454918 feat: Update README.md (#1038)
  • 0e6f2f1 feat: Update API version from v1alpha1 to v1beta1 (#1036)

Bug Fixes 🐞

  • 80bb0dc fix: move nodeclaim check to right place (#1062)
  • c6abfb1 fix: remove PYTORCH_CUDA_ALLOC_CONF environment variable (#1120)
  • f6a1f15 fix: update arc GPUMem to 32 and add more examples for arc (#1111)
  • 8ed0ec5 fix: Troubleshooting Target Modules Error (#1096)
  • b8f501d fix: update GPUMemGB for Standard_ND96amsr_A100_v4 (#1098)
  • ad90c48 fix: change ephemeral storage to resource storage in nodeclaim (#1094)
  • f724024 fix: delete unused pod indexer (#1089)
  • 0201daf fix: ensureNodePlugins does not work (#1070)
  • 9798552 fix: guard nodeclass op by featuregate (#1079)
  • 80268c0 fix: document id's on list document responses are the same as document id's on create document responses (#1080)
  • 797ffdd fix: ragengine index document returns wrong ids (#1071)
  • 34d14d3 fix: should not check preset model in sync loop (#1058)
  • 76771c6 fix: only add finalizer when the object is not in deletion state (#1057)
  • 1efa0a7 fix: Dots not allowed in workspace name
  • a00d73e fix: add registry into release pipeline (#1026) (#1028)

Code Refactoring 💎

  • 56c01bb refactor: remove fields from PresetParam in favor of embedded Metadata (#1074)

Documentation 📘

  • 9bbc707 docs: add tool calling example (#1125)
  • 782f4a7 docs: expand on container probes in 20250325-distributed-inference.md (#1105)
  • 039437a docs: add proposal for distributed inference (#1092)
  • 9edff8d docs: Update README for new release (#1033)

Maintenance 🔧

  • ad79ab5 chore: update tool chat templates (#1116)
  • eba517f chore: bump vllm from 0.8.2 to 0.8.5 in /presets/workspace/dependencies (#1081)
  • b7b5b19 chore: remove torchrun and distributed inference references (#1108)
  • 515ab26 chore: bump step-security/harden-runner from 2.11.1 to 2.12.0 (#1072)
  • 9fe6ac3 chore: bump goreleaser/goreleaser-action from 6.2.1 to 6.3.0 (#965)
  • 7eb1def chore: bump codecov/codecov-action from 5.4.0 to 5.4.2 (#1031)
  • 10af1f4 chore: simplify the manifests generation code and add ut (#1048)
  • 03dd1d7 chore: add license header check for Python files (#1075)
  • 7da6e3b chore: bump golang.org/x/net from 0.36.0 to 0.38.0 (#1022)
  • eff7a70 chore: avoid using goto by using retry.OnError (#1049)
  • 9a83501 chore: update RAG doc for avoiding conflicts (#1055)
  • cc84082 chore: doc for workspace controller metrics (#1029)

Testing 💚

v0.4.5

18 Apr 18:01
cc6ad4d

v0.4.5 - 2025-04-18

This release includes these major changes:

  • Bumped the workspace API to v1beta1.
  • Added workload count metrics for KAITO workspaces.
  • Improved protection against out-of-memory (OOM) errors.
  • Added Phi-4 and Qwen 32B models.
  • Reimplemented image pull and push to improve reliability.

Changelog

Features 🌈

  • 9cb9a9c feat: Add workload count metrics for kaito workspace (#1020)
  • 53a5aa2 Revert "feat: Provide default chat templates for Falcon and Phi-2 fine-tuning" (#1017)
  • 35d2e48 feat: Provide default chat templates for Falcon and Phi-2 fine-tuning (#1015)
  • 937fa4a feat: add vllm adapter strength validation (#1009)
  • 08cd100 feat: enable reasoning output for deepseek model (#1003)
  • a33b8d5 feat: reserve enough vram buffer for vllm service (#990)
  • 8f36cc8 feat: Enhance vLLM Configuration Validation (#979)
  • a5e5382 feat: Add controller tags (#996)
  • 81cd242 feat: check lora support availability (#997)
  • 8730b2c feat: bump service image tag (#970)
  • 85e0326 feat: Add Annotation Bypass GPU Mem check (#986)
  • 1bd1766 feat: reimplement pull and push of images with skopeo and oras (#950)
  • 7570d53 feat: Add Pytorch Expandable segments envvar (#980)
  • 423f2c7 feat: Add Phi-4 and Qwen 32B models (#978)
  • 4a0e4de Revert "feat: Update precommit hooks"
  • 0094722 Revert "feat: Add github action"
  • 359309f feat: Add github action
  • 87b2437 feat: Update precommit hooks
  • d4a2cdd feat: update gpu-provisioner version to v0.3.3 for kaito (#972)
  • 416e330 feat: skip nodeclaim.Status.LastPodEventTime change event (#963)
  • 0cc1ab4 feat: more rapid local development with Tilt (#952)
  • e83f3b0 feat: onboard arc instance type (#948)
  • 94e0ea9 feat: add probes to template sample (#941)
  • 4806322 feat: Updated Custom Deployment Template and Minor Nits (#935)
  • ad077e3 feat: update gpu-provisioner version to v0.3.2 for kaito (#917)
  • 4aaab3d feat: Add load and persist endpoints tests to RAGEngine E2E (#913)
  • 1815428 feat: port preset name validation to v1beta1 webhook (#914)
  • fc6c008 feat: RAG remote service secret patch (#862)
  • 241f876 feat: Offload Async Coroutine Execution to Separate Thread to Avoid Nested Event Loop Error (#906)
  • 381f539 feat: Introduce workspace v1beta1 API (#904)
  • 80365b0 feat: add featuregate for ensureNodeClass (#900)
  • 0bbd6ef feat: Add overwrite to load index (#897)
  • e84cd9b feat: Generalize Embedding Model Class & Add /load Endpoint for Index Management (#892)
  • d02fb89 feat: Add Persist RWLock (#891)
  • 08817b4 feat: Add Persist Index Endpoint (#889)
  • 950b6ed feat: Add Async RWLock for Safe Concurrent Index Operations in Select Vector Stores (#888)
  • 8737f52 feat: Make Inference Class Fully Async with HTTPX Requests (#880)
  • 2a9b5c5 feat: Add E2E Endpoint Checks (#864)
  • 6cb1bee feat: Paginate & Truncate List Documents, Add RAG FastAPI Docstrings, Optimize Multi-Document Retrieval with Async Gather (#847)

Bug Fixes 🐞

  • 592c460 fix: allow backoff for tuning job (#1008)
  • a8867fb fix: Pin disable chunked prefill on V100 Arch (#971)
  • 35e210f fix: add dependencies for vllm runtime (#964)
  • fec20c4 fix: Fine-Tuning DataCollator Parsing + BnB Support (#940)
  • c8f2b09 fix: Add yq dependency to install instructions (#927)
  • a8c25fb fix: Adding Preset name validation in validateCreateWithInference (#905)
  • 45e6882 fix: Excluding Metadata to LLM during Response Synthesis & Updating default top_k (#896)
  • c85116d fix: Huggingface Request Format (#895)
  • 33bc9bb fix: LLMRerank edge cases, HTTPx timeout issues, and async client segfaults (#884)
  • 5895b9b fix: skip probing for incompatible scenario (#879)

Documentation 📘

Maintenance 🔧

  • a0eeaf5 chore: update sku config (#1007)
  • a2a7f9d chore: refactor interface.go (#1002)
  • 95c6809 chore: bump actions/dependency-review-action from 4.5.0 to 4.6.0 (#993)
  • 3c847a0 chore: bump azure/login from 2.2.0 to 2.3.0 (#992)
  • 9118fba chore: bump step-security/harden-runner from 2.11.0 to 2.11.1 (#991)
  • a5c75ea chore: RAG quickstart (#968)
  • eb05b16 chore: Bump golang version to 1.24 (#958)
  • b11dfcb chore: use run id, number, and attempt to create unique azure resource names (#954)
  • d3036bb chore: bump actions/setup-go from 5.2.0 to 5.4.0 (#947)
  • db6813f chore: bump github.com/samber/lo from 1.47.0 to 1.49.1 (#845)
  • bb235ac chore: bump docker/login-action from 3.3.0 to 3.4.0 (#930)
  • 7f6e9e0 chore: bump vllm from 0.7.2 to 0.8.1 in /presets/workspace/dependencies (#945)
  • 5baa182 chore: bump golang.org/x/net from 0.33.0 to 0.36.0 (#925)
  • 9220320 chore: Add code header (#944)
  • 5ad0875 chore: add replacement for k8s.io/cri-client to v0.30.1 (#939)
  • c442b6d chore: bump codecov/codecov-action from 5.3.1 to 5.4.0 (#910)
  • a109a24 chore: bump step-security/harden-runner from 2.10.2 to 2.11.0 (#893)
  • 79d1e3f chore: update priorityClassName (#909)
  • a61f6dd chore: RAG pipeline leaking cluster patch (#875)
  • c63d7cd chore: bump goreleaser/goreleaser-action from 6.1.0 to 6.2.1 (#887)
  • fa14ab8 chore: bump step-security/harden-runner from 2.10.2 to 2.11.0 (#886)
  • 6553330 chore: bump vllm to 0.7.2 (#890)
  • d6cd2f2 chore: RAG E2E test index and query validation (#871)
  • f5fb284 chore: RAG e2e test with Kaito vllm inference (#870)
  • 8e66d77 chore: Curb the permissions for publishing data to third party dashboards. (#865)
  • 4f04395 chore: bump codecov/codecov-action from 5.1.2 to 5.3.1 (#843)
  • d90959b chore: Rag e2e test for remote inference (#854)

Testing 💚

v0.4.4

31 Jan 06:39
f1e77a2

v0.4.4 - 2025-01-31

Changelog

Bug Fixes 🐞

Maintenance 🔧

  • 6c118fe chore: bump gpu-provisioner in terraform sample (#860)
  • 07077f8 chore: bumping tf provider versions and kaito workspace to 0.4.3 (#858)

v0.4.3

30 Jan 16:47
872d1a6

v0.4.3 - 2025-01-30

Changelog

Features 🌈

Bug Fixes 🐞

  • 872d1a6 fix: use ghcr image for e2e test during release
  • 8d3a1e1 fix: Add DeepSeek Qwen E2E (#852)
  • f7ee0ec fix: Prevent blocking healthcheck during Inference (#837)

Code Refactoring 💎

  • 9fe5b86 refactor: add inference manifest template and generation script (#833)

Maintenance 🔧

  • 2dadc2b chore: RAGEngine e2e pipeline (#832)
  • 59583af chore: bump nvidia/k8s-device-plugin to 0.17.0 (#836)
  • c96fb2d chore: Reorg e2e test code to be reused in RAG e2e tests (#834)

v0.4.2

16 Jan 12:54
3755bab

v0.4.2 - 2025-01-16

Changelog

Features 🌈

  • 03bf1b3 feat: upgrade v1beta1.NodeClaim to v1.NodeClaim (#823)
  • c731441 feat: upgrade golangci-lint to v1.63.4 (#821)
  • 8ad5146 feat: remove machine and aws/karpenter-core dependency from kaito (#806)
  • 5351ff8 feat: Add RAG LLMReranker (#784)
  • 1567afa feat: Add Qwen Link

Bug Fixes 🐞

Continuous Integration 💜

  • 5ad5cae ci: Make publish helm chart and create release workflows sequential (#814)

Documentation 📘

  • 3d04ba6 docs: Update to use Standard_NC24ads_A100_v4 as default SKU in docs (#818)

Maintenance 🔧

  • fd8cead chore: default to NC_A100_v4 series gpu (#825)
  • d2b3ccd chore: terraform updates (#812)
  • 6a98a01 chore: bump github.com/onsi/ginkgo/v2 from 2.22.1 to 2.22.2 (#800)
  • 5f914ad chore: bump github.com/stretchr/testify from 1.9.0 to 1.10.0 (#787)

Testing 💚

  • b4d9a85 test: Add RAG and other Python UT coverage to the codecov report (#815)