This repository was archived by the owner on Jun 5, 2025. It is now read-only.

Ollama support #70

Closed
Tech Debt 🏗️
1 of 1 issue completed

Description

@lukehinds

Based on the findings of @ptelang, Docker is not able to tap into Apple M1/M2/M3 GPUs. However, with llama.cpp now implementing very fast CPU-accelerated Q4_0_4_4 quantized inference, running sub-5-billion-parameter models seems quite viable.

However, it's quite possible that some will still want to use Ollama for running very large models, or simply because it is their preferred way of managing multiple models. We should therefore extend inference support to Ollama.
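
For reference, a minimal sketch of what an Ollama-backed inference call could look like, assuming Ollama's default local REST API at `http://localhost:11434` and an illustrative model name; this is not a decision on how the backend would actually be wired into the project:

```python
# Minimal sketch of a non-streaming completion request against a local
# Ollama server. The model name and prompt below are placeholders.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint


def generate(prompt: str, model: str = "llama3") -> str:
    """Send a single non-streaming generation request to Ollama and return the text."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]


if __name__ == "__main__":
    print(generate("Explain Q4_0_4_4 quantization in one sentence."))
```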

Sub-issues

Metadata

Assignees

Labels

No labels

Projects

No projects

Relationships

None yet

Development

No branches or pull requests
