This repository was archived by the owner on Jun 5, 2025. It is now read-only.
Ollama support #70
Description
Based on the findings of @ptelang, Docker is not able to tap Apple M1/M2/M3 GPUs. However, with llama.cpp now implementing very fast CPU-accelerated Q4_0_4_4 quantized inference, running models under 5 billion parameters appears quite viable.
Even so, it is quite possible that some users will still want to use Ollama, either for running very large models or simply because it is their preferred way of managing multiple models. We should therefore extend inference support to Ollama, roughly along the lines sketched below.
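As a rough illustration of what Ollama support could look like, here is a minimal sketch of calling a locally running Ollama server's REST API from Python. The endpoint is Ollama's default (`http://localhost:11434`), but the model name and the `generate` helper are hypothetical placeholders, not anything that exists in this repository.

```python
# Minimal sketch of a non-streaming completion request against a local Ollama
# server. Assumes Ollama is running on its default port and that the chosen
# model (placeholder below) has already been pulled with `ollama pull`.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama REST endpoint


def generate(prompt: str, model: str = "qwen2.5:3b") -> str:
    """Send a single prompt to Ollama and return the completed text."""
    response = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    # With stream=False, Ollama returns one JSON object whose "response"
    # field holds the full completion.
    return response.json()["response"]


if __name__ == "__main__":
    print(generate("Summarize why quantized local inference is useful."))
```

An actual integration would presumably sit behind the same inference abstraction used for llama.cpp, with the Ollama base URL and model name exposed as configuration rather than hard-coded.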
Sub-issues
Metadata
Assignees
Labels
No labels