This repository was archived by the owner on Jun 5, 2025. It is now read-only.
Ollama support #70
Description
Based on the findings of @ptelang, Docker is not able to tap Apple M1/M2/M3 GPUs. However, with llama.cpp now implementing very fast CPU-accelerated Q4_0_4_4 quantized inference, running models under 5 billion parameters appears quite viable.
Even so, it is quite possible that some users will still want to use Ollama, either for running very large models or simply because it is their preferred way of managing multiple models. We should therefore extend inference support to Ollama, roughly along the lines sketched below.
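As a rough illustration of what Ollama support could look like, here is a minimal sketch of calling a locally running Ollama server's REST API from Python. The endpoint is Ollama's default (`http://localhost:11434`), but the model name and the `generate` helper are hypothetical placeholders, not anything that exists in this repository.

```python
# Minimal sketch of a non-streaming completion request against a local Ollama
# server. Assumes Ollama is running on its default port and that the chosen
# model (placeholder below) has already been pulled with `ollama pull`.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama REST endpoint


def generate(prompt: str, model: str = "qwen2.5:3b") -> str:
    """Send a single prompt to Ollama and return the completed text."""
    response = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    # With stream=False, Ollama returns one JSON object whose "response"
    # field holds the full completion.
    return response.json()["response"]


if __name__ == "__main__":
    print(generate("Summarize why quantized local inference is useful."))
```

An actual integration would presumably sit behind the same inference abstraction used for llama.cpp, with the Ollama base URL and model name exposed as configuration rather than hard-coded.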
Sub-issues
Metadata
Assignees
Labels
No labels