Add normalizer instead of abusing LiteLLM adapter #106
Conversation
@@ -4,14 +4,18 @@

from codegate.providers.base import BaseProvider
from codegate.providers.llamacpp.completion_handler import LlamaCppCompletionHandler
from codegate.providers.llamacpp.adapter import LlamaCppAdapter
from codegate.providers.llamacpp.normalizer import LLamaCppInputNormalizer, LLamaCppOutputNormalizer
Question: we still need to add the codegate.providers.llamacpp.normalizer file with its classes, right? Is that something left for a future PR?
Just mentioning it in case you forgot to add the files to the PR (it has happened to me before).
+1, I had the same question.
So yes and no.
Originally I started working on a llamacpp normalizer, but Pankaj's PR #108 makes it obsolete for the immediate need, which was to make it possible to use the pipelines with the llama.cpp provider. I will send a PR to enable them shortly.
The other side of this is that we'll want to support the Stacklok hosted instance, which uses VLLM, and for that we'll need a normalizer. Same for Ollama. So the work I began today on the normalizer that handles <im_start> will be used for the hosted VLLM provider, and then we'll add an Ollama provider with its own normalizer.
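For context, here is a rough sketch of the kind of input normalizer described above; the class name, method signature, and exact tag spelling are assumptions for illustration, not the PR's actual API:

```python
import re
from typing import Any

# Matches blocks like "<im_start>user ... <im_end>"; the exact tag spelling
# (e.g. ChatML's "<|im_start|>") may differ depending on the client.
_TAGGED_BLOCK = re.compile(r"<im_start>(\w+)\s*(.*?)<im_end>", re.DOTALL)


class TaggedPromptInputNormalizer:
    """Hypothetical normalizer: turn a tag-delimited prompt into OpenAI chat messages."""

    def normalize(self, data: dict[str, Any]) -> dict[str, Any]:
        prompt = data.get("prompt", "")
        messages = [
            {"role": role, "content": content.strip()}
            for role, content in _TAGGED_BLOCK.findall(prompt)
        ]
        if not messages:
            # No tags found: fall back to a single user message.
            messages = [{"role": "user", "content": prompt}]
        normalized = {key: value for key, value in data.items() if key != "prompt"}
        normalized["messages"] = messages
        return normalized
```

The idea is that each provider runs something like this before handing the request to the shared pipeline, so the pipeline only ever sees OpenAI-style messages.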
Besides the comment about a potentially missing file, it looks good. Very nice.
Looks great! I like the idea of adding a normalizer.
The llama.cpp backend APIs already support the OpenAI format. The issue is with the Continue plugin -- it sends requests in a non-OpenAI format with im_start/im_end tags. Instead of having to translate the requests into the OpenAI format for llama.cpp, an alternative I found is to use provider='openai' in the Continue configuration. This way, we don't need to implement a normalizer for llama.cpp.
I will submit a (small) PR shortly that updates the inferencing code to use OpenAI-format input.
There were two problems with how our classes were structured (there are more, but these two are what I attempted to solve with this PR):

1) The adapter was supposed to be used in the LiteLLM-based providers and was supposed to do the translation using LiteLLM's adapters. But we stuffed way too much logic into the adapters and started leaking that logic to other providers.
2) Despite LiteLLM using the OpenAI format for input and output, other providers (llama.cpp and the soon-to-be-hosted vllm) don't. We need a way to canonicalize them to the OpenAI format.

This PR adds a new module called normalizer that takes over some of the work from the adapter and is only responsible for converting requests and replies to the OpenAI format. This is useful so that our pipelines always work on the OpenAI format internally, both the current input pipeline and the output pipeline.

The completion handler now really only does completion (previously it was a confusing class that did several things), and the adapter is now better hidden in the litellmshim module.

To ship the PR faster, there are only two normalizers: OpenAI, which just passes the data through, and Anthropic, which uses the LiteLLM adapter. Next, we'll add a llama.cpp normalizer to get rid of the `<im_start>` tags and convert them into a properly formatted OpenAI message.
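To illustrate the shape of the idea (a minimal sketch; the class and method names below are hypothetical and may not match the code actually added in this PR), the pass-through OpenAI case can be as small as:

```python
from abc import ABC, abstractmethod
from typing import Any


class ModelInputNormalizer(ABC):
    """Convert a provider-specific request into the canonical OpenAI format."""

    @abstractmethod
    def normalize(self, data: dict[str, Any]) -> dict[str, Any]:
        ...


class PassThroughInputNormalizer(ModelInputNormalizer):
    """For providers that already speak OpenAI, normalization is the identity."""

    def normalize(self, data: dict[str, Any]) -> dict[str, Any]:
        return data
```

Under this sketch, an Anthropic normalizer would implement the same interface but delegate the actual translation to the LiteLLM adapter, keeping the adapter contained in the litellmshim module.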