Add normalizer instead of abusing LiteLLM adapter #106
Conversation
@@ -4,14 +4,18 @@

from codegate.providers.base import BaseProvider
from codegate.providers.llamacpp.completion_handler import LlamaCppCompletionHandler
from codegate.providers.llamacpp.adapter import LlamaCppAdapter
from codegate.providers.llamacpp.normalizer import LLamaCppInputNormalizer, LLamaCppOutputNormalizer
Question: we still need to add the codegate.providers.llamacpp.normalizer file with its classes, right? Is that something left for a future PR?
Just mentioning it in case you forgot to add the files to the PR (it has happened to me before).
+1, I had the same question.
So yes and no.
Originally I started working on a llamacpp normalizer, but Pankaj's PR #108 makes it obsolete for the immediate need, which was to make it possible to use the pipelines with the llama.cpp provider. I will send a PR to enable them shortly.
The other side of this is that we'll want to support the Stacklok hosted instance, which uses VLLM, and for that we'll need a normalizer. Same for Ollama. So the work I began today on the normalizer that handles <im_start> will be used for the hosted VLLM provider, and then we'll add an Ollama provider with its own normalizer.
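For context, here is a rough sketch of the kind of input normalizer described above; the class name, method signature, and exact tag spelling are assumptions for illustration, not the PR's actual API:

```python
import re
from typing import Any

# Matches blocks like "<im_start>user ... <im_end>"; the exact tag spelling
# (e.g. ChatML's "<|im_start|>") may differ depending on the client.
_TAGGED_BLOCK = re.compile(r"<im_start>(\w+)\s*(.*?)<im_end>", re.DOTALL)


class TaggedPromptInputNormalizer:
    """Hypothetical normalizer: turn a tag-delimited prompt into OpenAI chat messages."""

    def normalize(self, data: dict[str, Any]) -> dict[str, Any]:
        prompt = data.get("prompt", "")
        messages = [
            {"role": role, "content": content.strip()}
            for role, content in _TAGGED_BLOCK.findall(prompt)
        ]
        if not messages:
            # No tags found: fall back to a single user message.
            messages = [{"role": "user", "content": prompt}]
        normalized = {key: value for key, value in data.items() if key != "prompt"}
        normalized["messages"] = messages
        return normalized
```

The idea is that each provider runs something like this before handing the request to the shared pipeline, so the pipeline only ever sees OpenAI-style messages.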
Besides the comment about a potentially missing file, it looks good. Very nice.
Looks great! I like the idea of adding a normalizer.
The llama.cpp backend APIs already support the OpenAI format. The issue is with the Continue plugin -- it sends requests in a non-OpenAI format with im_start/im_end tags. Instead of having to translate the requests into the OpenAI format for llama.cpp, an alternative I found is to use provider='openai' in the Continue configuration. This way, we don't need to implement a normalizer for llama.cpp.
I will submit a (small) PR shortly that updates the inferencing code to use OpenAI-format input.
There were two problems with how our classes were structured (there are more, but these two are what I attempted to solve with this PR):

1) The adapter was supposed to be used in the LiteLLM-based providers and was supposed to do the translation using LiteLLM's adapters. But we stuffed way too much logic into the adapters and started leaking that logic to other providers.
2) Despite LiteLLM using the OpenAI format for input and output, other providers (llama.cpp and the soon-to-be-hosted vllm) don't. We need a way to canonicalize them to the OpenAI format.

This PR adds a new module called normalizer that takes over some of the work from the adapter and is only responsible for converting requests and replies to the OpenAI format. This is useful so that our pipelines always work on the OpenAI format internally, both the current input pipeline and the output pipeline.

The completion handler now really only does completion (previously it was a confusing class that did several things), and the adapter is now better hidden in the litellmshim module.

To ship the PR faster, there are only two normalizers: OpenAI, which just passes the data through, and Anthropic, which uses the LiteLLM adapter. Next, we'll add a llama.cpp normalizer to get rid of the `<im_start>` tags and convert them into a properly formatted OpenAI message.
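To illustrate the shape of the idea (a minimal sketch; the class and method names below are hypothetical and may not match the code actually added in this PR), the pass-through OpenAI case can be as small as:

```python
from abc import ABC, abstractmethod
from typing import Any


class ModelInputNormalizer(ABC):
    """Convert a provider-specific request into the canonical OpenAI format."""

    @abstractmethod
    def normalize(self, data: dict[str, Any]) -> dict[str, Any]:
        ...


class PassThroughInputNormalizer(ModelInputNormalizer):
    """For providers that already speak OpenAI, normalization is the identity."""

    def normalize(self, data: dict[str, Any]) -> dict[str, Any]:
        return data
```

Under this sketch, an Anthropic normalizer would implement the same interface but delegate the actual translation to the LiteLLM adapter, keeping the adapter contained in the litellmshim module.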