This repository was archived by the owner on Jun 5, 2025. It is now read-only.

Add normalizer instead of abusing LiteLLM adapter #106

Merged
merged 2 commits into stacklok:main on Nov 28, 2024

Conversation

jhrozek
Contributor

@jhrozek jhrozek commented Nov 27, 2024

Add normalizer instead of abusing LiteLLM adapter

There were two problems with how our classes were structured (there are
more, but these are the two I attempted to solve with this PR):

  1. The adapter was supposed to be used in the litellm-based providers
     and to do the translation using litellm's adapters. But we stuffed
     way too much logic into it and started leaking that logic into
     other providers.
  2. Despite litellm using the OpenAI format for input and output, other
     providers (llama.cpp and the soon-to-be-hosted vLLM) don't. We need
     a way to canonicalize them to the OpenAI format.

This PR adds a new module called normalizer that takes over some of the
work from the adapter and is only responsible for converting requests and
replies to the OpenAI format. This is useful so that our pipelines always
work on the OpenAI format internally, in both the current input pipeline
and the output pipeline.

The completion handler now really only does completion (previously it was
a confusing class that did several things), and the adapter is now better
hidden inside the litellmshim module.

To ship the PR faster, there are only two normalizers: an OpenAI one that
just passes the data through and an Anthropic one that uses the LiteLLM
adapter. Next, we'll add a llama.cpp normalizer to get rid of the <im_start>
tags and convert them into properly formatted OpenAI messages.
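
Conceptually, the split looks roughly like the sketch below. The class and
method names here are illustrative assumptions, not necessarily the exact
ones used in this PR:

```python
# Illustrative sketch only -- names are assumptions, not code from the PR.
from abc import ABC, abstractmethod


class ModelInputNormalizer(ABC):
    """Translate a provider-specific request into the OpenAI chat format."""

    @abstractmethod
    def normalize(self, data: dict) -> dict:
        """Return the request as an OpenAI-style dict (model, messages, ...)."""


class ModelOutputNormalizer(ABC):
    """Translate an OpenAI-style reply back into the provider's format."""

    @abstractmethod
    def denormalize(self, data: dict) -> dict:
        """Return the reply in the shape the original client expects."""


class OpenAIInputNormalizer(ModelInputNormalizer):
    """The OpenAI normalizer is a no-op: the data is already in the right shape."""

    def normalize(self, data: dict) -> dict:
        return data
```

The pipelines then only ever see the normalized (OpenAI-shaped) data, while
each provider supplies its own pair of normalizers.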

@@ -4,14 +4,18 @@

from codegate.providers.base import BaseProvider
from codegate.providers.llamacpp.completion_handler import LlamaCppCompletionHandler
from codegate.providers.llamacpp.adapter import LlamaCppAdapter
from codegate.providers.llamacpp.normalizer import LLamaCppInputNormalizer, LLamaCppOutputNormalizer
Contributor


Question: we still need to add the codegate.providers.llamacpp.normalizer file with its classes, right? Is that something left for a future PR?

Just mentioning it in case you forgot to add the files to the PR (it has happened to me before).

Contributor


+1, I had the same question.

Contributor Author


So yes and no.

Originally I started working on a llama.cpp normalizer, but Pankaj's PR #108 makes it obsolete for the immediate need, which was to make it possible to use the pipelines with the llama.cpp provider. I will send a PR to enable them shortly.

The other side of this is that we'll want to support the Stacklok-hosted instance, which uses vLLM, and for that we'll need a normalizer. Same for Ollama. So the work I began today on the normalizer that handles <im_start> will be used for the hosted vLLM provider, and later we'll add an Ollama provider that brings its own normalizer.
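
For context, the kind of translation such a normalizer would do is roughly the following sketch; the tag spelling (<im_start> vs. <|im_start|>) and the function name are assumptions for illustration, not code from this repository:

```python
import re

# Rough sketch: turn a ChatML-style tagged prompt into OpenAI chat messages.
# The exact tags and edge cases depend on what the client actually sends.
_CHATML_BLOCK = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)


def chatml_to_openai_messages(prompt: str) -> list[dict]:
    """Split a tagged prompt into {"role": ..., "content": ...} dicts."""
    return [
        {"role": role, "content": content.strip()}
        for role, content in _CHATML_BLOCK.findall(prompt)
    ]
```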

Contributor

@aponcedeleonch aponcedeleonch left a comment


Besides the comment about a potentially missing file, it looks good. Very nice.

@ptelang
Contributor

ptelang commented Nov 27, 2024

Looks great! I like the idea of adding a normalizer.

@ptelang
Contributor

ptelang commented Nov 27, 2024

> Next, we'll add a llama.cpp normalizer to get rid of the <im_start>
> tags and convert them into properly formatted OpenAI messages.

The llama.cpp backend APIs already support the OpenAI format. The issue is with the Continue plugin -- it sends requests in a non-OpenAI format with im_start/im_end tags.

Instead of having to translate the requests into the OpenAI format for llama.cpp, an alternative I found is to use provider='openai' in the Continue configuration. This way, we don't need to implement a normalizer for llama.cpp.

{
      "title": "Llama CPP",
      "provider": "openai",
      "model": "qwen2.5-coder-1.5b-instruct-q5_k_m",
      "apiBase": "http://localhost:8989/llamacpp"
},

I will submit a (small) PR shortly that updates the inferencing code to use OpenAI format input.

@jhrozek jhrozek merged commit 40f730e into stacklok:main Nov 28, 2024