Testing a new quantization type in Llama.cpp #13721
Unanswered
khurramusman-10xe
asked this question in Q&A
Hello! I am new to llama.cpp and don't have much exposure to it yet. I have a custom quantization technique that (as far as I know) can't be directly mapped to one of the built-in types. For context, it's a combination of uniform and non-uniform quantization. What I want to do is define a new quantization type within llama.cpp and measure its inference runtime. I already have the weights (at a per-tensor level) quantized by my custom technique. I want to customize the convert_hf_to_gguf.py script to take my quantization output as input, save it in a format llama.cpp can load, and then run it. I would probably only need to implement the dequantization, since the data is already quantized. I could not find much on adding a new quant type to llama.cpp: which changes are needed to store it in GGUF format, and how to hook up dequantization for the custom technique. Can someone point me to where I would need to make these changes and what else I should be looking at?
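To make the "combination of uniform and non-uniform quantization" idea concrete, here is a toy Python sketch; the block size, codebook values, and function names are all hypothetical illustrations, not anything from llama.cpp. Each block of 32 weights stores a per-block scale (the uniform part) plus a 4-bit index per weight into a fixed non-uniform codebook (the non-uniform part). Only the dequantize side would need a C counterpart inside llama.cpp, roughly corresponding to a `to_float` routine in ggml's per-type trait table, if I recall the layout correctly.

```python
BLOCK_SIZE = 32

# Hypothetical non-uniform code points in [-1, 1]; a real scheme would
# derive these from the weight distribution rather than hard-code them.
CODEBOOK = [-1.0, -0.7, -0.5, -0.35, -0.25, -0.16, -0.09, -0.03,
             0.03,  0.09,  0.16,  0.25,  0.35,  0.5,  0.7,  1.0]

def quantize_block(values):
    """Quantize one block: return (uniform scale, 4-bit codebook indices)."""
    scale = max(abs(v) for v in values) or 1.0
    idxs = []
    for v in values:
        t = v / scale
        # pick the nearest codebook entry for the scaled value
        idxs.append(min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - t)))
    return scale, idxs

def dequantize_block(scale, idxs):
    """The only kernel needed at inference time: scale * codebook lookup."""
    return [scale * CODEBOOK[i] for i in idxs]

# Round-trip a 32-value block and measure the worst-case error.
block = [0.5, -0.25, 0.0, 1.0] * 8
scale, idxs = quantize_block(block)
recon = dequantize_block(scale, idxs)
err = max(abs(a - b) for a, b in zip(block, recon))
```

The point of the sketch is that once the data is pre-quantized offline, the runtime only needs the cheap `dequantize_block` path, which matches the poster's plan of implementing dequantization alone.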
I realize the above description is somewhat vague. Essentially, what I am looking for is: add a new quant type to llama.cpp, store it in GGUF format, and then define the dequant for it so that I can run inference within llama.cpp. If someone has done this previously for their own quant technique, I can try to follow their recipe. Appreciate any help that I can get here -- thanks!
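On the GGUF-storage side, each quantized tensor is stored as a flat byte buffer of fixed-size blocks, so the main design decision is the per-block byte layout. Below is a hedged, stdlib-only sketch of one hypothetical layout (an fp16 scale followed by packed 4-bit indices, 18 bytes per 32 weights, mirroring the shape of llama.cpp's Q4_0 blocks); none of these names exist in llama.cpp or gguf-py.

```python
import struct

BLOCK_SIZE = 32
TYPE_SIZE = 2 + BLOCK_SIZE // 2  # fp16 scale + one nibble per weight = 18 bytes

def pack_block(scale, idxs):
    """Pack one block into the fixed-size byte layout GGUF tensors use."""
    assert len(idxs) == BLOCK_SIZE
    # two 4-bit indices per byte: element 2i in the low nibble, 2i+1 in the high
    nibbles = bytes((idxs[2 * i] | (idxs[2 * i + 1] << 4))
                    for i in range(BLOCK_SIZE // 2))
    return struct.pack('<e', scale) + nibbles  # '<e' = little-endian fp16

def unpack_block(buf):
    """Inverse of pack_block: recover the scale and the 4-bit indices."""
    scale = struct.unpack('<e', buf[:2])[0]
    idxs = []
    for b in buf[2:]:
        idxs.append(b & 0x0F)
        idxs.append(b >> 4)
    return scale, idxs

# Round-trip one block through the byte layout.
buf = pack_block(0.5, list(range(16)) * 2)
s, idxs = unpack_block(buf)
```

If I remember correctly, gguf-py's `GGUFWriter.add_tensor` accepts a `raw_dtype` argument for writing pre-quantized raw bytes, but a new type would still need its own enum value registered on both the Python and C sides; worth double-checking against the current gguf-py source.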