Testing a new quantization type in Llama.cpp #13721
Unanswered
khurramusman-10xe
asked this question in Q&A
Hello! I am new to llama.cpp and don't have much exposure to it yet. I have a custom quantization technique that (as far as I know) can't be directly mapped to one of the built-in types. For context, it's a combination of uniform and non-uniform quantization. What I want to do is define a new quantization type within llama.cpp and measure its inference runtime. I already have the weights (at a per-tensor level) quantized by my custom technique. I want to customize the convert_hf_to_gguf.py script to take my quantization output as input, save it in a format llama.cpp can load, and then run it. I would probably only need to implement the dequantization, since the data is already quantized. I could not find much on adding a new quant type to llama.cpp: which changes are needed to store it in GGUF format, and how to hook up dequantization for the custom technique. Can someone point me to where I would need to make these changes and what else I should be looking at?
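To make the "combination of uniform and non-uniform quantization" idea concrete, here is a toy Python sketch; the block size, codebook values, and function names are all hypothetical illustrations, not anything from llama.cpp. Each block of 32 weights stores a per-block scale (the uniform part) plus a 4-bit index per weight into a fixed non-uniform codebook (the non-uniform part). Only the dequantize side would need a C counterpart inside llama.cpp, roughly corresponding to a `to_float` routine in ggml's per-type trait table, if I recall the layout correctly.

```python
BLOCK_SIZE = 32

# Hypothetical non-uniform code points in [-1, 1]; a real scheme would
# derive these from the weight distribution rather than hard-code them.
CODEBOOK = [-1.0, -0.7, -0.5, -0.35, -0.25, -0.16, -0.09, -0.03,
             0.03,  0.09,  0.16,  0.25,  0.35,  0.5,  0.7,  1.0]

def quantize_block(values):
    """Quantize one block: return (uniform scale, 4-bit codebook indices)."""
    scale = max(abs(v) for v in values) or 1.0
    idxs = []
    for v in values:
        t = v / scale
        # pick the nearest codebook entry for the scaled value
        idxs.append(min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - t)))
    return scale, idxs

def dequantize_block(scale, idxs):
    """The only kernel needed at inference time: scale * codebook lookup."""
    return [scale * CODEBOOK[i] for i in idxs]

# Round-trip a 32-value block and measure the worst-case error.
block = [0.5, -0.25, 0.0, 1.0] * 8
scale, idxs = quantize_block(block)
recon = dequantize_block(scale, idxs)
err = max(abs(a - b) for a, b in zip(block, recon))
```

The point of the sketch is that once the data is pre-quantized offline, the runtime only needs the cheap `dequantize_block` path, which matches the poster's plan of implementing dequantization alone.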
I realize the above description is somewhat vague. Essentially, what I am looking for is: add a new quant type to llama.cpp, store it in GGUF format, and then define the dequant for it so that I can run inference within llama.cpp. If someone has done this previously for their own quant technique, I can try to follow their recipe. Appreciate any help that I can get here -- thanks!
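On the GGUF-storage side, each quantized tensor is stored as a flat byte buffer of fixed-size blocks, so the main design decision is the per-block byte layout. Below is a hedged, stdlib-only sketch of one hypothetical layout (an fp16 scale followed by packed 4-bit indices, 18 bytes per 32 weights, mirroring the shape of llama.cpp's Q4_0 blocks); none of these names exist in llama.cpp or gguf-py.

```python
import struct

BLOCK_SIZE = 32
TYPE_SIZE = 2 + BLOCK_SIZE // 2  # fp16 scale + one nibble per weight = 18 bytes

def pack_block(scale, idxs):
    """Pack one block into the fixed-size byte layout GGUF tensors use."""
    assert len(idxs) == BLOCK_SIZE
    # two 4-bit indices per byte: element 2i in the low nibble, 2i+1 in the high
    nibbles = bytes((idxs[2 * i] | (idxs[2 * i + 1] << 4))
                    for i in range(BLOCK_SIZE // 2))
    return struct.pack('<e', scale) + nibbles  # '<e' = little-endian fp16

def unpack_block(buf):
    """Inverse of pack_block: recover the scale and the 4-bit indices."""
    scale = struct.unpack('<e', buf[:2])[0]
    idxs = []
    for b in buf[2:]:
        idxs.append(b & 0x0F)
        idxs.append(b >> 4)
    return scale, idxs

# Round-trip one block through the byte layout.
buf = pack_block(0.5, list(range(16)) * 2)
s, idxs = unpack_block(buf)
```

If I remember correctly, gguf-py's `GGUFWriter.add_tensor` accepts a `raw_dtype` argument for writing pre-quantized raw bytes, but a new type would still need its own enum value registered on both the Python and C sides; worth double-checking against the current gguf-py source.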