LLM inference in C/C++
This script extracts and organizes Ollama model blobs into separate files for use with RAG (Retrieval-Augmented Generation) applications and GGUF model processing. It separates an Ollama model (e.g., Gemma3) into individual components:
- `model_quant.gguf` - The quantized model weights in GGUF format
- `params.txt` - Model parameters and configuration
- `license.txt` - License information
- `template.txt` - Prompt template
- `config.json` - Model configuration in JSON format
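Under the hood, the extraction amounts to reading the Ollama manifest for the model and copying each referenced blob out under a readable name. The snippet below is a minimal sketch of that idea, not the actual `extract_blobs.sh`: the storage location assumes a default per-user Ollama install on Linux, and the manifest tag (`gemma3/latest`) and destination folder (`~/model_quant_folder`) are illustrative.
# Minimal sketch of the extraction idea (illustrative only, not the real extract_blobs.sh)
MODELS_DIR="$HOME/.ollama/models"   # default per-user Ollama storage on Linux (assumed)
MANIFEST="$MODELS_DIR/manifests/registry.ollama.ai/library/gemma3/latest"
DEST="$HOME/model_quant_folder"
mkdir -p "$DEST"
# Each manifest layer maps a mediaType to a blob digest; the image.model layer holds the GGUF weights
digest=$(jq -r '.layers[] | select(.mediaType == "application/vnd.ollama.image.model") | .digest' "$MANIFEST")
# Blobs are stored on disk as sha256-<hex>, while digests read sha256:<hex>, so swap the separator
cp "$MODELS_DIR/blobs/${digest/:/-}" "$DEST/model_quant.gguf"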
- RAG Applications: Extract GGUF models for local inference in RAG pipelines
- Model Analysis: Separate model components for detailed examination
- Custom Deployments: Use extracted components in custom inference setups
- llama.cpp Integration: Direct compatibility with llama.cpp ecosystem
This repository DOES NOT include the model weights. You must download them legally from Ollama or another official source.
Use of the models and derivatives is subject to the Gemma Terms of Use.
- Bash shell
- `jq` for JSON processing
- `cmake` and build tools (for llama.cpp integration)
- `libcurl4-openssl-dev` (dependency for network operations)
# Update package list
sudo apt-get update
# Install all build dependencies
sudo apt-get install cmake build-essential clang libcurl4-openssl-dev jq
# Verify Clang installation
clang --version
# Clone llama.cpp repository (official GGML organization repo)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
# Create build directory
mkdir build
cd build
# Configure with Clang as compiler
cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ..
# Build the project
cmake --build . --config Release
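If the build finishes cleanly, the binaries used later in this README are placed under `build/bin`. A quick sanity check, run from inside `llama.cpp/build` (printing the version is just a convenient smoke test):
# Confirm the binaries were produced
ls bin/llama-cli bin/llama-server
./bin/llama-cli --version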
# Navigate back to your project directory (if you were in llama.cpp/build)
cd /path/to/your/project
# Make the script executable
chmod +x extract_blobs.sh
# Run the extraction
./extract_blobs.sh
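Once the script finishes, verify that all components were written out. The destination path below is an assumption chosen to match the inference examples in the next section:
# Inspect the extracted components
ls -lh ~/model_quant_folder
# Expected: model_quant.gguf, params.txt, license.txt, template.txt, config.json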
Once you have extracted the model files, you can use them with llama.cpp tools:
# Run inference with the extracted GGUF model
./llama.cpp/build/bin/llama-cli -m ~/model_quant_folder/model_quant.gguf -p "Hello, how are you?"
# Start an OpenAI-compatible API server
./llama.cpp/build/bin/llama-server -m ~/model_quant_folder/model_quant.gguf --port 8080
# Run in conversation mode
./llama.cpp/build/bin/llama-cli -m ~/model_quant_folder/model_quant.gguf -cnv
The extracted GGUF model can be integrated into RAG pipelines:
- Local Inference: Use the GGUF model for local text generation in RAG systems
- Embedding Generation: Extract embeddings for document indexing and retrieval
- Custom Templates: Utilize the extracted `template.txt` for consistent prompt formatting
- API Integration: Connect RAG applications to the llama-server endpoint (see the example request below)
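For the API integration case, a RAG application can call the llama-server started above through its OpenAI-compatible endpoint. A minimal example request, assuming the server is running locally on port 8080 as shown earlier:
# Send a chat completion request to the local llama-server (OpenAI-compatible API)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Answer using the retrieved context: ..."}]}'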
- Missing libcurl: If cmake fails, ensure `libcurl4-openssl-dev` is installed
- Build tools missing: Install the `cmake` and `build-essential` packages
- Permissions: Ensure you have write permissions to the destination directory
- No manifest found: Ensure Ollama is installed and has downloaded models
- jq not found: Install jq with `sudo apt-get install jq`
- Blob files missing: Verify that the Ollama model is fully downloaded (see the quick checks below)
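The quick checks below cover the most common failure modes above; the `~/.ollama` paths assume a default per-user Ollama installation on Linux (a system-wide service install stores models elsewhere):
# Quick sanity checks before re-running the script
command -v jq || echo "jq is missing: sudo apt-get install jq"
ls ~/.ollama/models/manifests        # manifests appear once Ollama has pulled a model
ls -lh ~/.ollama/models/blobs | head # blob files should be present; the weights layer is typically several GB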