Multimodal Tutorial
GGUF models with vision capabilities are uploaded to Hugging Face alongside an mmproj file.
For instance, unsloth/gemma-3-4b-it-GGUF includes one:

As an example, download one of the quantized GGUF files from that repository to your text-generation-webui/user_data/models folder.
Then download
https://huggingface.co/unsloth/gemma-3-4b-it-GGUF/resolve/main/mmproj-F16.gguf?download=true
to your text-generation-webui/user_data/mmproj
folder and name it mmproj-gemma-3-4b-it-F16.gguf
to give it a recognizable name.
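
If you prefer to script these downloads, here is a minimal sketch using the huggingface_hub library. The mmproj filename matches the link above; the quantized model filename below is an assumption, so substitute whichever file you picked from the repository:

```python
from pathlib import Path
from huggingface_hub import hf_hub_download

repo_id = "unsloth/gemma-3-4b-it-GGUF"

# Adjust these paths to wherever your text-generation-webui installation lives
models_dir = Path("text-generation-webui/user_data/models")
mmproj_dir = Path("text-generation-webui/user_data/mmproj")

# Download a quantized model file (the filename below is an assumption;
# pick whichever quantization you prefer from the repository's file list)
hf_hub_download(repo_id, "gemma-3-4b-it-Q4_K_M.gguf", local_dir=models_dir)

# Download the vision projector and give it a recognizable name
mmproj_path = hf_hub_download(repo_id, "mmproj-F16.gguf", local_dir=mmproj_dir)
Path(mmproj_path).rename(mmproj_dir / "mmproj-gemma-3-4b-it-F16.gguf")
```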
- Launch the web UI
- Navigate to the Model tab
- Select the GGUF model in the Model dropdown:

- Select the mmproj file in the Multimodal (vision) menu:

- Click "Load"
Select your image by clicking on the 📎 icon and send your message:

The model will reply with great understanding of the image contents:
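
Besides the chat UI, you can also send images programmatically. The sketch below assumes the web UI was started with the OpenAI-compatible API enabled (the --api flag, default port 5000) and that your version accepts OpenAI-style image_url message parts; the image filename is a placeholder:

```python
import base64
import requests

# Assumes the OpenAI-compatible API is running on the default port
URL = "http://127.0.0.1:5000/v1/chat/completions"

# Encode a local image as a base64 data URI (replace photo.jpg with your file)
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
    "max_tokens": 300,
}

response = requests.post(URL, json=payload, timeout=120)
print(response.json()["choices"][0]["message"]["content"])
```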

Multimodal also works with the ExLlamaV3 loader (the non-HF one).
No additional files are necessary; just load a multimodal EXL3 model and send an image.
Examples of models that you can use:
- https://huggingface.co/turboderp/gemma-3-27b-it-exl3
- https://huggingface.co/turboderp/Mistral-Small-3.1-24B-Instruct-2503-exl3
On the page below, you can find some ready-to-use examples: