This example generates a caption for an image.
It runs fully locally on your computer and does not require a Graphics Processing Unit (GPU).
It uses the Salesforce BLIP (Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation) vision-language model together with Hugging Face Transformers. Please note that this is a very small model and its capabilities are therefore limited, but the results are still impressive for its size.
This example uses this test image:
to automatically generate this image caption:
a set of toy cars and traffic cones
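
Below is a minimal sketch of how such a caption could be generated with Hugging Face Transformers. The checkpoint name `Salesforce/blip-image-captioning-base` and the local image path `test.jpg` are assumptions for illustration; adjust them to match your setup.

```python
# Minimal sketch: local image captioning with BLIP via Hugging Face Transformers.
# Assumes the Salesforce/blip-image-captioning-base checkpoint and a local file
# "test.jpg"; both are placeholders, adjust to your own model and image.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

model_name = "Salesforce/blip-image-captioning-base"  # assumed checkpoint
processor = BlipProcessor.from_pretrained(model_name)
model = BlipForConditionalGeneration.from_pretrained(model_name)

# Load the test image from disk (path is an assumption for this sketch).
image = Image.open("test.jpg").convert("RGB")

# Preprocess the image and generate a caption on the CPU (no GPU required).
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption)  # e.g. "a set of toy cars and traffic cones"
```

Because the model is small, it runs comfortably on a CPU; the first call downloads the checkpoint, and subsequent runs use the local cache.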