# LLaVaVision

A simple "Be My Eyes" web app with a llama.cpp/llava backend, created in about an hour using ChatGPT, Copilot, and some minor help from me, @lxe.

It describes what it sees using the SkunkworksAI BakLLaVA-1 model via llama.cpp and narrates the text using the Web Speech API.

Inspired by Fuzzy-Search/realtime-bakllava.

## Getting Started

You will need a machine with roughly 5 GB of RAM/VRAM for the q4_k version.

### Set up the llama.cpp server

(Optional) Install the CUDA toolkit:

```sh
sudo apt install nvidia-cuda-toolkit
```

Build llama.cpp (build instructions for other platforms are in the llama.cpp repository):

```sh
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake .. -DLLAMA_CUBLAS=ON  # Remove the flag if CUDA is unavailable
cmake --build . --config Release
```

Download the models from ggml_bakllava-1:

```sh
wget https://huggingface.co/mys/ggml_bakllava-1/resolve/main/mmproj-model-f16.gguf
wget https://huggingface.co/mys/ggml_bakllava-1/resolve/main/ggml-model-q4_k.gguf  # Choose another quant if preferred
```

Start the server (server options are documented in the llama.cpp server README):

```sh
./bin/server -m ggml-model-q4_k.gguf --mmproj mmproj-model-f16.gguf -ngl 35 -ts 100,0  # GPU-only, single GPU
# ./bin/server -m ggml-model-q4_k.gguf --mmproj mmproj-model-f16.gguf                  # CPU
```

### Launch LLaVaVision

Clone the repository and set up the environment:

```sh
git clone https://github.com/lxe/llavavision
cd llavavision
python3 -m venv venv
. ./venv/bin/activate
pip install -r requirements.txt
```

Create dummy certificates and start the server. HTTPS is required for camera/video capture to work on mobile browsers:

```sh
openssl req -newkey rsa:4096 -x509 -sha256 -days 365 -nodes -out cert.pem -keyout key.pem
flask run --host=0.0.0.0 --key key.pem --cert cert.pem --debug
```

Open https://your-machine-ip:5000 on your mobile device.

Optionally, start a local tunnel with ngrok or localtunnel:

```sh
npx localtunnel --local-https --allow-invalid-cert --port 5000
```

## Acknowledgements and Inspiration

- Fuzzy-Search/realtime-bakllava
- Multimodal llama.cpp
- llava-vl.github.io
- SkunkworksAI/BakLLaVA-1
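## Appendix: Querying the llama.cpp server directly

If you want to sanity-check the backend before pointing the web app at it, you can send an image straight to the llama.cpp server. The sketch below is a hypothetical example, not part of the original setup: it assumes the server started above is listening on its default port 8080 and that your build exposes the `/completion` endpoint with base64-encoded `image_data`, as described in the llama.cpp server README at the time; `photo.jpg` and the prompt text are placeholders.

```sh
# Hypothetical smoke test (not part of the app): send one image to the
# llama.cpp server and print the model's description.
# Assumes the server from the setup above is running on its default port 8080
# and supports the /completion endpoint with base64 image_data.
IMG_B64=$(base64 -w 0 photo.jpg)   # photo.jpg is a placeholder image

cat > request.json <<EOF
{
  "prompt": "USER:[img-1] Describe what you see in one short sentence.\nASSISTANT:",
  "image_data": [{ "data": "${IMG_B64}", "id": 1 }],
  "n_predict": 128,
  "temperature": 0.1
}
EOF

curl -s http://localhost:8080/completion \
  -H 'Content-Type: application/json' \
  -d @request.json
```

If everything is wired up correctly, the server returns a JSON response with the generated description (typically in its `content` field); the web app drives the same backend with frames captured from your phone's camera.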