Vision Models
HammerAI desktop supports vision models, meaning you can send images to LLMs, which will "see" them and respond. To learn more about vision models, see this Ollama blog post: https://ollama.com/blog/vision-models. To use vision models in HammerAI, follow these steps:
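HammerAI handles all of this in the UI, but if you're curious what sending an image to a vision model looks like under the hood, here is a minimal sketch against a local Ollama server's /api/generate endpoint. Assumptions: Ollama listening on the default port 11434, a photo.png in the working directory, and llava:7b already pulled; HammerAI's own internals may differ.

```python
import base64
import json
import urllib.request

# Read the image and base64-encode it, which Ollama's /api/generate expects.
with open("photo.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "llava:7b",
    "prompt": "Describe this image.",
    "images": [image_b64],  # one or more base64-encoded images
    "stream": False,        # ask for a single complete response
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```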
Step 1
Go to https://ollama.com/search?c=vision and find a vision language model that you'd like to use. In this example, we'll use LLaVA 7B.
Step 2
Copy the link to the LLaVA 7B model page.
Step 3
Open HammerAI desktop, navigate to the Models tab, and paste the copied link into the model download search bar.
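For reference, pasting the link kicks off a model pull. A minimal sketch of the equivalent pull against a local Ollama server (an assumption; HammerAI may manage downloads differently) looks like this:

```python
import json
import urllib.request

# Ask the local Ollama server to pull llava:7b; progress arrives as
# newline-delimited JSON status lines.
req = urllib.request.Request(
    "http://localhost:11434/api/pull",
    data=json.dumps({"name": "llava:7b"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for line in resp:
        print(json.loads(line)["status"])
```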
Step 4
Wait patiently until the model finishes downloading.
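If you want to confirm the download completed, one option, assuming a standard local Ollama server on port 11434 (your setup may differ), is to list the locally available models:

```python
import json
import urllib.request

# List models that have finished downloading to the local Ollama store.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = json.loads(resp.read())["models"]

for m in models:
    print(m["name"])  # "llava:7b" should appear once the pull is done
```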
Step 5
Open up a character chat and choose the llava:7b model.
Step 6
Drag and drop an image into the chatbox.
IMPORTANT NOTE: Drop the image in first, before typing your next message. The image must be a PNG or JPG; WebP is not supported.
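If your image is in WebP format, you can convert it to PNG first. A minimal sketch using the Pillow library (pip install Pillow; the file names are placeholders):

```python
from PIL import Image

# Re-save a WebP image as PNG so it can be dropped into the chatbox.
img = Image.open("input.webp")
img.save("input.png", format="PNG")
```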
Step 7
Enjoy!
IMPORTANT NOTE: At the moment, most models do not describe images with pinpoint accuracy. Replies may vary depending on the type of image, including its realism, art style, and SFW/NSFW content. The currently available models cannot react to NSFW images. You also cannot regenerate a reply after sending an image; repeat Step 6 to generate a different one.
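Repeating Step 6 in the UI is the supported route. If you're experimenting against a local Ollama server directly (an assumption, as above), you can get varied replies by re-sending the same image with different sampling options:

```python
import base64
import json
import urllib.request

with open("photo.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

def describe(seed: int) -> str:
    """Re-send the same image; a different seed yields a different reply."""
    payload = {
        "model": "llava:7b",
        "prompt": "Describe this image.",
        "images": [image_b64],
        "stream": False,
        "options": {"seed": seed, "temperature": 0.8},
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(describe(seed=1))
print(describe(seed=2))  # same image, new sampling settings, different reply
```

Changing the seed (or temperature) is what makes the second call produce a different description of the same image.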