Vision Models
HammerAI desktop supports vision models, meaning you can send images to LLMs, which will "see" them and respond. To learn more about vision models, see this Ollama blog post: https://ollama.com/blog/vision-models. To use vision models in HammerAI, follow these steps:
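HammerAI handles all of this in the UI, but if you're curious what sending an image to a vision model looks like under the hood, here is a minimal sketch against a local Ollama server's /api/generate endpoint. Assumptions: Ollama listening on the default port 11434, a photo.png in the working directory, and llava:7b already pulled; HammerAI's own internals may differ.

```python
import base64
import json
import urllib.request

# Read the image and base64-encode it, which Ollama's /api/generate expects.
with open("photo.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "llava:7b",
    "prompt": "Describe this image.",
    "images": [image_b64],  # one or more base64-encoded images
    "stream": False,        # ask for a single complete response
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```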
Step 1
Go to https://ollama.com/search?c=vision and find a vision language model that you'd like to use. In this example, we'll use LLaVA 7B.
Step 2
Copy the link to the LLaVA 7B model page.
Step 3
Open HammerAI desktop, navigate to the Models tab, and paste the copied link into the model download search bar.
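For reference, pasting the link kicks off a model pull. A minimal sketch of the equivalent pull against a local Ollama server (an assumption; HammerAI may manage downloads differently) looks like this:

```python
import json
import urllib.request

# Ask the local Ollama server to pull llava:7b; progress arrives as
# newline-delimited JSON status lines.
req = urllib.request.Request(
    "http://localhost:11434/api/pull",
    data=json.dumps({"name": "llava:7b"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for line in resp:
        print(json.loads(line)["status"])
```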
Step 4
Wait patiently until the model finishes downloading.
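If you want to confirm the download completed, one option, assuming a standard local Ollama server on port 11434 (your setup may differ), is to list the locally available models:

```python
import json
import urllib.request

# List models that have finished downloading to the local Ollama store.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = json.loads(resp.read())["models"]

for m in models:
    print(m["name"])  # "llava:7b" should appear once the pull is done
```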
Step 5
Open up a character chat and choose the llava:7b model.
Step 6
Drag and drop an image into the chatbox.
IMPORTANT NOTE: Drop the image in first, before typing your next message. The image must be a PNG or JPG; WebP is not supported.
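If your image is in WebP format, you can convert it to PNG first. A minimal sketch using the Pillow library (pip install Pillow; the file names are placeholders):

```python
from PIL import Image

# Re-save a WebP image as PNG so it can be dropped into the chatbox.
img = Image.open("input.webp")
img.save("input.png", format="PNG")
```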
Step 7
Enjoy!
IMPORTANT NOTE: At the moment, most models do not describe images with pinpoint accuracy. Replies may vary depending on the type of image, including its realism, art style, and SFW/NSFW content. The currently available models cannot react to NSFW images. You also cannot regenerate a reply after sending an image; repeat Step 6 to generate a different one.
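Repeating Step 6 in the UI is the supported route. If you're experimenting against a local Ollama server directly (an assumption, as above), you can get varied replies by re-sending the same image with different sampling options:

```python
import base64
import json
import urllib.request

with open("photo.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

def describe(seed: int) -> str:
    """Re-send the same image; a different seed yields a different reply."""
    payload = {
        "model": "llava:7b",
        "prompt": "Describe this image.",
        "images": [image_b64],
        "stream": False,
        "options": {"seed": seed, "temperature": 0.8},
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(describe(seed=1))
print(describe(seed=2))  # same image, new sampling settings, different reply
```

Changing the seed (or temperature) is what makes the second call produce a different description of the same image.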