The latest release of the Haystack OSS LLM framework adds a long-requested feature: image support!
Notebooks below.
This isn't just about passing images to an LLM. We built several features to enable practical multimodal use cases.
What's new?
- Support for multiple LLM providers: OpenAI, Amazon Bedrock, Google Gemini, Mistral, NVIDIA, OpenRouter, Ollama and more (support for Hugging Face API coming)
- Prompt template language to handle structured inputs, including images
- PDF and image converters
- Image embedders using CLIP-like models
- LLM-based extractor to pull text from images
- Components to build multimodal RAG pipelines and Agents
I had the chance to lead this effort with @sjrhuschlee (great collab).