LLaVA
LLaVA (Large Language and Vision Assistant) is an end-to-end trained large multimodal model that connects a vision encoder to the Vicuna language model for general-purpose visual and language understanding. It shows strong chat capabilities in the spirit of the multimodal GPT-4 and, at the time of its release, reported a new state-of-the-art accuracy on Science QA.
![LLaVa](https://cdn.prod.website-files.com/649f003940a53a75a2e42068/65ca5e8a7e8568e4fbdfd6ef_llava_architecture.jpeg)
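To make the idea of combining a vision encoder with a language model more concrete, here is a minimal toy sketch in plain Python. It assumes the two parts are joined by a single linear projection that maps image patch features into the language model's embedding space, after which the projected "visual tokens" are placed in front of the text token embeddings. The function names (`linear_project`, `build_input_sequence`), the dimensions, and the random values are all hypothetical stand-ins for illustration; this is not LLaVA's actual code, and the page above does not specify the connector.

```python
import random


def linear_project(features, weight):
    """Map each visual feature vector (length d_vis) to the language
    model's embedding size (length d_txt) with one matrix multiply."""
    return [
        [sum(f * w for f, w in zip(feat, col)) for col in zip(*weight)]
        for feat in features
    ]


def build_input_sequence(image_features, text_embeddings, weight):
    """Project image features and prepend them to the text embeddings,
    giving one sequence a language model could consume."""
    visual_tokens = linear_project(image_features, weight)
    return visual_tokens + text_embeddings


# Toy sizes chosen only for illustration (not the real model's dimensions).
D_VIS, D_TXT = 4, 3
NUM_PATCHES, NUM_TEXT_TOKENS = 2, 5

rng = random.Random(0)
# Stand-ins for a vision encoder's patch features and the LM's token embeddings.
image_features = [[rng.random() for _ in range(D_VIS)] for _ in range(NUM_PATCHES)]
text_embeddings = [[rng.random() for _ in range(D_TXT)] for _ in range(NUM_TEXT_TOKENS)]
# Hypothetical projection weights of shape (D_VIS, D_TXT); learned in a real system.
weight = [[rng.random() for _ in range(D_TXT)] for _ in range(D_VIS)]

sequence = build_input_sequence(image_features, text_embeddings, weight)
# Sequence length is patches + text tokens; every entry has the text embedding size.
print(len(sequence), len(sequence[0]))
```

The point of the sketch is only the data flow: once image features share the text embedding size, the language model can treat them as ordinary tokens in its input sequence.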