
Cohere launches Aya Vision AI with support for 23 languages




Canadian AI startup Cohere, launched in 2019, specifically targets enterprises, but independent research suggests it has so far struggled to gain much market share among third-party developers compared with rival U.S. model providers such as OpenAI and Anthropic, not to mention the rise of China’s open-source competitor DeepSeek.

However, Cohere continues to strengthen its offerings. Today, its non-profit research division, Cohere For AI, announced the release of its first vision model, Aya Vision, which integrates language and vision capabilities and boasts the differentiator of supporting inputs in 23 different languages. Cohere said in its official blog post that the new open-weight multimodal AI model serves languages spoken by “half of the world’s population,” appealing to a wide global audience.

Aya Vision is designed to enhance AI’s ability to interpret images, generate text, and translate visual content into natural language, making multilingual AI more accessible and effective. This is especially useful for businesses and organizations operating in multiple markets around the world with different language preferences.

Aya Vision is available now on the Cohere website and on the AI code communities Hugging Face and Kaggle under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license, allowing researchers and developers to use, modify, and share it freely for non-commercial purposes, as long as appropriate attribution is given.

In addition, Aya Vision is available via WhatsApp, allowing users to interact with the model directly in a familiar environment.

This non-commercial license, unfortunately, limits its use as an engine for enterprises, paid apps, and money-making workflows.

It comes in 8-billion and 32-billion parameter versions (parameters refer to the number of internal settings in an AI model, including its weights and biases, with more usually indicating a more powerful and performant model).
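For teams that want to try the open weights locally, a minimal sketch of loading the 8B checkpoint with the Hugging Face transformers library might look like the following. The repository id and model class here are assumptions based on how open-weight vision-language models are typically published; check the actual model card for exact usage.

```python
# Minimal sketch: loading an open-weight Aya Vision checkpoint with transformers.
# The repo id below is an assumption -- consult the Hugging Face model card.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

MODEL_ID = "CohereForAI/aya-vision-8b"  # hypothetical repository id

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # half precision so the 8B weights fit on a single GPU
    device_map="auto",
)
```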

Supports 23 languages and counting

Though many rival AI models can understand text in multiple languages, extending this capability to vision-based tasks is a challenge.

Aya Vision overcomes this by allowing users to generate image captions, answer visual questions, translate images, and perform text-based language tasks across a diverse set of languages:

1. English

2. French

3. German

4. Spanish

5. Italian

6. Portuguese

7. Japanese

8. Korean

9. Chinese

10. Arabic

11. Greek

12. Persian

13. Polish

14. Indonesian

15. Czech

16. Hebrew

17. Hindi

18. Dutch

19. Romanian

20. Russian

21. Turkish

22. Ukrainian

23. Vietnamese

In a blog post, Cohere showed how Aya Vision analyzes images and text from product packaging, providing translations and explanations. It can also identify and describe art styles from different cultures, helping users learn about objects and traditions through AI-powered visual understanding.
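Building on the loading sketch above, a hedged example of asking the model about an image, with output requested in one of its supported languages, could look like this. The chat-template message format follows the common transformers pattern for image-text-to-text models and may differ from Aya Vision’s actual prompt conventions; the file name is hypothetical.

```python
# Illustrative only: ask for a description of a product-packaging photo in Japanese,
# reusing the `processor` and `model` objects from the loading sketch above.
from PIL import Image

image = Image.open("packaging.jpg")  # hypothetical local photo

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe what is written on this packaging, in Japanese."},
        ],
    }
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output[0], skip_special_tokens=True))
```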

Aya Vision’s capabilities have a wide range of implications across multiple disciplines:

Language learning and education: Users can translate and explain images in multiple languages, making educational content more accessible.

Cultural preservation: The model generates detailed descriptions of art, landmarks, and historical artifacts, and can support cultural documentation in underrepresented languages.

Accessibility Tools: Vision-based AI can help visually impaired users by providing detailed image descriptions in their native language.

Global Communication: Real-time multimodal translation allows organizations and individuals to communicate more effectively across languages.

Strong performance and high efficiency across major benchmarks

One of Aya Vision’s standout features is its efficiency and performance relative to model size. Despite being significantly smaller than some leading multimodal models, Aya Vision outperforms much larger alternatives on several key benchmarks.

• Aya Vision 8B outperforms Llama 90B, a model more than 11 times its size.

• Aya Vision 32B outperforms Qwen 72B, Llama 90B, and Molmo 72B, all of which are at least twice its size.

• Benchmark results on AyaVisionBench and m-WildVision show Aya Vision 8B achieving win rates of up to 79%, and Aya Vision 32B reaching win rates of 72% on multilingual image comprehension tasks.

A visual comparison of efficiency and performance highlights Aya Vision’s advantages. As shown in the efficiency-performance trade-off graph, Aya Vision 8B and 32B exhibit best-in-class performance relative to their parameter sizes, outperforming much larger models while maintaining computational efficiency.

Technology innovations that drive Aya Vision

Cohere For AI attributes Aya Vision’s performance improvements to several key innovations:

Synthetic Annotations: This model leverages synthetic data generation to enhance training for multimodal tasks.

Multilingual data scaling: By translating and rephrasing data across languages, the model gains a broader understanding of multilingual contexts.

Multimodal model merging: Advanced techniques combine insights from both vision and language models to improve overall performance.

These advancements allow Aya Vision to process images and text more accurately while maintaining powerful multilingual capabilities.
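Cohere has not published its exact merging recipe, but the general idea behind weight-space model merging can be illustrated with a simple parameter-averaging sketch. This is an illustrative assumption about the technique in general, not Cohere’s actual method.

```python
# Generic sketch of weight-space model merging via parameter averaging.
# NOT Cohere's published recipe -- it only illustrates the idea of combining
# separately trained models by operating directly on their weights.
import torch

def average_merge(state_dicts, weights=None):
    """Average several compatible state dicts, optionally with per-model weights."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Example: blend a vision-tuned and a multilingual-tuned checkpoint 60/40
# merged = average_merge([vision_sd, multilingual_sd], weights=[0.6, 0.4])
```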

A stepwise performance-improvement chart shows how incremental innovations such as synthetic fine-tuning (SFT), model merging, and scaling contributed to Aya Vision’s high win rates.

Impact on enterprise decision makers

Despite Aya Vision’s ostensibly enterprise-friendly nature, companies may struggle to make much use of it given its restrictive non-commercial licensing terms.

Nevertheless, CEOs, CTOs, IT leaders and AI researchers can use the models to explore AI-driven multilingual and multimodal capabilities within their organization, particularly in research, prototyping and benchmarking.

Companies can use it for internal research and development, evaluate multilingual AI performance, and experiment with multimodal applications.

CTOs and AI teams may find Aya Vision valuable as a highly efficient, open-weight model that outperforms much larger alternatives while requiring fewer computational resources.

This makes it a useful tool for benchmarking your own models, investigating potential AI-driven solutions, and testing multilingual multimodal interactions before committing to commercial deployment strategies.

For data scientists and AI researchers, Aya Vision is far more useful.

Its open source nature and rigorous benchmarks provide a transparent foundation for studying model behavior, fine-tuning it in a non-commercial setting, and contributing to the advancement of open AI.

Whether used for internal research, academic collaboration, or AI ethics assessment, Aya Vision serves as a cutting-edge resource for businesses seeking to remain at the forefront of multilingual and multimodal AI without the constraints of proprietary, closed models.

Open Source Research and Collaboration

Aya Vision is part of the Aya initiative, a broader effort focused on making AI and related technologies more multilingual.

Since launching in February 2024, the Aya initiative has brought together a global research community of over 3,000 independent researchers across 119 countries, collaborating to improve multilingual AI models.

In keeping with its commitment to open science, Cohere has released the open weights for both Aya Vision 8B and 32B on Kaggle and Hugging Face, allowing researchers around the world to access and experiment with the models. Additionally, Cohere For AI has introduced AyaVisionBench, a new multilingual vision evaluation set designed to provide a rigorous evaluation framework for multimodal AI.
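Researchers who want to evaluate their own models against the new benchmark could load it with the Hugging Face datasets library, roughly as sketched below. The dataset id shown is an assumption; consult Cohere For AI’s Hugging Face organization page for the published name and splits.

```python
# Minimal sketch: pulling the AyaVisionBench evaluation set with the `datasets` library.
# The dataset id is an assumption -- check Cohere For AI's Hugging Face page.
from datasets import load_dataset

bench = load_dataset("CohereForAI/AyaVisionBench")  # hypothetical repo id

print(bench)                          # inspect the available splits
first_split = next(iter(bench))       # name of the first split
example = bench[first_split][0]       # one record (image plus multilingual prompt)
print(example.keys())
```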

The availability of Aya Vision as an open-weight model represents an important step toward making multilingual AI research more inclusive and accessible.

Aya Vision builds on the success of Aya Expanse, another Cohere For AI LLM family focused on multilingual AI. By expanding into multimodal AI, Cohere For AI positions Aya Vision as an important tool for researchers, developers, and businesses looking to integrate multilingual AI into their workflows.

As the Aya initiative continues to evolve, Cohere For AI has also announced plans to launch new collaborative research efforts in the coming weeks. Researchers and developers interested in contributing to the advancement of multilingual AI can join its open science community or apply for research grants.

For now, the release of Aya Vision represents a major leap in multilingual multimodal AI, offering a high-performance, open-weight solution that challenges the advantages of larger closed models. By making these advancements available to the broader research community, Cohere For AI continues to push the boundaries of what is possible with AI-driven multilingual communication.
