Liquid AI, a startup co-founded by former researchers from the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory (CSAIL), today announced the debut of its first multimodal AI models.
Unlike most others in the current generative AI wave, these models are not based on the transformer architecture laid out in the seminal 2017 paper “Attention Is All You Need.”
Instead, Liquid says its goal is to “explore ways to build foundation models beyond Generative Pre-trained Transformers (GPTs),” building the new LFMs “from first principles … the same way engineers build engines, cars, and airplanes.”
It appears to have done just that, as the new LFMs already boast superior performance to transformer-based models of comparable size, such as Meta’s Llama 3.1-8B and Microsoft’s Phi-3.5-mini (3.8B parameters).
These models, known as “Liquid Foundation Models” (LFMs), currently come in three sizes and variants:
- LFM 1.3B (smallest)
- LFM 3B
- LFM 40B MoE (largest; a “mixture-of-experts” model similar to Mistral’s Mixtral, sketched below)
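For readers unfamiliar with the term, here is a minimal sketch of the generic top-k “mixture-of-experts” routing idea popularized by models like Mixtral; it is an illustration under assumed dimensions, not Liquid’s LFM 40B internals.

```python
import numpy as np

# Generic top-k mixture-of-experts (MoE) routing, as popularized by Mixtral.
# Purely illustrative: not Liquid AI's LFM 40B implementation, and all
# dimensions below are made-up assumptions.

def moe_layer(x, gate_W, experts, k=2):
    """Route a token vector x to its top-k experts and mix their outputs."""
    logits = gate_W @ x                          # one router score per expert
    top = np.argsort(logits)[-k:]                # indices of the k highest scores
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                     # softmax over the chosen experts
    # Only k experts run per token, so most expert parameters sit idle per step.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [lambda x, W=rng.normal(size=(d, d)): np.tanh(W @ x)
           for _ in range(n_experts)]            # each expert is a tiny toy network
gate_W = rng.normal(size=(n_experts, d))         # the router ("gate")
y = moe_layer(rng.normal(size=d), gate_W, experts)
print(y.shape)                                   # (16,) - same shape as the input
```

This is why an MoE model can carry a large total parameter count while keeping per-token compute closer to that of a much smaller dense model.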
The “B” in the name stands for billion and refers to the number of parameters (or settings) that control the model’s information processing, analysis, and output generation. In general, models with a larger number of parameters perform better across a wider range of tasks.
Liquid AI says the LFM 1.3B version outperforms Meta’s new Llama 3.2-1.2B and Microsoft’s Phi-1.5 on many leading third-party benchmarks, including the popular Massive Multitask Language Understanding (MMLU) test, which consists of 57 problems spanning science, technology, engineering, and mathematics (STEM) fields. The company calls it “the first time a non-GPT architecture significantly outperforms transformer-based models.”
All three are designed to offer state-of-the-art performance while optimizing for memory efficiency: Meta’s Llama-3.2-3B model requires more than 48 GB of memory, while Liquid’s LFM-3B needs only 16 GB, according to a chart published by Liquid.
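As a rough sanity check on figures like these, the weight memory implied by a parameter count is straightforward to estimate; the sketch below assumes common numeric precisions and deliberately ignores runtime overhead such as activations and caches, which is why measured requirements run higher than the raw weight size.

```python
# Back-of-envelope memory for holding a model's weights alone. Runtime
# overhead (activations, caches, framework buffers) comes on top, which is
# why measured requirements exceed these raw figures. Precisions assumed.

BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(params_billion, precision="fp16/bf16"):
    # 1e9 params * bytes/param, then / 1e9 bytes per GB, cancels out:
    return params_billion * BYTES_PER_PARAM[precision]

for name, size in [("LFM 1.3B", 1.3), ("LFM 3B", 3.0), ("LFM 40B MoE", 40.0)]:
    print(f"{name:12s} ~{weight_memory_gb(size):5.1f} GB of weights at 16-bit")
```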
Maxime Labonne, head of post-training at Liquid AI, wrote on his X account that the LFMs are “the proudest release of my career :)” and made clear that their core advantage is the ability to outperform transformer-based models while using significantly less memory.
These models are designed to be competitive not only on raw performance benchmarks but also in operational efficiency, making them suitable for a variety of use cases, from enterprise-level applications in sectors such as financial services, biotechnology, and consumer electronics to deployment on edge devices.
What matters for prospective users and customers, however, is that the models are not open source. Instead, users must access them through Liquid’s inference playground, Lambda Chat, or Perplexity AI.
How Liquid goes beyond generative pre-trained transformers (GPTs)
Liquid says its LFMs are built from a blend of “computational units deeply rooted in the theory of dynamical systems, signal processing, and numerical linear algebra,” and states that the result is general-purpose AI models that can be used to model any type of sequential data, including video, audio, text, time series, and signals.
Last year, VentureBeat took a closer look at Liquid’s approach to training post-transformer AI models, noting its use of Liquid Neural Networks (LNNs), an architecture developed at CSAIL that seeks to make the artificial “neurons,” or data-transforming nodes, more efficient and adaptable.
Unlike traditional deep learning models, which require thousands of neurons to perform complex tasks, LNNs have demonstrated that fewer neurons, combined with innovative mathematical formulations, can achieve the same results.
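To make the idea concrete, here is a minimal sketch of a liquid time-constant (LTC) neuron update in the spirit of the published LNN research from CSAIL (Hasani et al.); the state follows an ordinary differential equation whose effective time constant depends on the input, integrated here with a simple Euler step. The weights, sizes, and constants are illustrative assumptions, not Liquid’s production architecture.

```python
import numpy as np

# Minimal sketch of a liquid time-constant (LTC) neuron, in the spirit of the
# published liquid neural network research (Hasani et al., 2021). Illustrative
# assumptions throughout: this is not Liquid AI's production LFM architecture.

def ltc_step(x, inputs, W_in, W_rec, b, tau, A, dt=0.01):
    """One Euler step of the LTC state equation:
        dx/dt = -(1/tau + f(x, I)) * x + f(x, I) * A
    where f is a bounded nonlinearity of the input and recurrent state,
    so each neuron's effective time constant varies with its input."""
    f = np.tanh(W_in @ inputs + W_rec @ x + b)  # input-dependent gating term
    dxdt = -(1.0 / tau + f) * x + f * A         # "liquid" (input-varying) dynamics
    return x + dt * dxdt

# Toy usage: 8 neurons driven by a random 3-dimensional input signal.
rng = np.random.default_rng(0)
n, d = 8, 3
x = np.zeros(n)
W_in, W_rec = rng.normal(size=(n, d)), rng.normal(size=(n, n))
b, tau, A = np.zeros(n), np.ones(n), rng.normal(size=n)
for _ in range(100):                            # integrate over a short window
    x = ltc_step(x, rng.normal(size=d), W_in, W_rec, b, tau, A)
print(x)                                        # final hidden state of 8 neurons
```

Because the dynamics adapt to the input as it arrives, a small number of such neurons can capture behavior that would otherwise require a much larger static network.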
Liquid AI’s new models retain this core advantage of adaptability, allowing real-time adjustments during inference without the computational overhead associated with traditional models. The company says they can efficiently handle up to 1 million tokens while keeping memory usage to a minimum.
A graph on Liquid’s blog shows, for example, that the LFM-3B model outperforms popular models such as Google’s Gemma-2, Microsoft’s Phi-3, and Meta’s Llama-3.2 in inference memory footprint, especially as token length scales.
While other models’ memory usage climbs sharply when processing long contexts, LFM-3B maintains a significantly smaller footprint, making it highly suitable for applications that require processing large amounts of sequential data, such as document analysis or chatbots.
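To see why a transformer’s footprint grows with context length, consider the standard key-value (KV) cache arithmetic below; the layer and head counts are hypothetical for a 3B-class model, not the published specs of any model named above, and a model that instead carries a fixed-size recurrent state avoids this growth.

```python
# Back-of-envelope KV-cache size for a transformer at inference time. The
# architecture numbers are hypothetical for a 3B-class model, not the
# published specs of Llama-3.2, Gemma-2, Phi-3, or LFM-3B.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # Keys and values are each cached per layer, per KV head, per token.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

layers, kv_heads, head_dim = 28, 8, 128          # assumed 3B-class configuration
for seq_len in (4_096, 128_000, 1_000_000):
    gb = kv_cache_bytes(layers, kv_heads, head_dim, seq_len) / 1e9
    print(f"{seq_len:>9,} tokens -> {gb:6.1f} GB of KV cache")
# The cache grows linearly with context length; a model that carries a
# fixed-size recurrent state instead keeps a near-constant footprint.
```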
Liquid AI has built foundation models that are versatile across multiple data modalities, including audio, video, and text.
With this multimodal capability, Liquid aims to address a wide range of industry-specific challenges, from financial services to biotechnology to consumer electronics.
Accepting launch event RSVPs and eyeing future improvements
Liquid AI says it is optimizing its models for deployment on hardware from NVIDIA, AMD, Apple, Qualcomm, and Cerebras.
Although the models are still in a preview phase, Liquid AI is inviting early adopters and developers to test them and provide feedback.
Labonne said that while things are “not perfect,” the feedback received at this stage is helping the team improve the models as they prepare for a full launch event on October 23, 2024, at MIT’s Kresge Auditorium in Cambridge, Massachusetts. The company is accepting RSVPs for the event.
As part of its commitment to transparency and scientific progress, Liquid said it will publish a series of technical blog posts in advance of its product launch event.
The company is also running a red-teaming effort, encouraging users to test the limits of its models to improve future iterations.
With the introduction of its Liquid Foundation Models, Liquid AI is positioning itself as a major player in the foundation model space. By combining cutting-edge performance with unprecedented memory efficiency, the LFMs offer a compelling alternative to traditional transformer-based models.