Researchers at Sakana AI have developed a resource-efficient framework that can create hundreds of language models specialized for different tasks. Called CycleQD, the technique uses evolutionary algorithms to combine the skills of different models without the need for expensive and time-consuming training processes.
CycleQD can create swarms of task-specific agents, providing a more sustainable alternative to current paradigms that increase model size.
Rethinking model training
Large language models (LLMs) have shown remarkable capabilities across a variety of tasks. However, training LLMs to acquire multiple skills remains a challenge: when fine-tuning a model, engineers must balance data from different skills so that no single skill dominates the others. Current approaches often respond by training increasingly larger models, which drives up computational and resource requirements.
“Rather than aiming to develop a single large model that performs well on all tasks, a population-based approach that evolves a swarm of diverse niche models is an effective way to develop AI agents with advanced capabilities. We believe this has the potential to offer an alternative, more sustainable path to scaling up AI,” the Sakana researchers wrote in a blog post.
To create the model population, the researchers drew inspiration from quality diversity (QD), an evolutionary computing paradigm focused on discovering a diverse set of solutions from an initial population sample. QD aims to create samples with different “behavioral characteristics” (BCs) that represent different skill areas. It accomplishes this through an evolutionary algorithm (EA) that selects parent samples and applies crossover and mutation operations to create new samples.
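To make this loop concrete, here is a minimal quality-diversity sketch in the style of MAP-Elites, a common QD algorithm. The `evaluate`, `crossover`, and `mutate` functions are hypothetical placeholders for task-specific implementations, not code from Sakana's paper.

```python
import random

def qd_search(initial_population, evaluate, crossover, mutate, n_generations=100):
    """Minimal MAP-Elites-style QD loop (illustrative sketch).

    `evaluate(candidate)` is assumed to return (quality, bc_bin), where
    bc_bin is a hashable descriptor of the candidate's behavior.
    """
    archive = {}  # maps a BC bin to its best (quality, candidate) pair
    for cand in initial_population:
        quality, bc = evaluate(cand)
        archive[bc] = (quality, cand)

    for _ in range(n_generations):
        # Select two parents from the archive and produce a child.
        (_, p1), (_, p2) = random.sample(list(archive.values()), 2)
        child = mutate(crossover(p1, p2))
        quality, bc = evaluate(child)
        # Keep the child if its BC bin is empty or it beats the incumbent.
        if bc not in archive or quality > archive[bc][0]:
            archive[bc] = (quality, child)
    return archive
```

The archive is what distinguishes QD from plain optimization: instead of converging on one best solution, it retains the best candidate found for each behavioral niche.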
CycleQD
CycleQD incorporates QD into the LLM post-training pipeline to help models learn new, complex skills. CycleQD is useful when you have several small models, each fine-tuned for a very specific skill such as coding or performing database or operating-system operations, and you want to create new variants with different combinations of those skills.
In the CycleQD framework, each of these skills is treated as a behavioral characteristic or as a quality that the next generation of models is optimized for. In each generation, the algorithm focuses on one specific skill as its quality metric and uses the other skills as BCs.
“This puts every skill in the spotlight, allowing the LLMs to develop in a more balanced and capable way overall,” the researchers explain.
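The rotation of the quality metric is simple to illustrate. In the sketch below, the skill names mirror the tasks discussed later in the article; `roles_for_generation` is a hypothetical helper, not Sakana's code.

```python
skills = ["coding", "database_ops", "os_ops"]

def roles_for_generation(gen):
    """Rotate which skill is the quality metric; the rest serve as BCs."""
    quality_skill = skills[gen % len(skills)]  # the skill in the spotlight
    bc_skills = [s for s in skills if s != quality_skill]
    return quality_skill, bc_skills

for gen in range(6):
    quality, bcs = roles_for_generation(gen)
    print(f"generation {gen}: optimize {quality!r}, diversify over {bcs}")
```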
CycleQD starts with a series of expert LLMs, each specialized in one skill. The algorithm then applies “crossover” and “mutation” operations to add new high-quality models to the population. Crossover combines the characteristics of two parent models to create a new model, while mutation makes random changes to a model to explore new possibilities.
Crossover operations are based on model merging, a technique that combines the parameters of two LLMs to create a new model with a mix of their skills. It is a cost-effective and fast way to develop well-rounded models without the need for fine-tuning.
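In its simplest form, model merging is a linear interpolation of two checkpoints' parameters. The sketch below assumes both parents share an identical architecture; real merging recipes, and CycleQD's own crossover, are typically more sophisticated, for example learning per-layer mixing ratios.

```python
import torch

def merge_crossover(parent_a, parent_b, alpha=0.5):
    """Mix two parents' weights element-wise (hypothetical example).

    `parent_a` and `parent_b` are state dicts from models with
    identical architectures; `alpha` controls the blend.
    """
    child = {}
    for name, weight_a in parent_a.items():
        weight_b = parent_b[name]
        child[name] = alpha * weight_a + (1.0 - alpha) * weight_b
    return child

# Usage (hypothetical model names):
# child_state = merge_crossover(coder.state_dict(), db_expert.state_dict())
```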
The mutation operation uses singular value decomposition (SVD), a factorization technique that decomposes a matrix into simpler components that are easier to understand and manipulate. CycleQD uses SVD to break a model's skills down into fundamental components, or sub-skills. By tweaking these sub-skills, the mutation process creates models that explore new capabilities beyond those of their parents. This prevents the models from falling into predictable patterns and reduces the risk of overfitting.
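One way to picture this is to perturb a weight matrix in its singular-value basis, as in the hedged sketch below. The `svd_mutate` helper and its noise model are illustrative assumptions, not the exact procedure described by the researchers.

```python
import torch

def svd_mutate(weight, noise_scale=0.01):
    """Perturb a 2-D weight matrix along its singular components."""
    # Decompose the matrix into U, singular values S, and Vh.
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    # Jitter the singular values, i.e. the strength of each "sub-skill".
    S_mutated = S * (1.0 + noise_scale * torch.randn_like(S))
    # Reassemble the matrix with the perturbed components.
    return U @ torch.diag(S_mutated) @ Vh

# Usage: layer.weight.data = svd_mutate(layer.weight.data)
```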
Evaluating the performance of CycleQD
The researchers applied CycleQD to a set of Llama 3-8B expert models fine-tuned for coding, database operations, and operating system operations. The goal was to see if evolutionary methods could combine the skills of the three models to create a better model.
The results showed that CycleQD outperformed traditional fine-tuning and model merging techniques across the evaluated tasks. Notably, a model fine-tuned on a combination of all the datasets performed only marginally better than the single-skill expert models, despite being trained on more data, and the traditional training process is far slower and more expensive. CycleQD was also able to produce a range of models with different performance levels on the target tasks.
“These results clearly show that CycleQD outperforms traditional methods, proving its effectiveness in training LLMs to excel across multiple skills,” the researchers wrote.
Researchers believe CycleQD has the potential to enable lifelong learning for AI systems, allowing them to continuously grow, adapt, and accumulate knowledge over time. This can have direct implications for real-world applications. For example, CycleQD allows you to continuously merge the skills of expert models, rather than training large models from scratch.
Another exciting direction is the development of multi-agent systems in which swarms of specialized agents evolved through CycleQD can cooperate, compete, and learn from each other.
“From scientific discovery to solving real-world problems, swarms of specialized agents have the potential to redefine the limits of AI,” the researchers wrote.