Two common approaches to customizing large language models (LLMs) for downstream tasks are fine-tuning and in-context learning (ICL). In a recent study, researchers at Google DeepMind and Stanford University investigated how well these two methods generalize. They found that ICL has greater generalization ability, though it incurs a higher computational cost at inference time. They also propose a novel approach to get the best of both worlds.
The findings can help developers make critical decisions when building LLM applications for their bespoke enterprise data.
Testing how language models learn new tricks
Fine-tuning involves taking a pre-trained LLM and further training it on a smaller, specialized dataset. This adjusts the model’s internal parameters to teach it new knowledge or skills. In-context learning (ICL), on the other hand, does not change the model’s underlying parameters. Instead, it guides the LLM by providing examples of the desired task directly within the input prompt. The model uses these examples to figure out how to handle new, similar queries.
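To make the distinction concrete, here is a minimal sketch in Python. It only shows where the task examples end up in each approach; the example facts and record format are illustrative assumptions, not the paper’s actual data or any specific vendor’s fine-tuning schema.

```python
# Illustrative only: where task examples go in fine-tuning vs. ICL.
# The facts and record format below are made up for this sketch.

examples = [
    {"prompt": "femp is more dangerous than glon. Which is safer?",
     "completion": "glon"},
    {"prompt": "all trofs are grons. Is a trof a gron?",
     "completion": "yes"},
]

# Fine-tuning: examples become training data that updates the model's
# weights (typically serialized and uploaded to a training job).
finetune_records = [
    {"messages": [{"role": "user", "content": ex["prompt"]},
                  {"role": "assistant", "content": ex["completion"]}]}
    for ex in examples
]

# ICL: the same examples are placed in the prompt at inference time;
# the model's weights never change.
icl_prompt = "\n\n".join(
    f"Q: {ex['prompt']}\nA: {ex['completion']}" for ex in examples
)
icl_prompt += "\n\nQ: all kips are trofs. Is a kip a gron?\nA:"
print(icl_prompt)
```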
The researchers set out to rigorously compare how well models generalize to new tasks using each of these two methods. They built “controlled synthetic datasets of factual knowledge” with complex, self-consistent structures, such as imaginary family trees and hierarchies of fictional concepts.
To ensure the models were learning genuinely new information, all nouns, adjectives, and verbs were replaced with nonsense terms, avoiding any overlap with data the LLMs might have encountered during pre-training.
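A toy sketch of this kind of vocabulary scrubbing, assuming a hand-rolled nonsense-word generator (the word list and the generator itself are illustrative, not the paper’s actual procedure):

```python
import random

random.seed(0)

def nonsense_word() -> str:
    """Generate a short pronounceable nonsense token, e.g. 'femp'."""
    consonants, vowels = "bcdfglmnprstvz", "aeiou"
    return (random.choice(consonants) + random.choice(vowels)
            + random.choice(consonants) + random.choice(consonants))

# Map each content word to a stable nonsense replacement.
content_words = ["shark", "dolphin", "dangerous"]
mapping = {w: nonsense_word() for w in content_words}

sentence = "the shark is more dangerous than the dolphin"
scrubbed = " ".join(mapping.get(tok, tok) for tok in sentence.split())
print(scrubbed)  # e.g. "the zelp is more vasn than the golt"
```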
The models were then tested on a variety of generalization tasks. For example, one test involved simple reversals: if the model is trained on “femp is more dangerous than glon,” can it correctly infer that “glon is less dangerous than femp”? Another test focused on simple syllogisms, a basic form of logical deduction: if told “all grons are yomps” and “all trofs are grons,” can the model deduce that “all trofs are yomps”? The researchers also used a more complex “semantic structure benchmark,” with a richer hierarchy of these made-up facts, to test more nuanced understanding.
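In sketch form, these two test types can be generated from simple templates. The templates and nonsense words below are assumptions for illustration, not the paper’s actual generator:

```python
# Hypothetical generators for the two test types described above.

def reversal_pair(a: str, b: str) -> tuple[str, str]:
    """Train on 'A > B'; test whether the model infers 'B < A'."""
    train = f"{a} is more dangerous than {b}"
    test = f"{b} is less dangerous than {a}"
    return train, test

def syllogism(a: str, b: str, c: str) -> tuple[list[str], str]:
    """Train on 'all A are B' and 'all B are C'; test 'all A are C'."""
    premises = [f"all {a}s are {b}s", f"all {b}s are {c}s"]
    conclusion = f"all {a}s are {c}s"
    return premises, conclusion

print(reversal_pair("femp", "glon"))
print(syllogism("trof", "gron", "yomp"))
```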
“Our results are mainly focused on settings where models generalize from fine-tuning on new knowledge structures to related deductions and reversals. These have clear implications for situations where fine-tuning is used to adapt a model to proprietary information that is unique and specific to a company,” said Andrew Lampinen, research scientist at Google DeepMind and lead author of the paper.
To assess performance, the researchers fine-tuned Gemini 1.5 Flash on these datasets. For ICL, they instead fed the entire training dataset (or large subsets of it) as context to an instruction-tuned model before posing the test questions.
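A rough sketch of this ICL evaluation setup, with hypothetical facts and prompt wording standing in for the paper’s actual protocol:

```python
# Hypothetical ICL evaluation: put the whole training corpus into the
# context window, then ask a held-out test question.
training_facts = [
    "femp is more dangerous than glon",
    "all grons are yomps",
    "all trofs are grons",
]

test_question = "Are all trofs yomps?"

icl_eval_prompt = (
    "Here are some facts about a fictional world:\n"
    + "\n".join(f"- {fact}" for fact in training_facts)
    + f"\n\nBased only on these facts, answer: {test_question}"
)
print(icl_eval_prompt)
```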
The results consistently showed that, in data-matched settings, ICL led to better generalization than standard fine-tuning. Models using ICL were generally better at tasks like reversing relationships or making logical deductions from the provided context. Without either fine-tuning or ICL, the pre-trained model performed poorly, confirming the novelty of the test data.
“One of the main trade-offs to consider is that, while ICL doesn’t require fine-tuning (which saves the training costs), it is generally more computationally expensive with each use, since it requires providing additional context to the model,” Lampinen said. “On the other hand, ICL tends to generalize better for the datasets and models that we evaluated.”
A hybrid approach: augmented fine-tuning
Building on the observation that ICL excels at flexible generalization, the researchers proposed a new way to enhance fine-tuning: using the LLM’s own ICL capabilities to generate more diverse and richly inferred examples, then adding these augmented examples to the fine-tuning dataset.
They investigated two main data augmentation strategies (sketched in code after the list):
- A local strategy: This approach focuses on individual pieces of information. The LLM is prompted to rephrase single sentences from the training data or to draw direct inferences from them, such as generating reversals.
- A global strategy: The LLM is given the full training dataset as context and prompted to generate inferences by linking a particular document or fact to the rest of the provided information, yielding longer chains of related inferences.
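A minimal sketch of the two strategies, with a placeholder llm() function standing in for a real model call (the prompts, dataset, and function are illustrative assumptions, not the paper’s implementation):

```python
# Hypothetical sketch of local vs. global augmentation.
# `llm` is a stand-in; it returns a canned string so the script runs.

def llm(prompt: str) -> str:
    """Placeholder for a real model call."""
    return f"<model output for: {prompt[:40]}...>"

dataset = [
    "femp is more dangerous than glon",
    "all grons are yomps",
    "all trofs are grons",
]

def augment_local(fact: str) -> str:
    # Local strategy: rephrase or invert one statement at a time.
    return llm(f"Rephrase this statement and state its reversal: {fact}")

def augment_global(facts: list[str], fact: str) -> str:
    # Global strategy: condition on the full dataset, then derive
    # inferences linking this fact to the rest of the information.
    context = "\n".join(facts)
    return llm(
        f"Facts:\n{context}\n\n"
        f"List inferences connecting '{fact}' to the other facts."
    )

# The generated inferences would be appended to the dataset
# before fine-tuning.
augmented = [augment_local(f) for f in dataset]
augmented.append(augment_global(dataset, dataset[0]))
print(augmented[0])
```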
When models were fine-tuned on these augmented datasets, the gains were significant. Augmented fine-tuning markedly improved generalization, outperforming not only standard fine-tuning but plain ICL as well.
“For example, if a company document says ‘XYZ is an internal tool for analyzing data,’ our results suggest that ICL and augmented fine-tuning will be more effective in enabling the model to answer related questions like ‘What internal tools for data analysis exist?’” Lampinen said.
This approach offers a compelling path forward for enterprises: by investing in creating these ICL-augmented datasets, developers can build fine-tuned models that exhibit stronger generalization.
This can lead to more robust and reliable LLM applications that perform better on diverse, real-world inputs without incurring the continuous inference-time costs associated with large in-context prompts.
“Augmented fine-tuning will generally make the fine-tuning process more expensive, because it requires an additional step of ICL to augment the data, followed by fine-tuning,” Lampinen said. “Whether that additional cost is merited by the improved generalization will depend on the specific use case. However, it is computationally cheaper than applying ICL every time the model is used, when amortized over many uses of the model.”
Lampinen noted that further research is needed to see how the components they studied interact across different settings, but added that their findings suggest developers may want to consider augmented fine-tuning in cases where fine-tuning alone yields inadequate performance.
“Ultimately, we hope this work will contribute to the science of understanding learning and generalization in foundation models, and to the practicalities of adapting them to downstream tasks,” Lampinen said.