Researchers at Google DeepMind have unveiled a new method to accelerate AI training, significantly reducing the computational resources and time required. The approach could make AI development both faster and cheaper, with knock-on benefits for the environment.
“Our approach—multimodal contrastive learning with joint example selection (JEST)—surpasses state-of-the-art models with up to 13 times fewer iterations and 10 times less computation,” the study said.
The AI industry is known for its high energy consumption. Large-scale AI systems like ChatGPT require substantial processing power, which in turn demands large amounts of electricity, along with water to cool the hardware. Microsoft’s water consumption, for example, reportedly spiked by 34% from 2021 to 2022 due to increased AI computing demands, with ChatGPT estimated to consume nearly half a liter of water for every 5 to 50 prompts.
The International Energy Agency (IEA) projects that data center electricity consumption will double from 2022 to 2026, drawing comparisons between the power demands of AI and the oft-criticized energy profile of the cryptocurrency mining industry.
However, approaches like JEST could offer a solution. By optimizing data selection for AI training, JEST can significantly reduce the number of iterations and computational power needed, which could lower overall energy consumption. This method aligns with efforts to improve the efficiency of AI technologies and mitigate their environmental impact.
If the technique proves effective at scale, AI developers would need only a fraction of the power currently used to train their models. They could build more capable models with the same resources, or train comparable models while consuming far less.
How JEST works
JEST operates by selecting complementary batches of data to maximize the AI model’s learnability. Unlike traditional methods that often select individual examples without considering their collective impact, this algorithm strategically composes batches to enhance the overall effectiveness of the training process.
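To see why joint selection can matter, here is a minimal toy sketch (not the paper's algorithm): each example has an individual quality score, but examples also interact within a batch. Picking the top-k examples independently ignores those interactions, while a simple greedy joint selection accounts for them. All names and the random scores are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pool: per-example "quality" scores plus pairwise interaction terms.
# In JEST the signal is model-based learnability, not random numbers; this
# sketch only illustrates why joint selection can beat independent top-k.
n_pool, batch_size = 100, 8
quality = rng.normal(size=n_pool)
interaction = rng.normal(scale=0.5, size=(n_pool, n_pool))
interaction = (interaction + interaction.T) / 2  # symmetric complementarity

def batch_score(idx):
    """Score a batch: individual quality plus pairwise complementarity."""
    idx = np.asarray(idx)
    return quality[idx].sum() + interaction[np.ix_(idx, idx)].sum() / 2

# Independent selection: top-k by individual quality, ignoring interactions.
independent = np.argsort(quality)[-batch_size:]

# Joint (greedy) selection: grow the batch one example at a time, each step
# adding the example that most improves the whole batch's score.
joint = [int(np.argmax(quality))]
while len(joint) < batch_size:
    gains = [batch_score(joint + [j]) - batch_score(joint)
             for j in range(n_pool)]
    for j in joint:  # never re-pick an already chosen example
        gains[j] = -np.inf
    joint.append(int(np.argmax(gains)))
```

On this toy objective, the jointly selected batch scores higher than the independently selected one, because it credits examples that complement what is already in the batch.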
For example, imagine learning multiple languages. Instead of tackling English, German, and Norwegian separately and in sequence based on difficulty, it might be more effective to study them together in a manner where mastering one language enhances your understanding and proficiency in the others.
JEST applies the same principle to batches of training data, and Google’s results suggest it pays off.
“We demonstrate that jointly selecting batches of data is more effective for learning than selecting examples independently,” stated the researchers in their paper.
To achieve this, Google utilized “multimodal contrastive learning” within the JEST process to identify dependencies between data points. This approach significantly enhances the speed and efficiency of AI training, all while reducing the necessary computing power.
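Multimodal contrastive learning is inherently batch-level: each image–text pair is scored against every other pairing in the batch, so an example’s loss depends on its batchmates. Below is a minimal NumPy sketch of a symmetric contrastive (InfoNCE-style) loss of the kind such training builds on; the function name and the fixed temperature are assumptions, not JEST’s exact formulation.

```python
import numpy as np

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Matching pairs (row i of each matrix) are pulled together; all other
    pairings in the batch act as negatives, so each example's loss depends
    on every other example in the batch.
    """
    # Normalize, then compute the batch similarity matrix.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature

    # Cross-entropy in both directions (image->text and text->image),
    # with the true pairing on the diagonal.
    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.diag(log_probs).mean()

    return (xent(logits) + xent(logits.T)) / 2
```

With correctly matched embeddings the loss approaches zero; shuffling the pairing drives it up, which is exactly the dependency between data points that batch-level selection can exploit.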
Critical to this approach was the use of pre-trained reference models to guide the data selection process, according to Google. This technique enabled the model to concentrate on high-quality, meticulously curated datasets, thereby further refining the efficiency of the training process.
“The quality of a batch depends not only on the individual quality of its data points but also on how these points interact within the batch,” explained the paper.
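One way to read the reference-model idea is as a learnability score: prefer data the current learner still finds hard (high learner loss) but the curated reference model finds easy (low reference loss). The sketch below is a simplified per-example version of that intuition; JEST itself scores joint batches, and the function and example losses here are hypothetical.

```python
import numpy as np

def learnability_scores(learner_losses, reference_losses):
    """Per-example learnability: high learner loss (not yet learned) minus
    low reference loss (known to be learnable, high-quality data)."""
    return np.asarray(learner_losses) - np.asarray(reference_losses)

# Hypothetical losses for four candidate examples.
learner = [2.0, 2.0, 0.2, 3.0]
reference = [0.3, 1.9, 0.1, 2.9]
scores = learnability_scores(learner, reference)
best = int(np.argmax(scores))
# Example 0 scores highest: hard for the learner, easy for the reference.
# Example 2 is already learned; example 3 is hard even for the reference
# (likely noise); both score low.
```

This filters out both data the model has already mastered and data that is irreducibly noisy, keeping the examples that are genuinely still learnable.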
Experiments conducted in the study demonstrated substantial performance gains across various benchmarks. For example, when applied to the WebLI dataset, JEST significantly enhanced learning speed and resource efficiency.
Furthermore, the algorithm swiftly identified highly learnable sub-batches, expediting training by prioritizing data points that synergistically contribute to learning. This approach, known as “data quality bootstrapping,” prioritizes quality over quantity, proving superior for AI training.
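Selecting a whole sub-batch at once from a large candidate pool is expensive, so selection can proceed in chunks: score the remaining candidates given the partial batch, sample a chunk, and repeat. The following is a rough sketch of that chunk-wise filling, loosely analogous to the blocked sampling described for JEST; `select_batch` and the `scores(batch, candidates)` interface are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_batch(scores, super_batch, batch_size, n_chunks=4):
    """Fill a training batch in chunks from a larger candidate 'super-batch'.

    At each step, sample the next chunk of examples in proportion to their
    conditional score given the partial batch, then remove them from the
    candidate pool.
    """
    batch = []
    chunk = batch_size // n_chunks
    remaining = list(super_batch)
    for _ in range(n_chunks):
        s = np.asarray(scores(batch, remaining), dtype=float)
        probs = np.exp(s - s.max())  # softmax over conditional scores
        probs /= probs.sum()
        picked = rng.choice(len(remaining), size=chunk, replace=False, p=probs)
        batch.extend(remaining[i] for i in picked)
        remaining = [x for i, x in enumerate(remaining) if i not in set(picked)]
    return batch

# Toy conditional score: ignore the partial batch, prefer smaller values.
toy_scores = lambda batch, candidates: -np.array(candidates, dtype=float)
batch = select_batch(toy_scores, list(range(32)), batch_size=8)
```

Because each chunk conditions on what has already been picked, the procedure can favor examples that complement the partial batch rather than duplicating it.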
“A reference model trained on a small curated dataset can effectively guide the curation of a much larger dataset, enabling the training of a model that significantly outperforms the quality of the reference model on numerous downstream tasks,” stated the paper.