At the Next ’25 conference, Google introduced Ironwood, its seventh-generation Tensor Processing Unit (TPU) and a pivotal leap in the company’s AI hardware. Designed specifically for inference, the stage at which a trained model applies what it has learned to make predictions, Ironwood is the most powerful, scalable, and energy-efficient TPU Google has ever developed.
Ironwood signifies a shift from reactive AI models, which respond to queries, to proactive systems that generate insights independently. This evolution defines what Google calls the “age of inference,” where AI agents autonomously retrieve and synthesise data to offer comprehensive answers, not just raw information.
This TPU is engineered to tackle the immense computational demands of next-generation AI, including Large Language Models (LLMs) and Mixture-of-Experts (MoE) models, both essential for tasks requiring advanced reasoning. LLMs, like those powering chatbots, process massive amounts of text to generate human-like responses, while MoE models route each input to a small subset of specialised sub-networks, or “experts”, activating only the parts of the model that are needed and thereby improving performance and efficiency.
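To make the MoE idea concrete, here is a minimal sketch of top-1 expert routing in JAX; the layer sizes, initialisation, and the shortcut of running every expert densely are illustrative assumptions, not details of any Google model.

```python
import jax
import jax.numpy as jnp

NUM_EXPERTS, D_MODEL, D_HIDDEN = 4, 8, 16

def init_params(key):
    k1, k2, k3 = jax.random.split(key, 3)
    scale = 0.02
    return {
        "router": scale * jax.random.normal(k1, (D_MODEL, NUM_EXPERTS)),
        "w_in":   scale * jax.random.normal(k2, (NUM_EXPERTS, D_MODEL, D_HIDDEN)),
        "w_out":  scale * jax.random.normal(k3, (NUM_EXPERTS, D_HIDDEN, D_MODEL)),
    }

def moe_layer(params, tokens):
    """tokens: [batch, D_MODEL]; each token is sent to its single best expert."""
    # The router scores every token against every expert.
    logits = tokens @ params["router"]              # [batch, NUM_EXPERTS]
    expert_ids = jnp.argmax(logits, axis=-1)        # top-1 expert per token
    # For clarity we run all experts and mask the losers; production MoEs
    # dispatch tokens so inactive experts do no work, which is the efficiency win.
    hidden = jax.nn.relu(jnp.einsum("bd,edh->ebh", tokens, params["w_in"]))
    outputs = jnp.einsum("ebh,ehd->ebd", hidden, params["w_out"])
    gate = jax.nn.one_hot(expert_ids, NUM_EXPERTS)  # [batch, NUM_EXPERTS]
    return jnp.einsum("ebd,be->bd", outputs, gate)  # [batch, D_MODEL]

params = init_params(jax.random.PRNGKey(0))
tokens = jax.random.normal(jax.random.PRNGKey(1), (5, D_MODEL))
print(moe_layer(params, tokens).shape)              # (5, 8)
```

In production systems, tokens are physically dispatched to the chips hosting their chosen experts, which is exactly the kind of traffic the hardware described below is built to carry.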
Ironwood’s architecture scales up to 9,216 liquid-cooled chips connected through a cutting-edge Inter-Chip Interconnect (ICI) network. This setup ensures rapid data transfer and tight synchronisation across chips, critical for training and running complex AI models. To put its capabilities into perspective, Google says a full Ironwood pod delivers 42.5 exaflops of compute, against the 1.7 exaflops of El Capitan, the world’s largest supercomputer, though the two figures are measured at different numerical precisions, so the comparison is indicative rather than like-for-like.
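As a sanity check on that figure, dividing the quoted pod-level compute across the chip count gives the implied per-chip throughput; this is simple arithmetic on the numbers above, not an official per-chip specification.

```python
POD_FLOPS = 42.5e18      # 42.5 exaflops quoted for a full 9,216-chip pod
CHIPS_PER_POD = 9_216

per_chip = POD_FLOPS / CHIPS_PER_POD
print(f"{per_chip / 1e15:.2f} petaflops per chip")  # ~4.61 petaflops
```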
Key to Ironwood’s prowess is its advanced SparseCore, a specialised unit that accelerates the ultra-large embedding lookups common in ranking and recommendation systems, with applications extending to financial modelling. Additionally, Ironwood integrates Google’s Pathways software, enabling seamless distribution of AI workloads across thousands of TPUs, thus pushing the boundaries of generative AI.
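Pathways itself is Google-internal infrastructure, but the programming model it enables, writing single-machine code that a runtime spreads across many chips, can be sketched with JAX’s public sharding API; the mesh shape and array sizes below are invented for illustration.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh over whatever accelerators are present
# (TPU chips on a pod; this demo also runs on a single-device CPU host).
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

# Shard the activation batch across devices; replicate the weights.
batch = jax.device_put(jnp.ones((8 * len(jax.devices()), 512)),
                       NamedSharding(mesh, P("data", None)))
weights = jax.device_put(jnp.ones((512, 512)),
                         NamedSharding(mesh, P(None, None)))

@jax.jit
def layer(x, w):
    # The programmer writes ordinary array code; the compiler inserts any
    # cross-chip transfers needed to keep the computation sharded.
    return jnp.tanh(x @ w)

out = layer(batch, weights)
print(out.shape, out.sharding)  # output stays sharded like the input batch
```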
Efficiency is another hallmark of Ironwood. It delivers double the performance per watt of its predecessor, Trillium, thanks to an optimised chip design and advanced liquid cooling. This matters because AI’s growing energy demands pose a challenge for sustainable computing.
Memory capabilities have also seen dramatic improvements. Ironwood boasts 192 GB of High Bandwidth Memory (HBM) per chip, six times that of Trillium, allowing it to process larger datasets with reduced latency. Its HBM bandwidth reaches an impressive 7.2 terabytes per second, ensuring the swift data access essential for modern AI workloads.
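A rough calculation shows why that bandwidth matters: the time to stream a chip’s entire HBM once bounds how quickly memory-resident model weights can be re-read on each inference step. The numbers below are the quoted specs; the “one full sweep per step” model is an idealisation that ignores caching and overlap.

```python
HBM_BYTES = 192e9   # 192 GB of HBM per chip
HBM_BW = 7.2e12     # 7.2 terabytes per second

full_sweep = HBM_BYTES / HBM_BW
print(f"{full_sweep * 1e3:.1f} ms per full sweep of HBM")  # ~26.7 ms
print(f"~{1 / full_sweep:.0f} sweeps per second")          # ~37
```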
The enhanced ICI network, with 1.2 terabytes per second of bidirectional bandwidth, facilitates efficient communication between chips. This is vital for distributed AI tasks, where different parts of a model run simultaneously across multiple processors to speed up computation.
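The staple of that cross-chip communication is the collective operation, such as an all-reduce that sums per-chip partial results across the pod. A minimal sketch using JAX’s pmap and psum follows; the per-chip values are placeholders, and on a single-device machine the code still runs with one “chip”.

```python
import jax
import jax.numpy as jnp

n = jax.local_device_count()

def allreduce(local_grad):
    # lax.psum exchanges and sums values across all participating chips;
    # on a TPU pod this traffic travels over the ICI links.
    return jax.lax.psum(local_grad, axis_name="chips")

# One placeholder scalar "gradient" per device.
summed = jax.pmap(allreduce, axis_name="chips")(jnp.arange(n, dtype=jnp.float32))
print(summed)  # every chip ends up holding the same total
```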
Beyond AI-specific applications, Ironwood’s expanded capabilities support scientific and financial domains, broadening its impact. With models like Gemini 2.5 and AlphaFold, whose creators won the 2024 Nobel Prize in Chemistry, already running on TPUs, Ironwood’s launch sets the stage for new breakthroughs in AI research and applications.
Due to become available later this year, Ironwood sets a new benchmark for performance, efficiency, and scalability in the rapidly evolving landscape of artificial intelligence.
(The writer was invited to the Google Next ’25 event in Las Vegas, Nevada)
Published - April 09, 2025 05:31 pm IST