Why it matters: Microsoft had been rumored to be working on custom silicon for its data center needs for years. As it turns out, the rumors were true, and this week the company unveiled not one but two custom chips: an AI accelerator and an Arm-based CPU. The new chips will be integrated into Azure server farms starting in early 2024, to be used as the workhorses of AI services like Microsoft Copilot.
This week, Microsoft announced it has built two "homegrown" chips that will handle AI and general computing workloads in the Azure cloud. The announcement was made at the Ignite 2023 conference and confirms previous rumors about the existence of "Project Athena," a custom-designed AI chip that would reduce Microsoft's reliance on off-the-shelf hardware from vendors like Nvidia, especially in the area of artificial intelligence training and inference.
The first chip is called the Microsoft Azure Maia 100 AI Accelerator and is the direct result of Project Athena. As its lengthy name suggests, the Redmond giant designed the chip specifically for running large language models such as GPT-3.5 Turbo and GPT-4. Built on TSMC's 5nm process and featuring no fewer than 105 billion transistors, the new chip supports various MX data types, including sub-8-bit formats for faster model training and inference times.
For reference, Nvidia's H100 GPU has 80 billion transistors, and AMD's Instinct MI300X has 153 billion transistors. That said, we have yet to see any direct performance comparisons between the Maia 100 AI Accelerator and the existing chips used by most companies building AI services. What we do know is that each Maia 100 compute unit has an aggregate bandwidth of 4.8 terabits per second thanks to a custom Ethernet-based network protocol that allows for better scaling and end-to-end performance.
It's also worth noting that Microsoft developed the Maia 100 chip using extensive feedback from OpenAI. The two companies worked together to refine the architecture and test GPT models. For Microsoft, this will help optimize the efficiency of Azure's end-to-end AI architecture, while OpenAI will be able to train new AI models that are better and cheaper than what is available today.
The second chip introduced by Microsoft at Ignite is called the Cobalt 100 CPU. This one is a 64-bit, 128-core processor built on Arm Neoverse Compute Subsystems, and it brings performance improvements of up to 40 percent for more general Azure computing workloads when compared to current-generation hardware found in commercial Arm-based servers. Cobalt 100-based servers will be used to power services like Microsoft Teams and Windows 365, among other things.
Rani Borkar, who is the head of Azure infrastructure systems at Microsoft, says the company's homegrown chip efforts build on top of two decades of experience in co-engineering silicon for Xbox and Surface. The new Cobalt 100 CPU allows the company to control performance and power consumption on a per-core basis and makes it possible to build a more cost-effective cloud hardware stack.
The cost part of the equation is particularly important. In the case of the Maia 100 AI Accelerator, Microsoft had to come up with a new liquid cooling solution and a new rack design that provides more space for power and networking cables. That said, the cost of using the new chip is still significantly lower than using specialized hardware from Nvidia or AMD.
Microsoft seems determined to make a Copilot "for everyone and everything you do," and that is reflected in the release of Copilot for Windows, GitHub, Dynamics 365, Microsoft Security, and Microsoft 365. The company just rebranded Bing Chat to "Microsoft Copilot," so it's clear it wants to bolt ever more advanced AI models into every service it offers moving forward.
AI training and inference get expensive fast, and running an AI service is estimated to be up to ten times more expensive than something like a search engine. Making custom silicon could also alleviate supply issues and help Microsoft gain a competitive advantage in a crowded landscape of AI cloud providers. Others, such as Amazon, Meta, and Google, have their own homegrown silicon efforts for the same reasons. Companies like Ampere that once dreamed of becoming the go-to suppliers of Arm-based data center chips will no doubt be forced to adapt to these developments if they want to survive.
That said, the Redmond company says it will keep using off-the-shelf hardware in the near future, including the recently announced H200 Tensor Core GPU from Nvidia. Scott Guthrie, who is executive vice president of the Microsoft Cloud + AI Group, says this will help diversify the company's supply chain and give customers more infrastructure choices.