In brief: Intel has drummed up a rivalry between its new Gaudi2 accelerator and the now two-year-old market leader, the Nvidia A100. In two benchmarks suited to its niche, the new, gaudily named accelerator pulls ahead.
Gaudi2 is made for Intel by Habana Labs, an Israeli company that Intel acquired at the end of 2019 for $2 billion. Habana makes two types of specialized accelerators: some for training neural networks, like Gaudi2, and others for running (i.e., "inferencing") them, such as Goya and Greco.
Habana and Intel launched Gaudi2 in May but waited until last week to upload its benchmark scores into the public MLPerf database. In their graphs, they compare the scores of their Gaudi2 system against the public scores of A100-equipped systems from Nvidia and Dell.
ResNet-50 tests hardware's ability to train an AI to classify images. Habana's Gaudi2 system took just 18 minutes to train the AI well enough for it to pass the test, easily surpassing Nvidia's A100 system, which needed almost half an hour.
BERT is a natural language processing model, and in this test, it is trained on Wikipedia articles. Habana's Gaudi2 system took just 17 minutes to train the BERT model, beating the time of Nvidia's A100 system by about a minute.
For both benchmarks, all the systems used eight accelerators/GPUs. Habana's system paired its accelerators with two 40-core Intel Xeon 8380 CPUs, while Nvidia's used two 64-core AMD Epyc 7742 CPUs.
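To put the two results side by side, the short Python sketch below turns the rounded training times quoted above into relative speedups. The figures are approximations taken from the text, not exact MLPerf entries: "almost half an hour" is represented by 30 minutes as an upper bound, and the A100 BERT time is taken as 18 minutes ("about a minute" slower than 17). The names `TIMES_MIN` and `speedup` are illustrative only.

```python
# Rounded training times in minutes, taken from the figures quoted in the article.
# Assumption: "almost half an hour" is represented by 30 as an upper bound,
# and the A100 BERT time is taken as 17 + 1 = 18 minutes.
TIMES_MIN = {
    "ResNet-50": {"gaudi2": 18, "a100": 30},
    "BERT": {"gaudi2": 17, "a100": 18},
}


def speedup(baseline_min, candidate_min):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_min / candidate_min


for name, times in TIMES_MIN.items():
    ratio = speedup(times["a100"], times["gaudi2"])
    print(f"{name}: Gaudi2 system is roughly {ratio:.2f}x as fast as the A100 system")
```

With these rounded inputs, the ResNet-50 gap is large while the BERT gap is only a few percent, so the exact submitted times matter far more for the second comparison than for the first.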
Gaudi2 features 24 TPCs (tensor processor cores) and two MMEs (matrix multiplication engines) that run partially in parallel. It supports a broad array of data types, including FP32, TF32, BF16, FP16, and FP8. It also has a dedicated media engine for processing audio and visual media as inputs.
For memory, Gaudi2 has six 16 GB stacks of HBM2e that sum to 96 GB and 2.45 TB/s of total memory bandwidth. Inside, it has a 48 MB cache. For connectivity, it uses an x16 PCIe 4.0 connection and has 24 100 Gbps RoCE v2 (RDMA over Converged Ethernet, version 2) ports.
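The memory figures can be checked with a little arithmetic. The sketch below, in Python, recomputes the per-accelerator capacity, estimates the bandwidth per HBM2e stack, and totals the HBM in an eight-accelerator system like the one benchmarked. It assumes the 2.45 TB/s is split evenly across the six stacks, which the article does not state; the variable names are illustrative.

```python
# Figures quoted in the article.
STACKS = 6                    # HBM2e stacks per Gaudi2
GB_PER_STACK = 16             # capacity of each stack, in GB
TOTAL_BANDWIDTH_TBS = 2.45    # total memory bandwidth, in TB/s
ACCELERATORS_PER_SYSTEM = 8   # accelerators in the benchmarked system

# Capacity per accelerator: 6 stacks x 16 GB.
capacity_gb = STACKS * GB_PER_STACK

# Assumption: bandwidth is divided evenly across the stacks.
per_stack_bandwidth_tbs = TOTAL_BANDWIDTH_TBS / STACKS

# Total HBM across an eight-accelerator system.
system_capacity_gb = capacity_gb * ACCELERATORS_PER_SYSTEM

print(f"Capacity per accelerator: {capacity_gb} GB")
print(f"Bandwidth per stack (even split assumed): {per_stack_bandwidth_tbs:.3f} TB/s")
print(f"HBM in an 8-accelerator system: {system_capacity_gb} GB")
```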
Habana has clearly created a real A100 competitor for Intel. Its timing could be better, given that Nvidia announced the H100 three months ago, but the two are such different products that even though they might compete in benchmarks, they might not really be competing for motherboard slots.
Whereas the A100 and H100 are versatile behemoths, Gaudi2 is a streamlined accelerator trying to do something different, and it'll be fascinating to see whether it's successful or not.