AMD says its MI300X AI accelerator is faster than Nvidia's H100

Alfonso Maruccia

A hot potato: AMD is fighting back against Nvidia's claims about the H100 GPU accelerator, which according to Team Green is faster than the competition. Team Red says Nvidia didn't tell the whole story, and has provided further benchmark results based on industry-standard inferencing workloads.

AMD has finally launched its Instinct MI300X accelerators, a new generation of server GPUs designed to provide compelling performance for generative AI workloads and other high-performance computing (HPC) applications. The MI300X is faster than the H100, AMD said earlier this month, but Nvidia tried to refute its competitor's statements with new benchmarks released a couple of days ago.

Nvidia tested its H100 accelerators with TensorRT-LLM, an open-source library and SDK designed to efficiently accelerate generative AI algorithms. According to the GPU company, with proper optimizations an H100 running TensorRT-LLM was able to complete the workload 2x faster than AMD's MI300X.

AMD is now providing its own version of the story, refuting Nvidia's statements about H100 superiority. According to AMD, Nvidia used TensorRT-LLM on the H100 instead of the vLLM library used in AMD's benchmarks, and compared the FP16 datatype on the Instinct MI300X against the FP8 datatype on the H100. Furthermore, Team Green inverted AMD's published performance data, converting relative latency numbers into absolute throughput.
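To see why that last point matters: latency and throughput are only interchangeable if batch size and every other serving parameter are held equal, so converting a relative-latency chart into absolute throughput quietly bakes in assumptions. A minimal sketch with made-up numbers (these figures are illustrative and do not reproduce either vendor's results):

```python
# Illustrative only: shows how relative latency numbers relate to
# throughput under an assumed fixed batch size. The figures here are
# invented and do not reproduce AMD's or Nvidia's benchmark data.

BATCH_SIZE = 16  # assumed constant across both accelerators


def throughput(latency_s: float, batch: int = BATCH_SIZE) -> float:
    """Requests served per second at a given per-batch latency."""
    return batch / latency_s


# Suppose a vendor reports *relative* latency: GPU B takes 0.7x the
# time of GPU A. Turning that into absolute throughput requires an
# absolute baseline, which a relative chart alone does not provide.
baseline_latency_a = 2.0   # seconds (assumed baseline for GPU A)
relative_latency_b = 0.7   # GPU B's latency as a fraction of A's

latency_b = baseline_latency_a * relative_latency_b
print(f"A: {throughput(baseline_latency_a):.1f} req/s")
print(f"B: {throughput(latency_b):.1f} req/s")
# The throughput ratio equals the inverse latency ratio only when
# batch size and all other serving parameters are held equal.
```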

AMD suggests that Nvidia tried to rig the game, while it is still busy identifying new ways to unlock performance and raw power on Instinct MI300 accelerators. The company provided the latest performance levels achieved by the Llama 2 70B chatbot model on the MI300X, showing an even larger edge over Nvidia's H100.

By using the vLLM inference library for both accelerators, the MI300X was able to achieve 2.1x the performance of the H100, thanks to the latest optimizations in AMD's ROCm software stack. The company had highlighted a 1.4x advantage over the H100 (with an equivalent datatype and library setup) earlier in December. vLLM was chosen for its broad adoption within the community and its ability to run on both GPU architectures.
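For context, part of vLLM's appeal is that the same Python serving code runs on both CUDA and ROCm builds of the library. Below is a minimal sketch of this kind of workload; the model name and parameters are illustrative assumptions, not AMD's actual benchmark harness:

```python
# Minimal vLLM inference sketch. Runs on either an Nvidia (CUDA) or
# AMD (ROCm) build of vLLM; the Python API is identical on both.
# Model and parameters are illustrative, not AMD's benchmark config.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",  # a 70B model needs multiple GPUs
    tensor_parallel_size=8,                  # shard across 8 accelerators
    dtype="float16",                         # FP16, as in AMD's comparison
)

params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Explain what an AI accelerator does."], params)
for out in outputs:
    print(out.outputs[0].text)
```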

Even when pitting TensorRT-LLM on the H100 against vLLM on the MI300X, AMD still measured a 1.3x latency advantage. And when the H100 ran lower-precision FP8 with TensorRT-LLM while the MI300X ran higher-precision FP16 with vLLM, AMD's accelerator was seemingly still able to demonstrate an advantage in absolute latency.
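Latency in these comparisons is essentially wall-clock time for a request of fixed prompt and output shape. A rough sketch of how one might time a single request, reusing the llm and params objects from the sketch above (this is illustrative, not the methodology either vendor published):

```python
# Rough single-request latency measurement around vLLM's generate().
# Illustrative only; real benchmarks control prompt length, output
# length, batch size, and warm-up, none of which is modeled here.
import time


def time_request(llm, prompt: str, params) -> float:
    """Return wall-clock seconds for one generate() call."""
    start = time.perf_counter()
    llm.generate([prompt], params)
    return time.perf_counter() - start


# Warm-up run first so one-time compilation/caching doesn't skew the
# measurement, then time the real request:
# time_request(llm, "warm-up", params)
# latency = time_request(llm, "What is the MI300X?", params)
# print(f"end-to-end latency: {latency:.2f} s")
```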

vLLM doesn't support FP8, AMD explained, and the FP16 datatype was chosen for its popularity. AMD said its results show that the MI300X running FP16 is comparable to the H100 even at the latter's best performance settings, with the FP8 datatype and TensorRT-LLM.


 
There are just too many variables at play. Everybody is picking something that works best for them.
That was my thought, different workloads work better on different architectures. AMD is only the second big player in the AI GPU space, and there will be many more to follow. Nvidia's monopoly on AI workloads will soon be at an end. I'm sure they'll still be a dominant player, but they will be far from the only player.
 
Well, it doesn't matter; the industry will buy all they can anyway, since there are no other competitors in the field besides Nvidia and AMD when it comes to top-of-the-line AI accelerators.
 
Intel was claiming they will have the bee's knees soon. I think Google is working on something too (they already have some experience designing this stuff and, like the other big guys, have bought out emerging players).
 
Google uses its own tensor processing unit.
Basically, it's easier to design a compute GPU than a gaming GPU, which is why:
a. Google has its own TPU and Microsoft is making its own AI GPU
b. Intel Arc has a higher FP32 compute spec than the RX 6800 but is way below it in gaming performance
c. AMD is still behind Nvidia in gaming driver optimization