NVIDIA Calls BS On AMD's H100 Versus MI300X Performance Claims, Shares Benchmarks

NVIDIA is taking a swipe at AMD on its developer blog over what it says are misleading claims about the performance of the NVIDIA H100 Tensor Core GPU running NVIDIA TensorRT-LLM compared to AMD's MI300X accelerator. NVIDIA says that “at a recent launch event, AMD talked about the inference performance of the H100 GPU compared to that of its MI300X chip. The results shared did not use optimized software, and the H100, if benchmarked properly, is 2x faster.”

NVIDIA shares that “DGX H100 can process a single inference in 1.7 seconds using a batch size of one—in other words, one inference request at a time. A batch size of one results in the fastest possible response time for serving a model.”
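To see why batch size matters here, consider the minimal, purely illustrative Python sketch below. The `generate` function and its timings are made up stand-ins, not NVIDIA's or AMD's software; the point is only that a batch size of one minimizes how long any single request waits, while larger batches trade per-request latency for throughput.

```python
import time

# Hypothetical stand-in for an LLM serving call; the timings below are a toy cost
# model (fixed overhead plus per-request work), not real GPU measurements.
def generate(requests):
    """Pretend to run inference on a batch of requests and return one output per request."""
    time.sleep(0.1 + 0.02 * len(requests))
    return ["output"] * len(requests)

# Batch size 1: the latency of a single request is the full end-to-end time.
start = time.perf_counter()
generate(["prompt"])
print(f"batch=1 latency: {time.perf_counter() - start:.3f}s")

# Larger batch: each request waits on the whole batch, so per-request latency rises,
# but total throughput (requests per second) improves.
batch = ["prompt"] * 8
start = time.perf_counter()
generate(batch)
elapsed = time.perf_counter() - start
print(f"batch=8 latency: {elapsed:.3f}s, throughput: {len(batch) / elapsed:.1f} req/s")
```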

In addition to refuting AMD's claims, NVIDIA brought receipts, sharing a graph that shows what it says is the actual performance of its combined hardware and software stack. The data highlights the performance of a DGX H100 server using eight H100 GPUs on the Llama 2 70B model.

Figure 1. Llama 2 70B server inference performance in queries per second with 2,048 input tokens and 128 output tokens for “Batch 1” and various fixed response time settings. Courtesy of NVIDIA.

Anyone wanting to validate these claims can do so, because NVIDIA is sharing the information needed to reproduce the results. The blog post contains the command lines for the scripts NVIDIA used to build its model, along with the benchmarking scripts used to gather the data.
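For a rough sense of what such a benchmarking script measures, here is a minimal sketch of a timing harness. It is not NVIDIA's published script; `run_inference` is a placeholder for the real model call you would build from the commands in NVIDIA's blog post, and the token counts simply mirror the 2,048-input/128-output setup from the figure above.

```python
import statistics
import time

# Placeholder for the actual inference call; in a real reproduction this would invoke
# the TensorRT-LLM engine built with NVIDIA's published commands.
def run_inference(prompt_tokens=2048, output_tokens=128):
    time.sleep(0.05)  # stand-in for actual GPU work
    return [0] * output_tokens

def benchmark(num_queries=100):
    """Time repeated single-query (batch size 1) inferences and report latency and throughput."""
    latencies = []
    start = time.perf_counter()
    for _ in range(num_queries):
        t0 = time.perf_counter()
        run_inference()
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    return {
        "median_latency_s": statistics.median(latencies),
        "queries_per_second": num_queries / total,
    }

if __name__ == "__main__":
    print(benchmark())
```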

It’s surprising that AMD didn’t make sure the data it shared would hold up, as it was only a matter of time before NVIDIA or LLM enthusiasts fact-checked it. If AMD wants to gain ground on NVIDIA in the race for AI market share, these kinds of mistakes need to be eliminated.