NVIDIA Announces Tesla P100 PCI Express Card For Deep Learning And HPC, Shipping Q4

NVIDIA Tesla P100

Remember the scene in Scarface when Al Pacino, as Tony Montana, famously said, "Say hello to my little friend!"? Well, now NVIDIA is bringing out the big gun in PCI Express form with what it considers the "most advanced data center GPU ever built," the Tesla P100, a crazy fast GPU accelerator based on NVIDIA's powerful Pascal architecture and built on a 16nm FinFET manufacturing process.

This is the second iteration of the Tesla P100, the first of which NVIDIA unveiled at GTC 2016 a couple of months ago. Though it's built on a fancy 16nm FinFET process, the GP100 GPU inside the Tesla P100 is quite large, with a 600mm² die roughly the size of previous-generation high-end Maxwell GPUs. It packs far more into that space, however: a whopping 15.3 billion transistors comprise the new compute engine, nearly twice as many as Maxwell's flagship.

NVIDIA Tesla P100 Graph
Source: NVIDIA

Whereas the first version of the Tesla P100 was built for NVLink-optimized servers, such as IBM's forthcoming Power servers, this new PCIe model opens up the architecture to a much larger audience with only a small performance penalty. It boasts 4.7 TFLOPS of double-precision performance, compared to 5.3 TFLOPS for the NVLink version. Its single-precision and half-precision performance metrics are rated at 9.3 TFLOPS and 18.7 TFLOPS, respectively, compared to 10.6 TFLOPS and 21.2 TFLOPS for the NVLink card.
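That 2:1 half-precision ratio is a Pascal hallmark: GP100 packs two FP16 values into a single 32-bit register and processes both with one instruction. Here's a minimal CUDA sketch of the __half2 intrinsics that expose this packed math; it's an illustration of the idea, not NVIDIA sample code, and the values are arbitrary:

```cuda
// Minimal sketch of Pascal-style packed FP16 math via __half2.
// Compile with: nvcc -arch=sm_60 fp16_demo.cu (sm_60 targets GP100)
#include <cuda_fp16.h>
#include <cstdio>

__global__ void fp16_demo(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    __half2 a = __float2half2_rn(1.5f);   // pack 1.5 into both FP16 lanes
    __half2 b = __float2half2_rn(2.25f);  // pack 2.25 into both FP16 lanes
    __half2 s = __hadd2(a, b);            // two FP16 adds in one instruction
    out[i] = __low2float(s);              // unpack one lane for inspection
}

int main()
{
    const int n = 256;
    float *out;
    cudaMallocManaged(&out, n * sizeof(float));
    fp16_demo<<<1, n>>>(out, n);
    cudaDeviceSynchronize();
    printf("out[0] = %.2f (expect 3.75)\n", out[0]);
    cudaFree(out);
    return 0;
}
```

Deep learning workloads tolerate reduced precision well, which is exactly why NVIDIA doubled down on FP16 throughput here.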

Like the NVLink model, the PCIe variant sports second-generation High Bandwidth Memory (HBM2), in either 16GB or 12GB capacities (only 16GB is available on the NVLink model). The card communicates with the host over a standard PCIe x16 link, which tops out at 32GB/s of interconnect bandwidth. Therein lies the real benefit of NVLink, which offers 5X the interconnect bandwidth of PCIe at 160GB/s, but it comes at the expense of compatibility.
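For a sense of what that PCIe ceiling looks like from software, the snippet below times a pinned host-to-device copy with CUDA events, a common way to estimate effective transfer bandwidth. It's a rough sketch rather than a rigorous benchmark, the 256MB buffer size is an arbitrary choice, and note that the quoted 32GB/s counts both directions, so a one-way copy peaks closer to 16GB/s:

```cuda
// Rough PCIe bandwidth estimate: time a single host-to-device copy.
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    const size_t bytes = 256ull << 20;   // 256MB transfer (arbitrary size)
    float *h, *d;
    cudaMallocHost(&h, bytes);           // pinned memory, needed for full PCIe speed
    cudaMalloc(&d, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host-to-device: %.1f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```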

"With over 400 HPC applications accelerated—including 9 out of top 10—as well as all deep learning frameworks, every HPC customer can now deploy accelerators in their data centers," NVIDIA says.

NVIDIA Tesla P100 Applications

The PCIe version uses the same number of CUDA cores (3,584) but is tuned for slightly lower clock speeds, with a TDP of 250W, down from 300W for the NVLink variant. NVIDIA hasn't said what the base clock will be, though the card will have a 1,300MHz boost clock, compared to 1,480MHz for its NVLink sibling.
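Those boost clocks also explain the TFLOPS figures above: each CUDA core can retire one fused multiply-add (two floating-point operations) per clock, so 3,584 cores × 2 × 1.3GHz works out to roughly 9.3 single-precision TFLOPS for the PCIe card, while the same formula at 1,480MHz yields the NVLink model's 10.6 TFLOPS. GP100 runs double-precision at half that rate and packed half-precision at twice it, which is where the 4.7 and 18.7 TFLOPS numbers come from.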

According to NVIDIA, a single Tesla P100-powered server delivers higher performance than 50 CPU-only server nodes when running the AMBER molecular dynamics code, and is faster than 32 CPU-only nodes when running the VASP materials science application.

What can be done with all this power? Deep learning will be a major point of focus. The Tesla P100 will also inevitably be used in heavy data crunching applications, such as weather prediction, drug research, and other complex simulation problems.

"Accelerated computing is the only path forward to keep up with researchers' insatiable demand for HPC and AI supercomputing," said Ian Buck, vice president of accelerated computing at NVIDIA. "Deploying CPU-only systems to meet this demand would require large numbers of commodity compute nodes, leading to substantially increased costs without proportional performance gains. Dramatically scaling performance with fewer, more powerful Tesla P100-powered nodes puts more dollars into computing instead of vast infrastructure overhead."

The PCIe-based Tesla P100 will be available in the fourth quarter of this year.