NVIDIA Introduces 7nm Ampere A100 Tensor Core GPU
The NVIDIA A100, the first GPU based on the NVIDIA Ampere architecture, delivers the greatest generational performance leap across NVIDIA’s eight generations of GPUs. It is also built for data analytics, scientific computing and cloud graphics, and is in full production and shipping to customers worldwide, Huang announced.
Eighteen of the world’s leading service providers and systems builders are incorporating the A100 into their offerings, among them Alibaba Cloud, Amazon Web Services, Baidu Cloud, Cisco, Dell Technologies, Google Cloud, Hewlett Packard Enterprise, Microsoft Azure and Oracle.
The A100, and the NVIDIA Ampere architecture it’s built on, boost performance by up to 20x over their predecessors, Huang said. He detailed five key features of the A100:
- More than 54 billion transistors, making it the world’s largest 7-nanometer processor.
- Third-generation Tensor Cores with TF32, a new math format that accelerates single-precision AI training out of the box. NVIDIA’s widely used Tensor Cores are now more flexible, faster and easier to use, Huang explained.
- Structural sparsity acceleration, a new efficiency technique harnessing the inherently sparse nature of AI math for higher performance.
- Multi-instance GPU, or MIG, allowing a single A100 to be partitioned into as many as seven independent GPUs, each with its own resources.
- Third-generation NVLink technology, doubling high-speed connectivity between GPUs, allowing A100 servers to act as one giant GPU.
The result of all this: 6x higher performance than NVIDIA’s previous generation Volta architecture for training and 7x higher performance for inference.
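To make the TF32 bullet more concrete: NVIDIA's public description of the format keeps the 8-bit exponent of float32 but only 10 bits of mantissa. The sketch below is an illustration under that assumption, not NVIDIA's implementation. The helper name `to_tf32_truncated` is hypothetical, and it simply truncates the low mantissa bits, whereas the hardware's rounding behavior may differ.

```python
import struct

def to_tf32_truncated(x: float) -> float:
    """Approximate TF32 precision by clearing the low 13 mantissa bits of a float32.

    float32 has 23 mantissa bits; TF32 is described as keeping 10 of them,
    so 23 - 10 = 13 bits are dropped. Truncation is an assumption made here
    for simplicity.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    bits &= 0xFFFFE000  # keep sign, 8 exponent bits, top 10 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# A step of 2**-10 survives, a step of 2**-11 is lost.
kept = to_tf32_truncated(1.0 + 2.0 ** -10)
lost = to_tf32_truncated(1.0 + 2.0 ** -11)
```

The range of representable magnitudes stays the same as float32 because the exponent is untouched; only the precision drops.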
|Specification|Ampere A100|Tesla V100|Tesla P100|
|---|---|---|---|
|Boost Clock|1410 MHz|1530 MHz|1480 MHz|
|Memory|40GB HBM2e|16GB HBM2|16GB HBM2|
|Memory Bandwidth|1.6 TB/s|900 GB/s|616 GB/s|
|Die Size|826 mm²|815 mm²|610 mm²|
|Process Tech|7 nm|12 nm|16 nm|
The GA100 GPU is manufactured on TSMC’s 7nm process, with a die size of 826 mm². That is slightly larger than GV100’s 815 mm², yet it packs in more than double the transistors of the previous generation’s GPU: an incredible 54 billion, up from 21.1 billion with GV100.
The GA100’s Streaming Multiprocessor (SM) count is 108, and Ampere’s SMs are organized as 64 FP32 and 32 FP64 cores each. This breaks down to 6912 FP32 (single precision) CUDA Cores and 3456 FP64 (double precision) CUDA Cores. And while the 432 Tensor Core count is down from GV100’s 640, A100 is using third-generation Tensor Core technology.
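The core totals above are just the per-SM figures multiplied by the SM count. A quick check, where the constant names are illustrative and the four Tensor Cores per SM is inferred from 432 / 108 rather than stated in the article:

```python
SM_COUNT = 108            # enabled SMs on the A100
FP32_CORES_PER_SM = 64
FP64_CORES_PER_SM = 32
TENSOR_CORES_PER_SM = 4   # assumption: inferred from 432 total / 108 SMs

fp32_cores = SM_COUNT * FP32_CORES_PER_SM
fp64_cores = SM_COUNT * FP64_CORES_PER_SM
tensor_cores = SM_COUNT * TENSOR_CORES_PER_SM

print(fp32_cores, fp64_cores, tensor_cores)
```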
The third-generation Tensor Cores in the NVIDIA Ampere architecture are beefier than prior versions. They support a larger matrix size — 8x8x4, compared to 4x4x4 for Volta — that lets users tackle tougher problems. That’s one reason why an A100 with a total of 432 Tensor Cores delivers up to 19.5 FP64 TFLOPS, more than double the performance of a Volta V100.
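As a rough sanity check on the 19.5 TFLOPS figure, the sketch below estimates the peak FP64 rate of the ordinary CUDA cores and compares it with the quoted Tensor Core number. It assumes the usual convention of one fused multiply-add (counted as 2 FLOPs) per core per clock at the 1410 MHz boost clock; the variable names are illustrative.

```python
FP64_CUDA_CORES = 3456       # 108 SMs x 32 FP64 cores
BOOST_CLOCK_HZ = 1410e6      # 1410 MHz boost clock
FLOPS_PER_FMA = 2            # assumption: one FMA per core per clock = 2 FLOPs

# Peak FP64 throughput of the regular CUDA cores, in TFLOPS.
fp64_cuda_tflops = FP64_CUDA_CORES * FLOPS_PER_FMA * BOOST_CLOCK_HZ / 1e12

QUOTED_FP64_TENSOR_TFLOPS = 19.5
speedup = QUOTED_FP64_TENSOR_TFLOPS / fp64_cuda_tflops

print(f"FP64 CUDA cores: {fp64_cuda_tflops:.2f} TFLOPS, Tensor Core ratio: {speedup:.2f}x")
```

Under these assumptions the quoted Tensor Core rate is about twice what the FP64 CUDA cores alone could reach.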
Memory size and bandwidth have also increased significantly, with 40GB of HBM2e on a 5120-bit bus providing nearly 1.6 TB/s of bandwidth (1,555 GB/s).
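The bandwidth figure can be related to the bus width with simple arithmetic. The sketch below back-computes the implied per-pin data rate from a 5120-bit bus and 1,555 GB/s; the resulting rate is derived here, not quoted by NVIDIA in this article, and the names are illustrative.

```python
BUS_WIDTH_BITS = 5120
BANDWIDTH_GB_PER_S = 1555    # total memory bandwidth, GB/s

# Bytes moved across the whole bus in one transfer.
bytes_per_transfer = BUS_WIDTH_BITS // 8

# Implied transfers per second per pin, in billions (GT/s).
implied_rate_gtps = BANDWIDTH_GB_PER_S / bytes_per_transfer

print(f"{bytes_per_transfer} bytes per transfer, about {implied_rate_gtps:.2f} GT/s per pin")
```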
The DGX A100 platform shown during the KitchenNote is “the world’s largest GPU”: a massive 8-GPU configuration that weighs 50 lbs (NVIDIA’s CEO pulled it out of the oven just a couple of days ago). The price tag for the DGX? $200,000.
Is Ampere for Consumer GPUs Coming Soon?
Ampere will eventually replace NVIDIA’s Turing and Volta chips with a single platform that streamlines NVIDIA’s GPU lineup, Huang said in a pre-briefing with media members Wednesday. While consumers largely know NVIDIA for its video game hardware, the first launches with Ampere are aimed at AI needs in the cloud and for research.
“Unquestionably, it’s the first time that we’ve unified the acceleration workload of the entire data center into one single platform,” Huang said.
It seems that we will have to wait a bit longer for more GeForce-related Ampere news.