During the opening keynote to NVIDIA’s GPU Technology Conference, CEO Jen-Hsun Huang formally unveiled the latest GPU architecture and the first product based on it. The Tesla V100 accelerator is based on the Volta GPU architecture and features some amazingly impressive specifications. Let’s take a look.
| | Tesla V100 | GTX 1080 Ti | Titan X (Pascal) | GTX 1080 | GTX 980 Ti | TITAN X | GTX 980 | R9 Fury X | R9 Fury |
|---|---|---|---|---|---|---|---|---|---|
| GPU | GV100 | GP102 | GP102 | GP104 | GM200 | GM200 | GM204 | Fiji XT | Fiji Pro |
| GPU Cores | 5120 | 3584 | 3584 | 2560 | 2816 | 3072 | 2048 | 4096 | 3584 |
| Base Clock | – | 1480 MHz | 1417 MHz | 1607 MHz | 1000 MHz | 1000 MHz | 1126 MHz | 1050 MHz | 1000 MHz |
| Boost Clock | 1455 MHz | 1582 MHz | 1480 MHz | 1733 MHz | 1076 MHz | 1089 MHz | 1216 MHz | – | – |
| Texture Units | 320 | 224 | 224 | 160 | 176 | 192 | 128 | 256 | 224 |
| ROP Units | 128 (?) | 88 | 96 | 64 | 96 | 96 | 64 | 64 | 64 |
| Memory | 16GB | 11GB | 12GB | 8GB | 6GB | 12GB | 4GB | 4GB | 4GB |
| Memory Clock | 878 MHz (?) | 11000 MHz | 10000 MHz | 10000 MHz | 7000 MHz | 7000 MHz | 7000 MHz | 500 MHz | 500 MHz |
| Memory Interface | 4096-bit (HBM2) | 352-bit | 384-bit G5X | 256-bit G5X | 384-bit | 384-bit | 256-bit | 4096-bit (HBM) | 4096-bit (HBM) |
| Memory Bandwidth | 900 GB/s | 484 GB/s | 480 GB/s | 320 GB/s | 336 GB/s | 336 GB/s | 224 GB/s | 512 GB/s | 512 GB/s |
| TDP | 300 watts | 250 watts | 250 watts | 180 watts | 250 watts | 250 watts | 165 watts | 275 watts | 275 watts |
| Peak Compute | 15 TFLOPS | 10.6 TFLOPS | 10.1 TFLOPS | 8.2 TFLOPS | 5.63 TFLOPS | 6.14 TFLOPS | 4.61 TFLOPS | 8.60 TFLOPS | 7.20 TFLOPS |
| Transistor Count | 21.1B | 12.0B | 12.0B | 7.2B | 8.0B | 8.0B | 5.2B | 8.9B | 8.9B |
| Process Tech | 12nm | 16nm | 16nm | 16nm | 28nm | 28nm | 28nm | 28nm | 28nm |
| MSRP (current) | lol | $699 | $1,200 | $599 | $649 | $999 | $499 | $649 | $549 |
While we are low on details today, it appears that the fundamental compute units of Volta are similar to those of Pascal. The GV100 has 80 SMs across 40 TPCs and 5120 total CUDA cores, a 42% increase over the GP100 GPU used in the Tesla P100 and 42% more than the GP102 GPU used in the GeForce GTX 1080 Ti. The structure of the GPU remains the same as GP100, with the CUDA cores organized as 64 single-precision (FP32) units and 32 double-precision (FP64) units per SM.
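As a quick sanity check on those core counts, here is a minimal back-of-the-envelope sketch; the enabled-core counts for Tesla P100 and GTX 1080 Ti come from the table above.

```python
# Back-of-the-envelope check of the GV100 CUDA core math quoted above.
gv100_sms = 80          # SMs enabled on Tesla V100
fp32_per_sm = 64        # FP32 CUDA cores per SM

gv100_fp32 = gv100_sms * fp32_per_sm    # 5120
tesla_p100_fp32 = 3584                  # enabled cores on Tesla P100 (GP100)
gtx_1080_ti_fp32 = 3584                 # enabled cores on GTX 1080 Ti (GP102)

print(gv100_fp32)                                   # 5120
print(f"{gv100_fp32 / tesla_p100_fp32 - 1:.1%}")    # ~42.9% over Tesla P100
print(f"{gv100_fp32 / gtx_1080_ti_fp32 - 1:.1%}")   # ~42.9% over GTX 1080 Ti
```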
Interestingly, NVIDIA has already told us the clock speed of this new product as well, coming in at 1455 MHz Boost, more than 100 MHz lower than the GeForce GTX 1080 Ti and 25 MHz lower than the Tesla P100.
Volta adds support for a brand new compute unit, though, known as the Tensor Core. With 640 of these on the GPU die, NVIDIA is directly targeting the neural network and deep learning fields. If this is your first time hearing the term tensor, you should read up on TensorFlow, Google’s open-source software library for machine learning, and its influence on the hardware market. Google has already invested in a tensor-specific processor of its own, and now NVIDIA throws its hat in the ring.
Adding Tensor Cores to Volta allows the GPU to do mass processing for deep learning, on the order of a 12x improvement over Pascal’s capabilities using CUDA cores only.
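For context, each Tensor Core performs a fused matrix multiply-accumulate, D = A×B + C, on small FP16 matrices with FP32 accumulation. The NumPy snippet below is a purely conceptual sketch of that operation (it is not NVIDIA’s API), and the 4×4 tile size is an assumption based on how NVIDIA has described Volta’s Tensor Cores:

```python
import numpy as np

# Conceptual sketch of the operation one Tensor Core performs:
# D = A @ B + C, with FP16 inputs and FP32 accumulation.
A = np.random.rand(4, 4).astype(np.float16)   # FP16 input matrix
B = np.random.rand(4, 4).astype(np.float16)   # FP16 input matrix
C = np.random.rand(4, 4).astype(np.float32)   # FP32 accumulator

D = A.astype(np.float32) @ B.astype(np.float32) + C   # accumulate in FP32
print(D)   # one 4x4 fused multiply-accumulate result
```

NVIDIA’s 12x figure is essentially this operation replicated across 640 dedicated units, where Pascal had to run the same math through its general-purpose CUDA cores.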
For users interested in standard usage models, including gaming, the GV100 GPU offers a 1.5x improvement in FP32 compute, up to 15 TFLOPS of theoretical performance, and 7.5 TFLOPS of FP64. Other relevant specifications include 320 texture units, a 4096-bit HBM2 memory interface, and 16GB of memory on-module. NVIDIA claims a memory bandwidth of 900 GB/s, which works out to roughly 878 MHz per stack.
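For the curious, both headline numbers fall out of the published specs with simple arithmetic; the sketch below assumes 2 FLOPS per CUDA core per clock (one fused multiply-add) and a double-data-rate HBM2 interface across four 1024-bit stacks.

```python
# Rough derivation of the quoted peak compute and memory bandwidth figures.
cuda_cores = 5120
boost_clock_hz = 1455e6
flops_per_core_per_clock = 2             # one FMA counts as two FLOPS

fp32_tflops = cuda_cores * boost_clock_hz * flops_per_core_per_clock / 1e12
print(f"{fp32_tflops:.1f} TFLOPS FP32")  # ~14.9, i.e. the quoted 15 TFLOPS

bus_width_bits = 4096                    # four HBM2 stacks, 1024 bits each
memory_clock_hz = 878e6                  # per-stack clock implied above
transfers_per_clock = 2                  # double data rate

bandwidth_gb_s = bus_width_bits / 8 * memory_clock_hz * transfers_per_clock / 1e9
print(f"{bandwidth_gb_s:.0f} GB/s")      # ~899 GB/s, matching the ~900 GB/s claim
```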
Maybe more impressive is the transistor count: 21.1 BILLION! NVIDIA claims that this is the largest chip you can make physically with today’s technology. Considering it is being built on TSMC's 12nm FinFET technology and has an 815 mm^2 die size, I see no reason to doubt them.
Shipping is scheduled for Q3 for the Tesla V100 – at least, that is when NVIDIA promises the DGX-1 system using the chip will reach developers.
I know many of you are interested in the gaming implications and timelines – sorry, I don’t have an answer for you yet. I will say that the bump from 10.6 TFLOPS to 15 TFLOPS is an impressive boost! But if the server variant of Volta isn’t due until Q3 of this year, I find it hard to believe NVIDIA would bring the consumer version out faster than that. And whether or not NVIDIA offers gamers the chip with non-HBM2 memory is still a question mark for me and could directly impact performance and timing.
More soon!!
I would expect this to be priced at ~$3000+, and it won’t enter the consumer market for at least 6+ months after availability, in a reduced form.
NVIDIA only has itself to compete with at those levels, so there’s no point in undercutting the GTX 1080 Ti.
Volta only needs to come to consumer products once FP16 is fully leveraged. Might be over a year?
This card isn’t going to be anywhere near $3,000. Probably close to $8,000 MSRP. The current P100 is $7,000 MSRP.
JHC, nVidia are just smacking AMD around like a…
So now we know that AMD Vega will sit between Pascal and Volta, with 12.5 TFLOPS FP32, but let’s see about the rapid packed math… if it’s as efficient as Volta’s, it could reach close to 100 TFLOPS.
However, I do believe that price-wise Vega will be much cheaper! A die size of 815 mm^2 is just insane and massively expensive…
Assuming that the GTX 2080 is going to be a cut-down of this beast, it will probably come out at around 12.5-13.0 TFLOPS, just to hedge against AMD’s Vega!
So Vega doesn’t look that bad if the pricing is correct.
The Titan Xp is already rated at 12 TFLOPS in its stock configuration. If you can push the clock to 2 GHz (which pretty much every Pascal chip is capable of), the performance will peak at 15 TFLOPS! This new chip’s main focus is FP64 and deep learning stuff, just like GP100. They are not gaming chips like GP102/104/106/107/108 are.
As I said, the 2080 will be a cut-down of GV100, and if NVIDIA delivers the expected bang – I mean a (20)80 model with the performance of the previous-gen Xp or Ti version – then we are looking at 12.5 TFLOPS of raw performance for the 2080.
I also doubt that the Volta architecture will clock as high as Pascal. This is a much wider GPU, so it will be difficult to clock as high – it’s 3840 CUDA cores vs. 5120. A cut-down version of GV100 will probably come with higher clocks but also fewer SMs.
20% fewer SMs than the GV100 gives something like 64 SMs times 64 CUDA cores = 4096. Isn’t this the same as Vega 10?
A die size of around 600 mm^2?? Still massive!
The GPU world is on fire!!! 🙂
“So now we know that AMD Vega will sit between Pascal and Volta”
Based on information that AMD has actually released (rather than wild speculation), we know that Vega will be called Vega.
on track for Summit!
https://www.olcf.ornl.gov/summit/#timeline
2018 for GPUs….
Color me disappointed.
Nvidia waiting for AMD to make them produce, just like Intel.
That’s a big chip. $3 billion in development. So Vega will be 30% faster than GV100? Just kidding.
People should look at that chip and that $3 billion number, consider the fact that TSMC even created a variant of 12nm FinFET just for Nvidia, and then realize that demanding that AMD offer something much faster at a much lower price is just stupid.
Anyway, Nvidia is going full steam ahead for the AI and deep learning market. If they succeed there, they will have enough money to challenge Intel in the future. They don’t worry about the GPU market, of course. AMD started creating Vega with the GTX 1080 in mind, not the 1080 Ti or a 2080 a few months from now.
PS The FULL GV100 comes with 5376 CUDA cores. V100 uses a cut down version.
“lol”
Best table filler value ever.
Hopefully they will figure out how to stop price gouging, and sell it for $300.
You literally have no idea what this type of hardware is used for, do you?