A preview of potential Volta gaming hardware
We dive into the gaming performance of the TITAN V
This is a multi-part story for the NVIDIA Titan V:
As a surprise to most of us in the media community, NVIDIA launched a new graphics card to the world, the TITAN V. No longer sporting the GeForce brand, NVIDIA has returned the Titan line of cards to where it began – clearly targeted at the world of developers and general purpose compute. And if that branding switch isn’t enough to drive that home, I’m guessing the $2999 price tag will be.
Today’s article is going to look at the TITAN V from the angle that is likely most interesting to the majority of our readers, which also happens to be the angle that NVIDIA is least interested in us discussing. Though the card is targeted at machine learning and the like, there is little doubt in my mind that some crazy people will want to take on the $3000 price to see what kind of gaming power it can provide. After all, this marks the first time that a Volta-based GPU from NVIDIA has shipped in a form a consumer can get their hands on, and the first time it has shipped with display outputs. (That’s kind of important to build a PC around it…)
From a scientific standpoint, we wanted to look at the Titan V for the same reasons we tested the AMD Vega Frontier Edition cards upon their launch: using it to estimate how future consumer-class cards will perform in gaming. And, just as we had to do then, we purchased this Titan V from NVIDIA.com with our own money. (If anyone wants to buy this from me to recoup the costs, please let me know! Ha!)
Spec | Titan V | Titan Xp | GTX 1080 Ti | GTX 1080 | GTX 1070 Ti | GTX 1070 | RX Vega 64 Liquid | Vega Frontier Edition |
---|---|---|---|---|---|---|---|---|
GPU Cores | 5120 | 3840 | 3584 | 2560 | 2432 | 1920 | 4096 | 4096 |
Base Clock | 1200 MHz | 1480 MHz | 1480 MHz | 1607 MHz | 1607 MHz | 1506 MHz | 1406 MHz | 1382 MHz |
Boost Clock | 1455 MHz | 1582 MHz | 1582 MHz | 1733 MHz | 1683 MHz | 1683 MHz | 1677 MHz | 1600 MHz |
Texture Units | 320 | 240 | 224 | 160 | 152 | 120 | 256 | 256 |
ROP Units | 96 | 96 | 88 | 64 | 64 | 64 | 64 | 64 |
Memory | 12GB | 12GB | 11GB | 8GB | 8GB | 8GB | 8GB | 16GB |
Memory Clock | 1700 MHz | 11400 MHz | 11000 MHz | 10000 MHz | 8000 MHz | 8000 MHz | 1890 MHz | 1890 MHz |
Memory Interface | 3072-bit HBM2 | 384-bit G5X | 352-bit G5X | 256-bit G5X | 256-bit | 256-bit | 2048-bit HBM2 | 2048-bit HBM2 |
Memory Bandwidth | 653 GB/s | 547 GB/s | 484 GB/s | 320 GB/s | 256 GB/s | 256 GB/s | 484 GB/s | 484 GB/s |
TDP | 250 watts | 250 watts | 250 watts | 180 watts | 180 watts | 150 watts | 345 watts | 300 watts |
Peak Compute | 12.2 (base) TFLOPS, 14.9 (boost) TFLOPS | 12.1 TFLOPS | 11.3 TFLOPS | 8.2 TFLOPS | 7.8 TFLOPS | 5.7 TFLOPS | 13.7 TFLOPS | 13.1 TFLOPS |
MSRP (current) | $2999 | $1299 | $699 | $499 | | $399 | $699 | $999 |
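For readers who want to sanity-check the Titan V's peak compute figures, here is a minimal Python sketch. It assumes the usual rule of thumb that each CUDA core can retire one fused multiply-add (counted as two floating-point operations) per clock; the function name is ours and purely illustrative, and not every row in the table necessarily uses the same clock for this calculation.

```python
def peak_fp32_tflops(cores: int, clock_mhz: float, flops_per_core_per_clock: int = 2) -> float:
    """Theoretical FP32 throughput in TFLOPS.

    Assumes one fused multiply-add (2 FLOPs) per core per clock,
    which is a rule of thumb and not a measured result.
    """
    return cores * flops_per_core_per_clock * clock_mhz * 1e6 / 1e12


# Titan V core count and clocks taken from the table above
titan_v_base = peak_fp32_tflops(5120, 1200)
titan_v_boost = peak_fp32_tflops(5120, 1455)

print(f"Titan V at base clock:  {titan_v_base:.3f} TFLOPS")
print(f"Titan V at boost clock: {titan_v_boost:.3f} TFLOPS")
```

Under that assumption the base-clock result lands just under 12.3 TFLOPS (the table appears to truncate it to 12.2) and the boost-clock result rounds to 14.9 TFLOPS.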
The Titan V is based on the GV100 GPU though with some tweaks that lower performance and capability slightly when compared to the Tesla-branded equivalent hardware. Though our add-in card iteration has the full 5120 CUDA cores enabled, the HBM2 memory bus is reduced from 4096-bit to 3072-bit and it has one of the four stacks on the package disabled. This also drops the memory capacity from 16GB to 12GB, and memory bandwidth to 652.8 GB/s.
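The 652.8 GB/s figure can be reproduced from the bus width and memory speed. The Python sketch below is a rough illustration only: it assumes the "Memory Clock" values in our table are effective per-pin data rates in MT/s, the function name is ours, and the four-stack entry is a hypothetical that simply reuses the Titan V's data rate rather than the spec of any shipping Tesla part.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_mtps: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) x per-pin data rate (MT/s) / 8 bits per byte."""
    return bus_width_bits * data_rate_mtps * 1e6 / 8 / 1e9


# (bus width in bits, effective data rate in MT/s), taken from the table above
cards = {
    "Titan V (3 of 4 HBM2 stacks)": (3072, 1700),
    "Four stacks at the same data rate (hypothetical)": (4096, 1700),
    "Titan Xp": (384, 11400),
    "GTX 1080 Ti": (352, 11000),
}

for name, (bus, rate) in cards.items():
    print(f"{name}: {memory_bandwidth_gbs(bus, rate):.1f} GB/s")
```

The point of the comparison is that HBM2 gets its bandwidth from a very wide bus at a low per-pin rate, while GDDR5X does the opposite.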
We have yet to spend the time needed to properly digest the Volta architecture, and that is something we want to fix before any consumer cards arrive, possibly next year. In fact, there is a decent chance that Volta as it exists today may NEVER be released as a consumer-facing product. Instead we might see another chip iteration to lower costs, remove some of that double-precision horsepower, and help increase margins.
Finally, due to time constraints, we are “moving ahead in our action” in order to provide as much data as quickly as possible. We will follow up this story that looks at gaming performance with one that looks at GPU-compute based workloads, single and double-precision, to get a better idea of how the Titan V compares to the Titan Xp and the Vega 64 / Frontier Edition cards in the workloads it is directly targeted at.
The NVIDIA Titan V Graphics Card
From a design perspective, the Titan V is mostly unchanged externally from the Titan Xp or GTX 1080 Ti implementations. We have a two-slot, blower-style cooler with the angular shroud design. The Titan V gets the champagne treatment for color, making it unique in that way.
The Titan V has a 250 watt TDP and requires an 8-pin and 6-pin power connector to operate. External output connections include 3x DisplayPort and 1x HDMI; again the same as other recent NVIDIA Founders Edition hardware.
Despite the fact that it looks the same, the cooler on the Titan V is new too. It is still a vapor chamber design, but instead of using a copper base and aluminum fins, the fins on this cooler are all copper as well. This gives the card a definite heft increase, and it was the first thing we noticed when taking it out of the box in the office.
We already took the card apart in our first story and teardown video posted this week, so head over and visit that page for the footage if you are interested in seeing what this guy looks like underneath. Despite the fact that Volta GPUs have been around for some time, it is an impressive feat.
GPU Clock Consistency
One of the things we always check for on a new graphics card or partner card is the clock speed consistency. With GPU Boost modifying the clocks of the GPU in order to maintain power, thermal, and voltage thresholds, we like to see how the performance of the card changes and levels out during extended use. We take advantage of the looping benchmark of Unigine Heaven for our workload and use GPU-Z to monitor clocks and temps courtesy of the NVAPI.
Results are interesting – the Volta GPU on the Titan V starts well above the 1700 MHz mark, but after just 2 minutes of run time we drop below the 1600 MHz mark and find a comfortable resting place. The average from this run, with the early higher clock rates included, comes to 1602 MHz, giving you an idea of where the Titan V will operate for gaming sessions.
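To show why it matters that the early, higher clocks are included in the average, here is a small Python sketch. The sample values are made up for illustration and are NOT taken from our GPU-Z log; the function name and the `settle_after` parameter are our own assumptions.

```python
from statistics import mean


def summarize_clocks(samples_mhz, settle_after=0):
    """Return (overall average, average after skipping the first `settle_after` samples)."""
    steady = samples_mhz[settle_after:]
    return mean(samples_mhz), mean(steady)


# Made-up clock samples in MHz, for illustration only (not our measured data):
# a short burst of high boost clocks followed by a lower steady state
samples = [1740, 1720, 1690, 1640, 1600, 1590, 1590, 1585, 1590, 1585]

overall, steady = summarize_clocks(samples, settle_after=4)
print(f"Average including warm-up: {overall} MHz")
print(f"Average after settling:    {steady} MHz")
```

With a log shaped like this, the whole-run average sits above the settled clock, so a number like our 1602 MHz is best read as a slightly optimistic estimate of sustained gaming clocks.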
Ryan, they added an fps cap in the HITMAN GOTY patch at 100fps for all configs. Me and others have tried to message the devs but I don’t know what they are doing.
The fps cap is not very stable, and I have a theory, but it’s irrelevant.
Seeing this thermal throttling, I am also interested to know if the fan speed or the die itself is causing this poor cooling performance. This cooler is supposed to be a bit better than previous FE coolers, which could hold 250W. (Correct me if I am wrong.)
This is not because of poor cooling performance. This is because the GPU has a much higher TDP than its predecessors and generates A LOT more heat. It’s not designed to be used for gaming.
These results show just how far behind AMD is lagging. If the die shrink of Vega doesn’t provide at least a 70% uplift, they’re dead next round.
That is only if Nvidia can produce the GV100 at yields (and volumes) that let it come close to the consumer market.
I think it’s much more likely we might see a refresh of Pascal on 12nm for gaming (this will still be a big boost) with more CUDA cores due to the big power savings of the new process. The question here is will this be the same as Volta in games? Possibly.
But AMD is also scheduled to do a Vega refresh on a new (lower power) process. This will reduce power consumption on Vega quite a lot. Sure, Vega 2 (or whatever the name will be) will not be beating a Volta, but very, very few gamers buy the top-end cards, so to say AMD is dead is a little pointless and blind. After all, I’m sure AMD sells a load of GPUs (in all of those consoles people buy). The majority of people don’t buy Ti-level GPUs, so it is sort of OK for AMD to not target that market.
There’s not a chance in hell we’ll see another Pascal release after what we have now. I can guarantee that 100%.
Unless you work in a position that gives you power to make decisions about what Nvidia will do and/or own Nvidia, you have absolutely 0% chance of guaranteeing anything about what Nvidia sells or does not sell.
He is right though. No more Pascal is the reasonable conclusion. They have exhausted Pascal with the XP, Xp, and Quadro Pascal cards.
Seems like the gaming efficiency gains of Volta can be attributed almost exclusively to HBM. GDDR6 or HBM equipped Pascal plus some marketing spin will be enough for “next generation”.
Only HBM? I’m sure the 5120 shaders help somewhat too.
AMD can always do a dual GPU die on one PCIe card configuration with Vega. Vega 20 is going to be even more DP FP heavy with a 1/2 DP FP to SP FP ratio. And Vega speaks the Infinity Fabric so any dual GPU dies on a single PCIe card configurations may not need to worry about any software/driver/API CF support as 2 GPU dies wired up via the Infinity Fabric IP would look to the software/drivers as a single monolithic logical GPU.
Look at how the Infinity Fabric ties all those Zen/Zeppelin dies together on TR/Epyc, and that part of Navi is already here. Navi is more about producing scalable GPUs from smaller GPU dies that can be wired up Infinity Fabric style to look like one big single GPU than it is about much of a GPU micro-arch change over the Vega GPU micro-arch. Navi is more about that scalable Zen/Zeppelin sort of modular design taken to the next level, and the Infinity Fabric IP is in all of AMD’s new Zen/Vega products currently.
So any Vega refresh dies on 12nm, including Vega 20 with its higher FP64 number crunching, will have already had the Infinity Fabric IP since the first Vega SKUs were introduced. And that gives AMD the option of wiring up some dual GPU die on one PCIe card designs that can scale up and look to any software/driver just like a single bigger logical GPU.
AMD does not have to wait for Navi to go modular it’s just that Navi will be using more smaller GPU die chiplets that can be fabbed with very high yields and give AMD a finer grained ability to scale up GPU power from mobile to flagship using a smaller modular GPU common Die design.
That Radeon Pro Duo (Fiji XT) has 2× 4096:256:64 shaders:TMUs:ROPs for plenty of compute power and non-gaming graphics rendering power. So maybe a dual Vega 64 or even a dual Vega 20 for the professional markets that makes more use of the Infinity Fabric, which the Fiji XT Radeon Pro Duo did not have the option of making use of.
96 ROPs for Titan V and a little more memory bandwidth over the Titan Xp and a lot more shaders. Wikipedia lists the L2 cache size on the Titan V as 4608 KB and the Titan Xp’s L2 as 4096 KB, and the Titan Xp has 96 ROPs, as does the Titan V. So is it Titan V’s higher effective HBM2 memory bandwidth and much wider HBM2 interface that is giving Titan V the most help in gaming, or is it the larger cache on the Titan V relative to the Titan Xp that is really helping keep the latency to a minimum? Titan V has more TMUs than the Titan Xp, and those 320 TMUs on Titan V shore up Nvidia’s texture fill rates even relative to AMD’s Vega micro-arch based Vega 64/56 SKUs.
Titan V’s shader counts are overkill for gaming, and my money is on the Titan V’s larger L2 cache helping to lower the latency, because Titan V’s ROP counts are the same as Titan Xp’s ROP counts. Titan V’s lower base/boost clocks are more than made up for by other factors such as more shader cores/L2 cache and higher texture throughput. I’d like to see Titan V’s shader core utilization rates. That average clock rate is not too bad on Titan V, and I wish there were some Titan Xp average clock rates for comparison.
It looks like maybe the games are not needing the shader counts as much as they may be liking any extra L2 cache that Titan V has available to keep any memory access latency issues to a minimum. All that extra HBM2 effective bandwidth that the Titan V has over Titan Xp has to count for some uplift over the GDDR5X used on the Titan Xp. And this is the first time HBM2 can be tested for gaming on any Nvidia GPU using gaming drivers, and that has to count for some of Titan V’s performance delta over Titan Xp.
So the big question still remains as to just what extra ROP resources Nvidia will have on GV102 and GV104 based variants and just what higher clock speeds can be had on any GV104 based Volta variants that will very likely have the shader cores pruned back a good bit.
The ROP counts on any GV102/GV104 based variants will be interesting also, as will be Nvidia’s use of VRAM (GDDR or HBM2) on its GV104 gaming variants. Even with all those extra shader cores, that extra L2 cache on Titan V has to help.
Bad old Nvidia is requiring registration to view the GV100 whitepapers, so that’s a big bummer.
But some other PDF online lists:
“VOLTA GV100 SM
GV100
FP32 units: 64
FP64 units: 32
INT32 units: 64
Tensor Cores: 8
Register File: 256 KB
Unified L1/Shared memory: 128 KB
Active Threads: 2048
VOLTA GV100 SM
Completely new ISA
Twice the schedulers
Simplified Issue Logic
Large, fast L1 cache
Improved SIMT model
Tensor acceleration
= The easiest SM to program yet
Redesigned for Productivity” (1)
(1)
“INSIDE VOLTA
Olivier Giroux and Luke Durant
NVIDIA
May 10, 2017”
http://on-demand.gputechconf.com/gtc/2017/presentation/s7798-luke-durant-inside-volta.pdf
“1700 MHz”
What? Surely you mean 17000 MHZ? Or else it’s 10x slower RAM than the Titan XP and 1080Ti.
No, he means 1700MHz.
It’s not slower. Titan V uses HBM2 which has a much wider bus than GDDR5X.
The 1080Ti has an 11008MHz memory clock on a 352-bit bus width, resulting in a memory bandwidth of 484GB/s
The Titan Xp has an 11408MHz memory clock on a 384-bit bus width, resulting in a memory bandwidth of 547.6GB/s
The Titan V has an 1700MHz memory clock on a 3072-bit bus width, resulting in a memory bandwidth of 652.8GB/s
Sorry, I totally didn’t realize the 1080Ti and especially the Xp product don’t use HBM2 as well (and that HBM2 has a lower clock speed but much wider bus).
Yeah, I hate it when people use MHz in the wrong places. The clock speed for HBM2 in this thing is 850MHz (this is the real clock, which one can overclock) and it can do two bits per clock, thus 1.7Gbps per pin, thus the card’s bandwidth is 3 × 1.7Gbps × 1024 bit / (8 bit/Byte) = 652.8 GB/s
Edit: corrected memory freq.
800MHz and data on the falling and rising edge of the clock for a Double Data Rate (DDR) of 1600MHz effective. The clock speed is in base 10 and the bandwidth is in base 2 units, and do not forget any overhead and parity. Each JEDEC standard HBM2 stack gets its own 1024-bit wide interface subdivided into 8 independently operating 128-bit channels. And for the JEDEC HBM2 standard only, not HBM, HBM2 offers a 64-bit pseudo addressing mode where each 128-bit memory channel can be split into 2 64-bit pseudo channels for finer grained memory access. Each HBM2 stack can have a total bandwidth of 256 GB/s clocked at the maximum JEDEC speed.
According to Anandtech/SK Hynix the pseudo channel mode improves latency via optimized memory accesses:
“The second-generation HBM (HBM2) technology, which is outlined by the JESD235A standard, inherits physical 128-bit DDR interface with 2n prefetch architecture, internal organization, 1024-bit input/output, 1.2 V I/O and core voltages as well as all the crucial parts of the original tech. Just like the predecessor, HBM2 supports two, four or eight DRAM devices on a base logic die (2Hi, 4Hi, 8Hi stacks) per KGSD. HBM Gen 2 expands capacity of DRAM devices within a stack to 8 Gb and increases supported data-rates up to 1.6 Gb/s or even to 2 Gb/s per pin. In addition, the new technology brings an important improvement to maximize actual bandwidth.
One of the key enhancements of HBM2 is its Pseudo Channel mode, which divides a channel into two individual sub-channels of 64 bit I/O each, providing 128-bit prefetch per memory read and write access for each one. Pseudo channels operate at the same clock-rate, they share row and column command bus as well as CK and CKE inputs. However, they have separated banks, they decode and execute commands individually. SK Hynix says that the Pseudo Channel mode optimizes memory accesses and lowers latency, which results in higher effective bandwidth.
If, for some reason, an ASIC developer believes that Pseudo Channel mode is not optimal for their product, then HBM2 chips can also work in Legacy mode. While memory makers expect HBM2 to deliver higher effective bandwidth than predecessors, it depends on developers of memory controllers how efficient next-generation memory sub-systems will be. In any case, we will need to test actual hardware before we can confirm that HBM2 is better than HBM1 at the same clock-rate.” (1)
(1)
“JEDEC Publishes HBM2 Specification as Samsung Begins Mass Production of Chips”
https://www.anandtech.com/show/9969/jedec-publishes-hbm2-specification
Ryan, can you run with the latest driver? 388.59? Thanks.
Oops, actually, we DID use 388.59, just updated the table.
You do ensure Fallout 4 is running in Fullscreen Exclusive Display Mode right? Every time you hit Okay in the configuration utility it will re-enable Borderless Fullscreen (and the option to turn it off in the utility is stupidly grayed out so you need to disable Borderless Fullscreen by editing the config file)
Really? Didn't realize that. I wonder if it will change my performance on those rare occasions I get to play.
Sniper Elite 4 in DX11?
Thought it was one of the better Async-implementations – or were there Problems with Performance or Stability in DX12?
I was a little disappointed in not seeing DX12 vs DX11 or even a Vulkan game like Wolfenstein 2. I know it will blow away a Vega 64 but it’s still interesting.
Why does the gap get smaller at 4K? Shouldn’t it get bigger since it uses HBM?
That’s not how it works. You still have a set amount of ROPs and CUDA cores to do work. The only way Titan V is going to max out its memory is during HPC operations. My guess is that the 1180 Ti, etc. will all use GDDR5X or GDDR6, not HBM.
The performance is impressive as the card is. However, and I’m sure most would agree, we’d all like to see the performance of this card with a good air cooler or with water cooling, and not this underwhelming reference cooler.
Wonder how long until one of the big custom water cooling suppliers have a kit out for this card.
Why are the clock speeds for RX Vega Liquid set to 1406 MHz in the GTA V slides? That card does 1677 stock with a 1750 boost.
And Google’s TPU Version 2 does FP 32-bit Tensor Tango at 45 TFLOPS.
“•Two cores, each with a 128×128 mixed multiply unit (MXU) and 8GB of high-bandwidth memory, adding up to 64GB of HBM for one four-chip device.
•600 GB/s memory bandwidth.
•32-bit floating-point precision math units for scalars and vectors, and 32-bit floating-point-precision matrix multiplication units with reduced precision for multipliers.
•Some 45 TFLOPS of max performance, adding up to 180 TFLOPS for one four-chip device.” (1)
(1)
“Google boffins tease custom AI math-chip TPU2 stats: 45 TFLOPS, 16GB HBM, benchmarks”
https://www.theregister.co.uk/2017/12/14/google_tpu2_specs_ish/
Wow, the performance is disappointing. Just 20% after 2 years. I guess this is what a lack of competition results in…
It needs more ROPs, and a lack of ROPs is why Vega is only just competing with the GTX 1080. AMD needs to start an ROP increase crash plan and get more ROPs to push out as many FPS as possible. Doesn’t AMD realize by now that frame quality does not matter to gamers as much as frame flinging metrics? ROPs are what fling out those frame/FPS metrics that Bubba gamer likes, and Bubba gamer likes them FPS bragging rights more than any actual gaming. Just look at how much Bubba Gamer spends on making his rig a showpiece like some pickup truck all dolled up to look like an 18 wheeler!
Bubba is in a drag race of ROPs against ROPs and he will pay top dollar for them FPS bragging rights. Ha ha ha, old JHH ain’t added any extra ROPs this time around to Nvidia’s SKUs, so that extra frame flinging is not so much above the previous generation’s SKUs. That GTX 2080 or GTX 1180/whatever they call it Volta SKU based on the GV104 die better at least get 88 ROPs or it will not outperform the GTX 1080 Ti with its 88 ROPs.
ROPs, ROPs, Bubba gamer loves them ROPs! Hey Vern look at my FPS matrics, dat’s top notch 20lbs golden belt buckle good! Dat’s dem ROP’s do’en all that frame flinging and I get more than you, he he haw! Hey Vern my gaming rigs got running lights and mud flaps, Yosemite Sam/Get Back mud flaps with LEDs on ol’ Sam’s belt buckel, yeehaw!
And here I am, having just acquired a pair of Titan Xp Star Wars Edition cards to set up SLI soon (with a Core i9 7900X)…
Titan V vs. 2-way SLI Titan Xp: how would that turn out? Are tests expected soon?
Give headache
For reals, you measure with fraps and can’t even get the specs for the Vega right.
I trust these results.
The 5960X and X99 are a pretty dated platform. Hopefully we see some updated results with an 8700K and OC, as these results look like they are seeing a CPU/platform bottleneck.
Based on these results I don’t think we will see any mainstream gaming Volta cards. They made a killing selling a tiny 300mm² Pascal die as a high-end part due to lack of competition. A 300mm² Volta card would only be marginally faster than the 1080 and not worth upgrading for most people. They need a 300mm² part that is 25-30% faster than the 1080 to maintain their huge margins; that chip will require a brand new architecture and a move to 10nm or 7nm.
388.71 is here and now supports Titan V officially!
Whereas 388.51 doesn’t.
Will the min framerate be better?