Hitman
Hitman (2016) (DirectX 12)
Hitman is a third-person stealth video game in which players take control of Agent 47, a genetically enhanced, superhuman assassin, travelling to international locations and eliminating contracted targets. As in other games in the Hitman series, players are given a large amount of room for creativity in approaching their assassinations.[5] For instance, players may utilize long-ranged rifles to snipe a target from a long distance, or they may decide to assassinate the target at close range by using blade weapons or garrote wire. Players can also use explosives, or disguise the assassination by creating a seemingly accidental death. –Wikipedia
Settings used for Hitman
I'm not sure what is going on with these results, but the testing was repeated multiple times on the Titan V and the settings were checked, like, 100 times. Something in this configuration is keeping the Titan V running at around 90 FPS in Hitman, and it's possible this is a unique bug in this iteration of the NVIDIA driver.
TITAN V 12GB, Average FPS Comparisons, Hitman

| Resolution | Titan Xp 12GB | GTX 1080 Ti 11GB | GTX 1080 8GB | Vega 64 Liquid 8GB |
|---|---|---|---|---|
| 2560×1440 | -32% | -32% | -18% | -23% |
| 3840×2160 | +14% | +17% | +50% | +45% |
This table presents the above data in a more basic way, focusing only on the average FPS, so keep that in mind.
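For readers who want to reproduce the comparison from raw averages, here is a minimal sketch of how percentage deltas like these are typically derived, assuming they express the Titan V's average FPS relative to each comparison card; the FPS values in the usage line are placeholders, not measured data:

```python
def percent_delta(titan_v_fps: float, other_fps: float) -> float:
    """Titan V average FPS relative to another card, as a percentage.

    Positive means the Titan V is faster; negative means it is slower.
    """
    return (titan_v_fps / other_fps - 1.0) * 100.0

# Placeholder numbers only: a card held to ~90 FPS against one averaging
# ~132 FPS would read as roughly -32%.
print(f"{percent_delta(90, 132):+.0f}%")  # -> -32%
```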
Ryan, they added an FPS cap in the HITMAN GOTY patch, at 100 FPS for all configs. Me and others have tried to message the devs, but I don't know what they are doing.
The FPS cap is not very stable and I have a theory, but it's irrelevant.
Seeing this thermal throttling, I am also interested to know whether the fan speed or the die itself is causing this poor cooling performance. This cooler is supposed to be a bit better than previous FE coolers, which could handle 250W (correct me if I am wrong).
This is not because of poor cooling performance. This is because the GPU has a much higher TDP than its predecessors and generates A LOT more heat. It’s not designed to be used for gaming.
These results show just how far behind AMD is lagging. If the die shrink of Vega doesn’t provide at least a 70% uplift, they’re dead next round.
That is only if Nvidia can produce the GV100 at yields (and volumes) that let it come close to the consumer market.
I think it's much more likely we'll see a refresh of Pascal on 12nm for gaming (this would still be a big boost), with more CUDA cores thanks to the big power savings of the new process. The question here is whether that would perform the same as Volta in games. Possibly.
But AMD is also scheduled to do a Vega refresh on a new (lower power) process. This will reduce power consumption on Vega quite a lot. Sure, Vega 2 (or whatever the name will be) will not be beating a Volta, but very, very few gamers buy the top-end cards, so to say AMD is dead is a little pointless and blind. After all, I'm sure AMD sells a load of GPUs (in all of those consoles people buy); the majority of people don't buy Ti-level GPUs, so it is sort of OK for AMD not to target that market.
There’s not a chance in hell we’ll see another Pascal release after what we have now. I can guarantee that 100%.
Unless you work in a position that gives you the power to make decisions about what Nvidia will do, and/or you own Nvidia, then you have absolutely 0% chance of guaranteeing anything about what Nvidia sells or does not sell.
He is right though. No more Pascal is the reasonable conclusion. They have exhausted Pascal with the XP, Xp, and Quadro Pascal cards.
Seems like the gaming efficiency gains of Volta can be attributed almost exclusively to HBM. GDDR6- or HBM-equipped Pascal plus some marketing spin will be enough for a “next generation”.
Only HBM? I’m sure the 5120 shaders help somewhat too.
AMD can always do a dual-GPU-die, one-PCIe-card configuration with Vega. Vega 20 is going to be even more DP FP heavy, with a 1/2 DP-to-SP FP ratio. And Vega speaks Infinity Fabric, so any dual-GPU-die single-card configuration may not need to worry about software/driver/API CrossFire support, as two GPU dies wired up via the Infinity Fabric IP could look to the software/drivers like a single monolithic logical GPU.
Look at how the Infinity Fabric ties all those Zen/Zeppelin dies together on Threadripper/Epyc; that part of Navi is already here. Navi is more about producing scalable GPUs from smaller GPU dies that can be wired up Infinity Fabric style to look like one big single GPU than it is a big micro-architecture change over Vega. Navi takes that scalable Zen/Zeppelin modular design to the next level, and the Infinity Fabric IP is in all of AMD's new Zen/Vega products already.
So any Vega refresh dies on 12nm, including Vega 20 with its higher FP64 number crunching, will already have the Infinity Fabric IP, since it has been there since the first Vega SKUs were introduced. And that gives AMD the option of wiring up dual-GPU-die, one-PCIe-card designs that scale up and look to any software/driver just like a single bigger logical GPU.
AMD does not have to wait for Navi to go modular; it's just that Navi will use more, smaller GPU die chiplets that can be fabbed with very high yields and give AMD a finer-grained ability to scale GPU power from mobile to flagship using a smaller, modular, common die design.
The Radeon Pro Duo (Fiji XT) has 2× 4096:256:64 shaders:TMUs:ROPs for plenty of compute power and non-gaming graphics rendering power. So maybe a dual Vega 64, or even a dual Vega 20 for the professional markets, that makes use of the Infinity Fabric that the Fiji XT Radeon Pro Duo did not have the option of using.
96 ROPs for Titan V, a little more memory bandwidth than the Titan Xp, and a lot more shaders. Wikipedia lists the L2 cache size on the Titan V as 4608 KB and the Titan Xp's L2 as 4096 KB, and both cards have 96 ROPs. So is it Titan V's higher effective memory bandwidth and much wider HBM2 interface that gives it the most help in gaming, or is it the larger L2 cache relative to the Titan Xp that is really helping keep latency to a minimum? Titan V also has more TMUs than the Titan Xp, and those 320 TMUs shore up Nvidia's texture fill rates even relative to AMD's Vega micro-arch based Vega 64/56 SKUs.
Titan V's shader counts are overkill for gaming, and my money is on the larger L2 cache helping to lower latency, because Titan V's ROP count is the same as Titan Xp's. Titan V's lower base/boost clocks are more than made up for by other factors such as more shader cores, more L2 cache, and higher texture throughput. I'd like to see Titan V's shader core utilization rates; that average clock rate is not too bad on Titan V, and I wish there were some Titan Xp average clock rates for comparison.
It looks like maybe the games are not needing the shader counts as much as they like any extra L2 cache the Titan V has available to keep memory access latency to a minimum. All that extra HBM2 effective bandwidth the Titan V has over the Titan Xp has to count for some uplift over the GDDR5X used on the Titan Xp. And this is the first time HBM2 can be tested for gaming on any Nvidia GPU using gaming drivers, so that has to account for some of Titan V's performance delta over Titan Xp.
So the big question remains: just what extra ROP resources will Nvidia have on GV102 and GV104 based variants, and what higher clock speeds can be had on any GV104 based Volta variants, which will very likely have the shader cores pruned back a good bit?
The ROP counts on any GV102/GV104 based variants will be interesting, as will Nvidia's choice of VRAM (GDDR or HBM2) on its GV104 gaming variants. Even with all those extra shader cores, that extra L2 cache on Titan V has to help.
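To put those ROP/TMU comparisons into rough numbers, here is a small sketch of theoretical peak fill rates. The ROP and Titan V TMU counts come from the comment above; the Titan Xp TMU count (240) and the boost clocks used (roughly 1455 MHz for Titan V and 1582 MHz for Titan Xp) are assumed reference specifications, not measured sustained clocks:

```python
def fill_rates(rops, tmus, boost_clock_mhz):
    """Theoretical peak (Gpixels/s, Gtexels/s) at the given clock."""
    gpix = rops * boost_clock_mhz / 1000.0  # pixels per clock * clocks per second
    gtex = tmus * boost_clock_mhz / 1000.0  # texels per clock * clocks per second
    return gpix, gtex

# Assumed reference boost clocks and TMU counts; sustained clocks vary with thermals.
for name, rops, tmus, clk in [("Titan V", 96, 320, 1455.0), ("Titan Xp", 96, 240, 1582.0)]:
    gpix, gtex = fill_rates(rops, tmus, clk)
    print(f"{name}: {gpix:.0f} Gpix/s pixel fill, {gtex:.0f} Gtex/s texture fill")
```

Under those assumptions the higher-clocked Titan Xp actually has the higher theoretical pixel fill rate, which fits the argument that Titan V's gaming uplift comes from bandwidth, cache, and texture/shader throughput rather than from its ROPs.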
Bad old Nvidia is requiring registration to view the GV100 whitepapers, so that's a big bummer.
But some other PDF online lists:
“VOLTA GV100 SM
GV100
FP32 units: 64
FP64 units: 32
INT32 units: 64
Tensor Cores: 8
Register File: 256 KB
Unified L1/Shared memory: 128 KB
Active Threads: 2048

VOLTA GV100 SM
Completely new ISA
Twice the schedulers
Simplified Issue Logic
Large, fast L1 cache
Improved SIMT model
Tensor acceleration
= The easiest SM to program yet
Redesigned for Productivity” (1)

(1) “INSIDE VOLTA”, Olivier Giroux and Luke Durant, NVIDIA, May 10, 2017
http://on-demand.gputechconf.com/gtc/2017/presentation/s7798-luke-durant-inside-volta.pdf
“1700 MHz”
What? Surely you mean 17000 MHz? Or else it's 10× slower RAM than the Titan Xp and 1080 Ti.
No, he means 1700MHz.
It’s not slower. Titan V uses HBM2 which has a much wider bus than GDDR5X.
The 1080Ti has an 11008MHz memory clock on a 352-bit bus width, resulting in a memory bandwidth of 484GB/s
The Titan Xp has an 11408MHz memory clock on a 384-bit bus width, resulting in a memory bandwidth of 547.6GB/s
The Titan V has a 1700MHz memory clock on a 3072-bit bus width, resulting in a memory bandwidth of 652.8GB/s
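Those bandwidth figures follow directly from effective memory clock times bus width; a minimal sketch of that arithmetic, using the clocks and bus widths quoted above:

```python
def bandwidth_gb_s(effective_clock_mhz, bus_width_bits):
    """Peak memory bandwidth in GB/s from effective clock (MHz) and bus width (bits)."""
    return effective_clock_mhz * 1e6 * bus_width_bits / 8 / 1e9

print(bandwidth_gb_s(11008, 352))  # GTX 1080 Ti -> ~484.4 GB/s
print(bandwidth_gb_s(11408, 384))  # Titan Xp    -> ~547.6 GB/s
print(bandwidth_gb_s(1700, 3072))  # Titan V     -> ~652.8 GB/s
```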
Sorry, I totally didn’t realize the 1080Ti and especially the Xp product don’t use HBM2 as well (and that HBM2 has a lower clock speed but much wider bus).
Yeah, I hate it when people use MHz in the wrong places. The clock speed for HBM2 in this thing is 850MHz (this is the real clock, which one can overclock), and it can do two bits per clock, thus 1.7Gbps per pin; thus the card's bandwidth is 3 × 1.7Gbps × 1024bit / (8 bit/Byte) = 652 GB/s.
Edit: corrected memory freq.
800MHz with data on the falling and rising edges of the clock, for a Double Data Rate (DDR) of 1600MHz effective. The clock speed is in base 10 and the bandwidth is in base 2 units, and do not forget any overhead and parity. Each JEDEC-standard HBM2 stack gets its own 1024-bit-wide interface, subdivided into 8 independently operating 128-bit channels. And for the JEDEC HBM2 standard only (not HBM), HBM2 offers a pseudo-channel addressing mode where each 128-bit memory channel can be split into two 64-bit pseudo channels for finer-grained memory access. Each HBM2 stack can have a total bandwidth of 256GB/s clocked at the maximum JEDEC speed.
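As a quick sanity check on that per-stack figure, here is a short sketch computing HBM2 bandwidth from the per-pin data rate and the 1024-bit stack interface; the 2 Gb/s rate is the maximum JEDEC speed mentioned above, while the 1.7 Gb/s case assumes the Titan V's three stacks as discussed earlier in the thread:

```python
def hbm2_stack_bandwidth_gb_s(pin_rate_gbps, interface_bits=1024):
    """Bandwidth of a single HBM2 stack in GB/s from its per-pin data rate (Gb/s)."""
    return pin_rate_gbps * interface_bits / 8

print(hbm2_stack_bandwidth_gb_s(2.0))      # max JEDEC rate -> 256 GB/s per stack
print(3 * hbm2_stack_bandwidth_gb_s(1.7))  # three stacks at 1.7 Gb/s -> 652.8 GB/s
```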
According to Anandtech/SK Hynix the pseudo channel mode improves latency via optimized memory accesses:
“The second-generation HBM (HBM2) technology, which is outlined by the JESD235A standard, inherits physical 128-bit DDR interface with 2n prefetch architecture, internal organization, 1024-bit input/output, 1.2 V I/O and core voltages as well as all the crucial parts of the original tech. Just like the predecessor, HBM2 supports two, four or eight DRAM devices on a base logic die (2Hi, 4Hi, 8Hi stacks) per KGSD. HBM Gen 2 expands capacity of DRAM devices within a stack to 8 Gb and increases supported data-rates up to 1.6 Gb/s or even to 2 Gb/s per pin. In addition, the new technology brings an important improvement to maximize actual bandwidth.
One of the key enhancements of HBM2 is its Pseudo Channel mode, which divides a channel into two individual sub-channels of 64 bit I/O each, providing 128-bit prefetch per memory read and write access for each one. Pseudo channels operate at the same clock-rate, they share row and column command bus as well as CK and CKE inputs. However, they have separated banks, they decode and execute commands individually. SK Hynix says that the Pseudo Channel mode optimizes memory accesses and lowers latency, which results in higher effective bandwidth.
If, for some reason, an ASIC developer believes that Pseudo Channel mode is not optimal for their product, then HBM2 chips can also work in Legacy mode. While memory makers expect HBM2 to deliver higher effective bandwidth than predecessors, it depends on developers of memory controllers how efficient next-generation memory sub-systems will be. In any case, we will need to test actual hardware before we can confirm that HBM2 is better than HBM1 at the same clock-rate.” (1)
(1)
“JEDEC Publishes HBM2 Specification as Samsung Begins Mass Production of Chips”
https://www.anandtech.com/show/9969/jedec-publishes-hbm2-specification
Ryan, can you run with the latest driver? 388.59? Thanks.
Oops, actually, we DID use 388.59, just updated the table.
You do ensure Fallout 4 is running in Fullscreen Exclusive display mode, right? Every time you hit Okay in the configuration utility it will re-enable Borderless Fullscreen (and the option to turn it off in the utility is stupidly grayed out, so you need to disable Borderless Fullscreen by editing the config file).
Really? Didn't realize that; wonder if it will change my performance on those rare occasions I get to play.
Sniper Elite 4 in DX11?
I thought it was one of the better async implementations – or were there problems with performance or stability in DX12?
I was a little disappointed in not seeing DX12 vs DX11, or even a Vulkan game like Wolfenstein 2. I know it will blow away a Vega 64, but it's still interesting.
Why does the gap get smaller at 4K? Shouldn't it get bigger since it uses HBM?
That's not how it works. You still have a set number of ROPs and CUDA cores to do the work. The only way the Titan V is going to max out its memory is during HPC operations. My guess is that the 1180 Ti, etc. will all use GDDR5X or GDDR6, not HBM.
The performance is impressive, as is the card. However, and I'm sure most would agree, we'd all like to see the performance of this card with a good air cooler or with water cooling, and not this underwhelming reference cooler.
I wonder how long until one of the big custom water-cooling suppliers has a kit out for this card.
Why are the clock speeds for RX Vega Liquid set to 1406 MHz in the GTA V slides? That card does 1677 MHz stock with a 1750 MHz boost.
And Google's TPU Version 2 does its FP32 tensor tango at 45 TFLOPS.
“•Two cores, each with a 128×128 mixed multiply unit (MXU) and 8GB of high-bandwidth memory, adding up to 64GB of HBM for one four-chip device.
•600 GB/s memory bandwidth.
•32-bit floating-point precision math units for scalars and vectors, and 32-bit floating-point-precision matrix multiplication units with reduced precision for multipliers.
•Some 45 TFLOPS of max performance, adding up to 180 TFLOPS for one four-chip device.” (1)
(1)
“Google boffins tease custom AI math-chip TPU2 stats: 45 TFLOPS, 16GB HBM, benchmarks”
https://www.theregister.co.uk/2017/12/14/google_tpu2_specs_ish/
Wow, the performance is disappointing. Just 20% after 2 years. I guess this is what a lack of competition results in…
It needs more ROPs, and a lack of ROPs is why Vega is only just competing with the GTX 1080. AMD needs to start an ROP-increase crash program and get more ROPs to push out as many FPS as possible. Doesn't AMD realize by now that frame quality does not matter to gamers as much as frame-flinging metrics? ROPs are what fling out those frame/FPS metrics that Bubba gamer likes, and Bubba gamer likes them FPS bragging rights more than any actual gaming. Just look at how much Bubba gamer spends on making his rig a showpiece, like some pickup truck all dolled up to look like an 18-wheeler!
Bubba is in a drag race of ROPs against ROPs, and he will pay top dollar for them FPS bragging rights. Ha ha ha, old JHH ain't added any extra ROPs this time around to Nvidia's SKUs, so that extra frame flinging is not so much above the previous generation's SKUs. That GTX 2080 or GTX 1180 (or whatever they call the GV104-based Volta SKU) better get at least 88 ROPs, or it will not outperform the GTX 1080 Ti with its 88 ROPs.
ROPs, ROPs, Bubba gamer loves them ROPs! Hey Vern, look at my FPS metrics, dat's top-notch 20lb golden belt buckle good! Dat's dem ROPs doin' all that frame flinging, and I get more than you, he he haw! Hey Vern, my gaming rig's got running lights and mud flaps, Yosemite Sam "Get Back" mud flaps with LEDs on ol' Sam's belt buckle, yeehaw!
And me, who has just acquired a pair of Titan Xp Star Wars Edition cards in order to soon set up SLI (with a Core i9 7900X)…
Titan V vs. 2-way SLI Titan Xp: what would that give? Tests soon expected?
Give headache
For reals, you measure with Fraps and can't even get the specs for the Vega right.
I trust these results.
The 5960X and X99 are a pretty dated platform; hopefully we see some updated results with an 8700K and OC, as these results look like they are seeing a CPU/platform bottleneck.
Based on these results, I don't think we will see any mainstream gaming Volta cards. They made a killing selling a tiny ~300mm² Pascal die as a high-end part due to the lack of competition. A ~300mm² Volta chip would only be marginally faster than the 1080 and not worth upgrading to for most people. They need a ~300mm² part that is 25-30% faster than the 1080 to maintain their huge margins, and that chip will require a brand new architecture and a move to 10nm or 7nm.
388.71 is here and now supports the Titan V officially!
Whereas 388.51 doesn't.
Will the min framerate be better?