Testing Suite and Methodology Update
If you have followed our graphics testing at PC Perspective you’ll know about a drastic shift we made in 2012 to support a technology we call Frame Rating. Frame Rating uses direct capture of the output from the system into uncompressed video files and FCAT-style scripts that analyze the video to produce statistics including frame rates, frame times, frame time variance, and game smoothness.
Readers and listeners might have also heard about the issues surrounding the move to DirectX 12 and UWP (Universal Windows Platform) and how it affected our testing methods. Our benchmarking process depends on a secondary application running in the background on the tested PC that draws colored overlays along the left-hand side of the screen in a repeating pattern to help us measure performance after the fact. The overlay we had been using supported DirectX 9, 10, and 11, but didn’t work with DX12 or UWP games.
We worked with NVIDIA to fix that, and we now have an overlay that behaves in exactly the same way as before but lets us properly measure performance and smoothness in DX12 and UWP games. This is a big step toward maintaining the detailed analytics of game performance that enable us to push both game developers and hardware vendors to perfect their products and create the best possible gaming experiences for consumers.
So, as a result, our testing suite has been upgraded with a new collection of games and tests. Included in this review are the following:
- 3DMark Fire Strike Extreme and Ultra
- Unigine Heaven 4.0
- Dirt Rally (DX11)
- Fallout 4 (DX11)
- Grand Theft Auto V (DX11)
- Hellblade (DX11)
- Hitman (DX12)
- Rise of the Tomb Raider (DX12)
- Sniper Elite 4 (DX11)
- The Witcher 3 (DX11)
We have included racing games, third-person and first-person titles, DX11, DX12, and some synthetics, going for a mix that I think encapsulates the gaming market of today and the near future as well as possible. Hopefully we can finally end the bickering in the comments about not using DX12 titles in our GPU reviews! (Ha, right.)
Our GPU testbed remains unchanged, including an 8-core Haswell-E processor and plenty of memory and storage.
| PC Perspective GPU Testbed | |
|---|---|
| Processor | Intel Core i7-5960X Haswell-E |
| Motherboard | ASUS Rampage V Extreme X99 |
| Memory | G.Skill Ripjaws 16GB DDR4-3200 |
| Storage | OCZ Agility 4 256GB (OS), Adata SP610 500GB (games) |
| Power Supply | Corsair AX1500i 1500 watt |
| OS | Windows 10 x64 |
| Drivers | AMD: 17.10.2, NVIDIA: 388.59 |
For those of you that have never read about our Frame Rating capture-based performance analysis system, the following section is for you. If you have, feel free to jump straight into the benchmark action!!
Frame Rating: Our Testing Process
If you aren't familiar with it, you should probably do a little research into our testing methodology, as it is quite different from others you may see online. Rather than using FRAPS to measure frame rates or frame times, we use a secondary PC to capture the output from the tested graphics card directly and then use post-processing on the resulting video to determine frame rates, frame times, frame variance, and much more.
This amount of data can be pretty confusing if you attempt to read it without the proper background, but I strongly believe that the results we present paint a much more thorough picture of performance than other options. So please, read up on the full discussion about our Frame Rating methods before moving forward!!
While there are literally dozens of files created for each “run” of benchmarks, there are several resulting graphs that FCAT produces, as well as several more that we are generating with additional code of our own.
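To make the idea concrete, here is a minimal, hypothetical sketch (in Python, using OpenCV) of how the colored overlay bars at the left edge of a captured video can be turned into per-frame scanline counts, which is the raw material for everything that follows. This is not our actual FCAT/Frame Rating tooling; the palette, column offset, and tolerance values are illustrative assumptions.

```python
import cv2
import numpy as np

# Assumed repeating overlay palette (BGR); purely illustrative.
OVERLAY_COLORS = [(0, 0, 255), (0, 255, 0), (255, 0, 0), (0, 255, 255)]

def scanlines_per_frame(video_path, column=8, tolerance=40):
    """Count how many captured scanlines each rendered frame occupies.

    Walks down the overlay column of every captured video frame and starts a
    new count whenever the overlay color changes, i.e. whenever a new rendered
    frame begins appearing on screen.
    """
    cap = cv2.VideoCapture(video_path)
    counts = []       # scanlines occupied by each rendered frame
    current = None
    while True:
        ok, img = cap.read()
        if not ok:
            break
        for pixel in img[:, column, :].astype(int):
            # Match the pixel against the palette within a tolerance;
            # anything too far off is treated as noise or a tearing line.
            nearest = min(OVERLAY_COLORS,
                          key=lambda c: int(np.abs(np.array(c) - pixel).sum()))
            if np.abs(np.array(nearest) - pixel).sum() > tolerance:
                continue
            if nearest != current:
                current = nearest
                counts.append(0)   # a new rendered frame starts on this scanline
            counts[-1] += 1
    cap.release()
    return counts

# With a 60 Hz, 1080-line capture, a frame spanning N scanlines was on screen
# for roughly N / (1080 * 60) seconds.
```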
The PCPER FRAPS File
Previous example data
While the graphs above are produced by the default version of the scripts from NVIDIA, I have modified and added to them in a few ways to produce additional data for our readers. The first file shows a subset of the data from the RUN file above – the average frame rate over time as defined by FRAPS – though we are combining all of the GPUs we are comparing into a single graph. This basically emulates the data we have been showing you for the past several years.
The PCPER Observed FPS File
Previous example data
This graph takes a different subset of data points and plots them similarly to the FRAPS file above, but this time we are looking at the “observed” average frame rates, shown previously as the blue bars in the RUN file above. This takes out the dropped and runt frames, giving you the performance metric that actually matters – how many frames are being shown to the gamer to improve the animation sequence.
As you’ll see in our full results on the coming pages, a big difference between the FRAPS FPS graph and the Observed FPS graph indicates cases where the gamer is likely not getting the full benefit of the hardware investment in their PC.
The PLOT File
Previous example data
The primary file that is generated from the extracted data is a plot of calculated frame times, including runts. The numbers here represent the amount of time that frames appear on the screen for the user; a “thinner” line across the time span represents frame times that are consistent and thus should produce the smoothest animation for the gamer. A “wider” line, or one with a lot of peaks and valleys, indicates a lot more variance and is likely caused by a lot of runts being displayed.
The RUN File
While the two graphs above show combined results for a set of cards being compared, the RUN file shows the results from a single card for that particular test. It is in this graph that you can see interesting data about runts, drops, average frame rate, and the actual frame rate of your gaming experience.
Previous example data
For tests that show no runts or drops, the data is pretty clean. This is the familiar frames-per-second-over-time graph that has become the standard for performance evaluation of graphics cards.
Previous example data
A test that does have runts and drops will look much different. The black bar labeled FRAPS indicates the average frame rate over time that traditional testing would show if you counted the drops and runts in the equation – as the FRAPS FPS measurement does. Any area in red is a dropped frame – the wider the band of red, the more colored bars from our overlay were missing in the captured video file, indicating the gamer never saw those frames in any form.
The wide yellow area is the representation of runts – the thin bands of color in our captured video that we have determined do not add to the animation of the image on the screen. The larger the area of yellow, the more often those runts are appearing.
Finally, the blue line is the measured FPS over each second after removing the runts and drops. We are going to be calling this metric the “observed frame rate” as it measures the actual speed of the animation that the gamer experiences.
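As a rough illustration of how that filtering feeds the observed frame rate, the sketch below takes per-frame scanline counts (as produced by the capture step sketched earlier), strips out runts below an assumed threshold, and reports both a FRAPS-style rate and the observed rate. The 21-scanline threshold and capture parameters are assumptions for illustration only, not the criteria FCAT or our own scripts actually apply.

```python
# Illustrative assumptions only -- not the thresholds FCAT or our scripts use.
RUNT_THRESHOLD = 21      # frames occupying fewer scanlines than this count as runts
CAPTURE_HZ = 60          # refresh rate of the capture card
TOTAL_SCANLINES = 1080   # scanlines per captured frame at 1080p

def fps_metrics(scanline_counts):
    """Return (fraps_style_fps, observed_fps) over a captured run.

    Dropped frames never reach the display at all (they show up as missing
    colors in the overlay sequence, not as entries here), so this sketch only
    needs to strip out runts.
    """
    seconds = sum(scanline_counts) / (CAPTURE_HZ * TOTAL_SCANLINES)
    rendered = len(scanline_counts)      # roughly what a FRAPS-style count would report
    shown = sum(1 for n in scanline_counts if n >= RUNT_THRESHOLD)
    return rendered / seconds, shown / seconds

# Toy data: mostly healthy ~60 fps frames plus one 20-scanline runt.
print(fps_metrics([1080, 1075, 20, 1085, 1090, 1070]))
```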
The PERcentile File
Previous example data
Scott introduced the idea of frame time percentiles months ago, but now that we have different data using direct capture as opposed to FRAPS, the results might be even more telling. In this case, FCAT is showing percentiles not by frame time but instead by instantaneous FPS. This will tell you the minimum frame rate that will appear on the screen for any given percentage of the time during our benchmark run. The 50th percentile should be very close to the average frame rate of the whole benchmark, but as we creep closer to 100% we see how the frame rate is affected.
The closer this line is to being perfectly flat the better as that would mean we are running at a constant frame rate the entire time. A steep decline on the right-hand side tells us that frame times are varying more and more frequently and might indicate potential stutter in the animation.
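For readers who want to reproduce the shape of this graph from their own frame time logs, here is a small sketch. It assumes frame times in milliseconds and simply converts them to instantaneous FPS before reading off percentiles; the exact percentile convention FCAT uses may differ slightly.

```python
import numpy as np

def fps_percentiles(frame_times_ms, percentiles=(50, 90, 95, 99)):
    """Frame rate met or exceeded for the given percentage of the run."""
    inst_fps = 1000.0 / np.asarray(frame_times_ms, dtype=float)
    # The Nth-percentile point asks: what frame rate do we hit at least N% of the time?
    return {p: float(np.percentile(inst_fps, 100 - p)) for p in percentiles}

# Example: a mostly smooth 60 fps run whose tail is dragged down by slow frames.
times = [16.7] * 950 + [33.3] * 40 + [50.0] * 10
print(fps_percentiles(times))   # 50th ~60 fps, 99th noticeably lower
```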
The PCPER Frame Time Variance File
Of all the data we are presenting, this is probably the one that needs the most discussion. In an attempt to create a new metric for gaming and graphics performance, I wanted to find a way to define stutter based on the data sets we had collected. As I mentioned earlier, we can define a single stutter as a variance level between t_game and t_display. This variance can be introduced in t_game, t_display, or both. Since we can currently only reliably test the t_display rate, how can we create a definition of stutter that makes sense and can be applied across multiple games and platforms?
We define a single frame variance as the difference between the current frame time and the previous frame time – how consistently two successive frames are presented to the gamer. However, as I found in my testing, plotting the value of this frame variance is nearly a perfect match to the data presented by the minimum FPS (PER) file created by FCAT. To be more specific, stutter is only perceived when there is a break from the previous animation frame rates.
Our current running theory for a stutter evaluation is this: find the current frame time variance by comparing the current frame time to the running average of the frame times of the previous 20 frames. Then, by sorting these frame times and plotting them in a percentile form we can get an interesting look at potential stutter. Comparing the frame times to a running average rather than just to the previous frame should prevent potential problems from legitimate performance peaks or valleys found when moving from a highly compute intensive scene to a lower one.
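Here is a short sketch of that calculation under the assumptions just described (a 20-frame running average and a percentile plot of the resulting deviations); the absolute-difference step and the toy data are illustrative, not a final definition of the metric.

```python
import numpy as np

def variance_percentile_curve(frame_times_ms, window=20):
    """Deviation of each frame time from the running average of the previous
    `window` frames, sorted so it can be plotted against percentile."""
    ft = np.asarray(frame_times_ms, dtype=float)
    deviations = []
    for i in range(window, len(ft)):
        baseline = ft[i - window:i].mean()        # running average of the last 20 frames
        deviations.append(abs(ft[i] - baseline))  # illustrative: absolute deviation in ms
    return np.sort(deviations)

# A curve hugging 0 ms means consistent pacing; a sharp climb near the right-hand
# side of the percentile plot points toward stutter and hitching.
curve = variance_percentile_curve([16.7, 16.9, 16.5, 42.0, 16.8] * 20)
print(curve[-5:])
```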
Previous example data
While we are still trying to figure out if this is the best way to visualize stutter in a game, we have seen enough evidence in our gameplay testing, and by comparing the above graphic to other data generated through our Frame Rating system, to be reasonably confident in our assertions. So much so, in fact, that I am going to call this data the PCPER ISU, which beer fans will appreciate as the acronym of International Stutter Units.
To compare these results you want to see a line that is as close to the 0ms mark as possible, indicating very little frame rate variance when compared to a running average of previous frames. There will be some inevitable incline as we reach the 90+ percentile, but that is expected with any gameplay sequence that varies from scene to scene. What we do not want to see is a sharper line upward, which would indicate higher frame variance (ISU) and could be an indication that the game sees microstuttering and hitching problems.
Ryan, they added an fps cap in the HITMAN GOTY patch at 100fps for all configs. Me and others have tried to message the devs, but I don't know what they are doing.
The fps cap is not very stable and I have a theory, but it's irrelevant.
Seeing this thermal throttling, I am also interested to know if the fan speed or the die itself is causing this poor cooling performance. This cooler is supposed to be a bit better than previous FE coolers, which could hold 250W (correct me if I am wrong).
This is not because of poor cooling performance. This is because the GPU has a much higher TDP than its predecessors and generates A LOT more heat. It’s not designed to be used for gaming.
These results show just how far behind AMD is lagging. If the die shrink of Vega doesn’t provide at least a 70% uplift, they’re dead next round.
That is only if Nvidia can produce the GV100 at yields (and volumes) that let it come close to the consumer market.
I think it's much more likely we'll see a refresh of Pascal on 12nm for gaming (this would still be a big boost) with more CUDA cores thanks to the big power savings of the new process. The question here is: will this be the same as Volta in games? Possibly.
But AMD is also scheduled to do a Vega refresh on a new (lower-power) process. This will reduce power consumption on Vega quite a lot. Sure, Vega 2 (or whatever the name will be) will not be beating a Volta, but very, very few gamers buy the top-end cards, so to say AMD is dead is a little pointless and blind. After all, I'm sure AMD sells a load of GPUs (in all of those consoles people buy); the majority of people don't buy Ti-level GPUs, so it is sort of OK for AMD not to target that market.
There’s not a chance in hell we’ll see another Pascal release after what we have now. I can guarantee that 100%.
Unless you work in a position that gives you the power to make decisions about what Nvidia will do, and/or you own Nvidia, you have absolutely 0% chance of guaranteeing anything about what Nvidia sells or does not sell.
He is right though. No more Pascal is the reasonable conclusion. They have exhausted Pascal with the XP, Xp, and Quadro Pascal cards.
Seems like the gaming efficiency gains of Volta can be attributed almost exclusively to HBM. A GDDR6- or HBM-equipped Pascal plus some marketing spin will be enough for the "next generation".
Only HBM? I'm sure the 5120 shaders help somewhat too.
AMD can always do a dual-GPU-die-on-one-PCIe-card configuration with Vega. Vega 20 is going to be even more DP FP heavy, with a 1/2 DP-to-SP FP ratio. And Vega speaks Infinity Fabric, so any dual-GPU-die single-PCIe-card configuration may not need to worry about any software/driver/API CF support, as 2 GPU dies wired up via the Infinity Fabric IP would look to the software/drivers like a single monolithic logical GPU.
Look at how the Infinity Fabric ties all those Zen/Zeppelin dies together on TR/Epyc; that part of Navi is already here. Navi is more about producing scalable GPUs from smaller GPU dies that can be wired up Infinity Fabric style to look like one big single GPU than it is much of a GPU micro-arch change over the Vega micro-arch. Navi is more about that scalable Zen/Zeppelin sort of modular design taken to the next level, and the Infinity Fabric IP is in all of AMD's new Zen/Vega products currently.
So any Vega refresh dies on 12nm, including Vega 20 with its higher FP64 number crunching, will have had the Infinity Fabric IP since the first Vega SKUs were introduced. And that gives AMD the option of wiring up some dual-GPU-die on one PCIe card designs that can scale up and look to any software/driver just like a single bigger logical GPU.
AMD does not have to wait for Navi to go modular; it's just that Navi will be using more smaller GPU die chiplets that can be fabbed with very high yields and give AMD a finer-grained ability to scale up GPU power from mobile to flagship using a smaller, modular, common GPU die design.
That Radeon Pro Duo (Fiji XT) has 2× 4096:256:64 shaders:TMUs:ROPs for plenty of compute power and non-gaming graphics rendering power. So maybe a dual Vega 64, or even a dual Vega 20 for the professional markets, that makes more use of the Infinity Fabric, which the Fiji XT Radeon Pro Duo did not have the option of making use of.
96 ROPs for Titan V, a little more memory bandwidth over the Titan Xp, and a lot more shaders. Wikipedia lists the L2 cache size on the Titan V as 4608KB and the Titan Xp's L2 as 4096KB, and the Titan Xp has 96 ROPs just as the Titan V does. So is it Titan V's higher effective HBM2 memory bandwidth and much wider HBM2 interface that is giving Titan V the most help in gaming, or is it the larger cache on the Titan V relative to the Titan Xp that is really helping keep latency to a minimum? Titan V has more TMUs than the Titan Xp, and those 320 TMUs on Titan V shore up Nvidia's texture fill rates even relative to AMD's Vega micro-arch based Vega 64/56 SKUs.
Titan V's shader counts are overkill for gaming, and my money is on the Titan V's larger L2 cache helping to lower the latency, because Titan V's ROP count is the same as Titan Xp's. Titan V's lower base/boost clocks are more than made up for by other factors such as more shader cores, more L2 cache, and higher texture throughput. I'd like to see Titan V's shader core utilization rates; that average clock rate is not too bad on Titan V, and I wish there were some Titan Xp average clock rates for comparison.
It looks like maybe the games are not needing the shader counts as much as they like any extra L2 cache that Titan V has available to keep memory access latency issues to a minimum. All that extra HBM2 effective bandwidth that the Titan V has over Titan Xp has to count for some uplift over the GDDR5X used on the Titan Xp. And this is the first time HBM2 can be tested for gaming on any Nvidia GPU using gaming drivers, and that has to count for some of Titan V's performance delta over Titan Xp.
So the big question still remains as to just what extra ROP resources Nvidia will have on GV102 and GV104 based variants, and just what higher clock speeds can be had on any GV104 based Volta variants, which will very likely have the shader cores pruned back a good bit.
The ROP counts on any GV102/GV104 based variants will be interesting, as will Nvidia's choice of VRAM (GDDR or HBM2) on its GV104 gaming variants. Even with all those extra shader cores, that extra L2 cache on Titan V has to help.
Bad old Nvidia is requiring registration to view the GV100 whitepapers, so that's a big bummer.
But some other PDF online lists:
"VOLTA GV100 SM
GV100
FP32 units: 64
FP64 units: 32
INT32 units: 64
Tensor Cores: 8
Register File: 256 KB
Unified L1/Shared memory: 128 KB
Active Threads: 2048

VOLTA GV100 SM
Completely new ISA
Twice the schedulers
Simplified Issue Logic
Large, fast L1 cache
Improved SIMT model
Tensor acceleration
= The easiest SM to program yet
Redesigned for Productivity" (1)
(1) "INSIDE VOLTA", Olivier Giroux and Luke Durant, NVIDIA, May 10, 2017
http://on-demand.gputechconf.com/gtc/2017/presentation/s7798-luke-durant-inside-volta.pdf
“1700 MHz”
What? Surely you mean 17000 MHz? Or else it's 10x slower RAM than the Titan Xp and 1080 Ti.
No, he means 1700MHz.
It’s not slower. Titan V uses HBM2 which has a much wider bus than GDDR5X.
The 1080Ti has an 11008MHz memory clock on a 352-bit bus width, resulting in a memory bandwidth of 484GB/s
The Titan Xp has an 11408MHz memory clock on a 384-bit bus width, resulting in a memory bandwidth of 547.6GB/s
The Titan V has a 1700MHz memory clock on a 3072-bit bus width, resulting in a memory bandwidth of 652.8GB/s
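Those figures line up if you assume the usual conventions – GDDR5X quoted as an effective data rate per pin, HBM2 as a double-pumped clock (850MHz real, 1700MHz effective) on a 3072-bit bus. A quick, illustrative sanity check:

```python
def gddr_bandwidth_gbs(effective_mhz, bus_bits):
    # effective data rate per pin (MHz) x bus width (bits) / 8 bits per byte
    return effective_mhz * 1e6 * bus_bits / 8 / 1e9

def hbm2_bandwidth_gbs(clock_mhz, bus_bits, transfers_per_clock=2):
    # real clock (MHz) x 2 transfers per clock x bus width / 8 bits per byte
    return clock_mhz * 1e6 * transfers_per_clock * bus_bits / 8 / 1e9

print(gddr_bandwidth_gbs(11008, 352))   # GTX 1080 Ti -> ~484 GB/s
print(gddr_bandwidth_gbs(11408, 384))   # Titan Xp    -> ~548 GB/s
print(hbm2_bandwidth_gbs(850, 3072))    # Titan V     -> ~653 GB/s
```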
Sorry, I totally didn’t realize the 1080Ti and especially the Xp product don’t use HBM2 as well (and that HBM2 has a lower clock speed but much wider bus).
Yeah, I hate it when people use MHz in the wrong places. The clock speed for HBM2 in this thing is 850MHz (this is the real clock, which one can overclock) and it can do two bits per clock, thus 1.7Gbps per pin; so the card's bandwidth is 3 × 1.7Gbps × 1024 bits / (8 bits/Byte) = 652 GB/s.
Edit: corrected memory freq.
800MHz, with data on the falling and rising edges of the clock, for a Double Data Rate (DDR) of 1600MHz effective. The clock speed is in base 10 and the bandwidth is in base 2 units, and do not forget any overhead and parity. Each JEDEC-standard HBM2 stack gets its own 1024-bit-wide interface, subdivided into 8 independently operating 128-bit channels. And for the JEDEC HBM2 standard only (not HBM), HBM2 offers a pseudo channel mode where each 128-bit memory channel can be split into 2 64-bit pseudo channels for finer-grained memory access. Each HBM2 stack can have a total bandwidth of 256GB/s clocked at the maximum JEDEC speed.
According to Anandtech/SK Hynix the pseudo channel mode improves latency via optimized memory accesses:
“The second-generation HBM (HBM2) technology, which is outlined by the JESD235A standard, inherits physical 128-bit DDR interface with 2n prefetch architecture, internal organization, 1024-bit input/output, 1.2 V I/O and core voltages as well as all the crucial parts of the original tech. Just like the predecessor, HBM2 supports two, four or eight DRAM devices on a base logic die (2Hi, 4Hi, 8Hi stacks) per KGSD. HBM Gen 2 expands capacity of DRAM devices within a stack to 8 Gb and increases supported data-rates up to 1.6 Gb/s or even to 2 Gb/s per pin. In addition, the new technology brings an important improvement to maximize actual bandwidth.
One of the key enhancements of HBM2 is its Pseudo Channel mode, which divides a channel into two individual sub-channels of 64 bit I/O each, providing 128-bit prefetch per memory read and write access for each one. Pseudo channels operate at the same clock-rate, they share row and column command bus as well as CK and CKE inputs. However, they have separated banks, they decode and execute commands individually. SK Hynix says that the Pseudo Channel mode optimizes memory accesses and lowers latency, which results in higher effective bandwidth.
If, for some reason, an ASIC developer believes that Pseudo Channel mode is not optimal for their product, then HBM2 chips can also work in Legacy mode. While memory makers expect HBM2 to deliver higher effective bandwidth than predecessors, it depends on developers of memory controllers how efficient next-generation memory sub-systems will be. In any case, we will need to test actual hardware before we can confirm that HBM2 is better than HBM1 at the same clock-rate.” (1)
(1)
“JEDEC Publishes HBM2 Specification as Samsung Begins Mass Production of Chips”
https://www.anandtech.com/show/9969/jedec-publishes-hbm2-specification
Ryan, can you run with the latest driver? 388.59? Thanks.
Oops, actually, we DID use 388.59, just updated the table.
You do ensure Fallout 4 is running in Fullscreen Exclusive display mode, right? Every time you hit OK in the configuration utility it will re-enable Borderless Fullscreen (and the option to turn it off in the utility is stupidly grayed out, so you need to disable Borderless Fullscreen by editing the config file).
Really? Didn't realize that; wonder if it will change my performance on those rare occasions I get to play.
Sniper Elite 4 in DX11?
Thought it was one of the better async implementations – or were there problems with performance or stability in DX12?
I was a little disappointed in not seeing DX12 vs. DX11, or even a Vulkan game like Wolfenstein 2. I know it will blow away a Vega 64, but it's still interesting.
Why does the gap get smaller at 4K? Shouldn't it get bigger since it uses HBM?
That’s not how it works. You still have a set amount of ROPs and CUDA cores to do work. The only way Titan V is going to max out its memory is during HPC operations. My guess is that the 1180 Ti, etc. will all use GDDR5X or GDDR6, not HBM.
The performance is as impressive as the card is. However, and I'm sure most would agree, we'd all like to see the performance of this card with a good air cooler or with water cooling, and not this underwhelming reference cooler.
Wonder how long until one of the big custom water cooling suppliers have a kit out for this card.
Why are the clock speeds for the RX Vega Liquid set to 1406 MHz in the GTA V slides? That card does 1677 stock with a 1750 boost.
And Google's TPU Version 2 does its FP 32-bit tensor tango at 45 TFLOPS.
“•Two cores, each with a 128×128 mixed multiply unit (MXU) and 8GB of high-bandwidth memory, adding up to 64GB of HBM for one four-chip device.
•600 GB/s memory bandwidth.
•32-bit floating-point precision math units for scalars and vectors, and 32-bit floating-point-precision matrix multiplication units with reduced precision for multipliers.
•Some 45 TFLOPS of max performance, adding up to 180 TFLOPS for one four-chip device.” (1)
(1)
“Google boffins tease custom AI math-chip TPU2 stats: 45 TFLOPS, 16GB HBM, benchmarks”
https://www.theregister.co.uk/2017/12/14/google_tpu2_specs_ish/
Wow, the performance is disappointing. Just 20% after 2 years. I guess this is what a lack of competition results in…
It needs more ROPs, and the lack of ROPs is why Vega is only just competing with the GTX 1080. AMD needs to start a crash ROP-increase plan and get more ROPs to push out as many FPS as possible. Doesn't AMD realize by now that frame quality does not matter to gamers as much as frame-flinging metrics? ROPs are what fling out those frame/FPS metrics that Bubba gamer likes, and Bubba gamer likes them FPS bragging rights more than any actual gaming. Just look at how much Bubba gamer spends on making his rig a showpiece, like some pickup truck all dolled up to look like an 18-wheeler!
Bubba is in a drag race of ROPs against ROPs, and he will pay top dollar for them FPS bragging rights. Ha ha ha, old JHH ain't added any extra ROPs this time around to Nvidia's SKUs, so that extra frame flinging is not so much above the previous generation's SKUs. That GTX 2080 or GTX 1180/whatever-they-call-it Volta SKU based on the GV104 die better at least get 88 ROPs, or it will not outperform the GTX 1080 Ti with its 88 ROPs.
ROPs, ROPs, Bubba gamer loves them ROPs! Hey Vern, look at my FPS metrics, dat's top-notch 20lb golden belt buckle good! Dat's dem ROPs do'en all that frame flinging and I get more than you, he he haw! Hey Vern, my gaming rig's got running lights and mud flaps, Yosemite Sam/Get Back mud flaps with LEDs on ol' Sam's belt buckle, yeehaw!
And me, who has just acquired a pair of Titan Xp Star Wars Edition cards in order to soon build an SLI setup (with a Core i9-7900X)…
Titan V vs. 2-way SLI Titan Xp: what would that give? Tests expected soon?
Give headache
For reals, you measure with fraps and can’t even get the specs for the Vega right.
I trust these results.
The 5960X and X99 are a pretty dated platform; hopefully we see some updated results with an 8700K and OC, as these results look like they are seeing a CPU/platform bottleneck.
Based on these results, I don't think we will see any mainstream gaming Volta cards. They made a killing selling a tiny ~300mm² Pascal chip as a high-end part due to lack of competition. A ~300mm² Volta card would only be marginally faster than the 1080 and not worth upgrading to for most people. They need a ~300mm² part that is 25-30% faster than the 1080 to maintain their huge margins, and that chip will require a brand new architecture and a move to 10nm or 7nm.
388.71 is here and now supports Titan V officially!
Whereas 388.51 doesn't.
Will the min framerate be better?