A Summary Thus Far
The issues surrounding the GeForce GTX 970 memory system and performance are complicated, but we attempt to run some tests and make some assertions.
UPDATE 2/2/15: We have another story up that compares the GTX 980 and GTX 970 in SLI as well.
It has certainly been an interesting week for NVIDIA. It started with the release of the new GeForce GTX 960, a $199 graphics card that brought the latest iteration of the Maxwell architecture to a lower price point, competing with the Radeon R9 280 and R9 285 products. But then the proverbial stuff hit the fan with a memory issue on the GeForce GTX 970, the best-selling graphics card of the second half of 2014. NVIDIA responded to the online community on Saturday morning, but that was quickly followed up with a more detailed exposé on the GTX 970 memory hierarchy, which included a couple of important revisions to the specifications of the GTX 970 as well.
At the heart of all this technical debate is a performance question: does the GTX 970 suffer from lower performance because of the 3.5GB/0.5GB memory partitioning configuration? Many forum members and PC enthusiasts have been debating this for weeks, with many coming away with an emphatic yes.
The newly discovered memory system of the GeForce GTX 970
Yesterday I spent the majority of my day trying to figure out a way to validate or invalidate these types of performance claims. As it turns out, finding specific game scenarios that consistently hit targeted memory usage levels isn't as easy as it might first sound; even simple things like the order in which you start the game and change settings can affect reported memory use. Using Battlefield 4 and Call of Duty: Advanced Warfare though, I think I have presented a couple of examples that demonstrate the issue at hand.
Performance testing is a complicated story. Lots of users have attempted to measure performance on their own setup, looking for combinations of game settings that sit below the 3.5GB threshold and those that cross above it, into the slower 500MB portion. The issue for many of these tests is that they lack access to both a GTX 970 and a GTX 980 to really compare performance degradation between cards. That's the real comparison to make – the GTX 980 does not separate its 4GB into different memory pools. If it has performance drops in the same way as the GTX 970 then we can wager the memory architecture of the GTX 970 is not to blame. If the two cards perform differently enough, beyond the expected performance delta between two cards running at different clock speeds and with different CUDA core counts, then we have to question the decisions that NVIDIA made.
There has also been concern over the frame rate consistency of the GTX 970. Our readers are already aware of how deceptive an average frame rate alone can be, and why looking at frame times and frame time consistency is so much more important to guaranteeing a good user experience. Our Frame Rating method of GPU testing has been in place since early 2013 and it tests exactly that – looking for consistent frame times that result in a smooth animation and improved gaming experience.
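As a quick illustration of why an average frame rate alone can hide a poor experience, here is a small synthetic sketch (the numbers are made up for demonstration and are not output from our Frame Rating tools):

```python
# Two hypothetical runs with the same average frame rate: one smooth,
# one with a large hitch every fourth frame. The averages match, the
# experience does not.
smooth = [16.7] * 60                      # frame times in ms, ~60 FPS
stutter = [10.0, 10.0, 10.0, 36.8] * 15   # same ~16.7 ms average, but spiky

def report(name, frame_times_ms):
    avg_fps = 1000.0 / (sum(frame_times_ms) / len(frame_times_ms))
    p99 = sorted(frame_times_ms)[int(0.99 * len(frame_times_ms)) - 1]
    worst = max(frame_times_ms)
    print(f"{name}: avg {avg_fps:.1f} FPS, "
          f"99th percentile {p99:.1f} ms, worst frame {worst:.1f} ms")

report("smooth run ", smooth)
report("stutter run", stutter)
# Both runs average roughly 60 FPS, but the second one delivers a ~37 ms
# frame every fourth frame - the kind of behavior frame time plots expose.
```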
Users at reddit.com have been doing a lot of subjective testing
We will be applying Frame Rating to our testing today of the GTX 970 and its memory issues – does the division of memory pools introduce additional stutter into game play? Let's take a look at a couple of examples.
Thanks for all the analysis. This is exactly what everyone was waiting for. Now all the cards are on the table and people can decide how they feel about the memory division.
I second that. I hope the vocal people on both sides can calm down now, but I’m probably hoping for too much.
Then again, their bickering doesn’t mean as much as facts.
Yes like the fact nvidia lied about specs.
This article is more of pcper shilling for nvidia. Pcper is an advertisement for nvidia.
Shill site up in here.
seemed like a pretty fair article to me, 970 was stuttering more in some circumstances and he reported that. The 970 sli testing will be the really interesting article though.
do you recall the 1st article where it was “a philosophical debate” ??
ha!
Because it is?
Even after this article, some will say the (small) difference between the GTX 980 and GTX 970 performance degradation results @ ridiculous settings and resolutions for a single GPU is more due to VRAM, but many will say that difference has more to do with CUDA cores / SMM than anything else.
I’m not convinced it’s the VRAM either… My personal experience with the card tells me otherwise.
I'm only disappointed because of the lack of SLI results. I get that it will introduce more variables but, somehow, it would dilute the CUDA cores' impact, revealing more of the VRAM influence, imo.
SLI testing is needed to conclude whether it is the missing SMMs or the RAM issue… as someone with multiple 970s… I am eager to see those results.
I think the article was beyond fair. Yes they see something is amiss but need more data to pin it down. Reproduced but not root caused.
if you have multiple 970s, why not just crank up DSR yourself and see what happens?
Where are the SLI benchmarks? That's where we will see the limitation.
Idiot…
Ryan's article seems pretty fair to me. This frame buffer split does cause issues in quite a few games, and maybe more games in the future. But there is no solid evidence to prove this is the culprit for all of the performance degradation compared to the 980.
If you absolutely must see an article saying “YES, the 3.5/0.5 VRAM config is the root of all the stutters on a 970” so it makes you feel better, by all means start a new hardware review site and write one up yourself.
How is it fair when we still are not seeing the SLI results? That is where the limitation will show up.
If you think PCper is shilling for Nvidia, then you must believe every hardware site in existence is because they are all saying pretty much the same thing. Which begs the question – why are you reading any of them?
Penteract: That is obvious to me, they are looking for vindication for their preconceived notions!
I'm pretty sure this isn't about the last 0.5GB, it's more about the principle; it feels like Nvidia was just trying to keep it a secret. 99% sure there wouldn't be a problem if the card had 3.5GB, or if they had told us about the slower last 0.5GB and said its performance decrease was negligible.
Thx for the post.
Even though I know microstutter is a thing introduced with SLI at times, we have a good catalogue of games that very rarely induce microstutter, as evidenced by PCPer's FCAT tests.
I still think an SLI test with the 970 at similar settings as above would be interesting (and with 980s for the true comparison). I still get the nagging feeling this inadvertently affects SLI users more, since they go for higher res and higher effects and can usually afford it as well.
Did a little testing of my own using afterburner’s frametime readings and other monitoring tools… it’s not FCAT but it’s very accurate regardless. Here’s what I got…
http://i.imgur.com/PHaofek.png
So yeah, using SLI GTX 970’s to drive high-res high-settings will result in massive, massive frametime issues, even if the framerate over a given second remains reasonable. It is basically an unplayable mess at that point when using 3.7-4.0gb of VRAM. If you can stay around/below 3.5gb of actual usage, which it does its best to do, frametimes are consistent and tight as you would expect. The framerate averaged around 38, meaning in a perfect world the frametimes would be right around 26.3ms for each frame.
As an interesting aside, when finding my settings to test with I noticed it would literally, over the course of several seconds, try to work its way back down to below 3.5gb of usage if it went over, until I set things high enough that it couldn’t and would just stick at 3.7-3.8gb+ the whole time. Otherwise it would fight and keep pingponging from ~3.4gb directly to ~3.7gb and back repeatedly before finally settling at ~3.4gb. That’s probably the drivers at work, there.
I have a distinct feeling that this affects SLI much more severely than single card setups even, after seeing this article (which even on single-card is showing a negative effect).
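For anyone who wants to sanity-check their own logs in a similar way, here is a rough sketch of how exported frame times could be summarized (the file name and single-column CSV layout are assumptions; adjust to whatever your monitoring tool actually writes out):

```python
import csv

# Hypothetical export: one frame time in milliseconds per row.
LOG_PATH = "frametimes.csv"

frame_times = []
with open(LOG_PATH, newline="") as f:
    for row in csv.reader(f):
        try:
            frame_times.append(float(row[0]))
        except (ValueError, IndexError):
            continue  # skip headers or blank lines

avg_ms = sum(frame_times) / len(frame_times)
median_ms = sorted(frame_times)[len(frame_times) // 2]
# Arbitrary rule of thumb: call anything over twice the median a hitch.
hitches = [t for t in frame_times if t > 2 * median_ms]

print(f"average: {avg_ms:.1f} ms (~{1000 / avg_ms:.0f} FPS), median: {median_ms:.1f} ms")
print(f"hitches (>2x median): {len(hitches)} of {len(frame_times)} frames")
```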
Did some more testing using Shadow of Mordor, SLI enabled and disabled, sub-3500MB and over 3500MB VRAM consumption. Frametimes stay within normal variation/acceptable consistency in single-card mode regardless of VRAM, but going over 3500MB in SLI causes wild and rampant stutters/hitches with vastly fluctuating frametimes to match.
http://i.imgur.com/mhOevfQ.png
Very interesting thanks for taking the time to run some tests. The SoM results look horrific.
I'd like to see a site test this too, as it's clearly a topic the public has great interest in. NVIDIA has admitted that they misinformed journalists and the public, quite possibly with the intent of hiding such results; it deserves a proper and thorough investigation.
Yup, that's what I thought: SLI is where the issue is seen, and SLI is a feature of the GPU.
AMD responds to NVidia’s BS:
https://twitter.com/Thracks/status/560511204951855104
How about AMD's "up to 1GHz" 290X cards with the reference cooler that would never maintain that speed for more than a few minutes (conveniently just enough time for a benchmark to run…)?
Or the completely broken Crossfire implementation that AMD sold for many years. Literally anyone that purchased anything pre HD 5000 series never saw any benefit at all from that multi-GPU setup. 5000, 6000 cards eventually saw the fix – but only several years after their introduction.
I doubt that AMD is really in much of a position to be claiming the high moral ground here. Hah.
The media instantly attacked AMD for that, not giving time to AMD to answer.
At the same time they acted as a marketing department and press office for Nvidia in the case of 970, posting everything Nvidia was saying and trying to downplay the problem.
In the case of AMD, performance was secondary compared to the clock speed. You are doing it here also.
In the case of the 970, performance was mentioned all the time to cover up the fact that half the specs on the 970 were false.
And last, in the case of AMD, with good cooling you get that 1GHz frequency. In the case of the 970 you don't get back the 8 ROPs, you don't get back the 256KB of cache, you don't see memory bandwidth going up to 224GB/sec, and you don't unify the memory.
Do you understand the difference?
On the other hand, AMD's R9 launch caused many reviewers to report performance numbers that were actually false. Reviewers would fire up the card, run their evaluations, and then report the numbers. Many failed to let the card warm up first; once warm, the card would clock down due to overheating and deliver worse performance.
I can definitely see how both of these situations caused consumer dissatisfaction. Personally, I prefer Nvidia's blunder because the performance metrics haven't changed; every performance evaluation of the 970 before this technical discovery is still valid. With the AMD R9, however, every review had to be taken with a grain of salt.
Like you say, though, with aftermarket cooling the problem with the R9 disappeared.
You guys have to take a look at older generations of cards with segmented memory, like the GTX 660, which shows this in a good portion of games.
Have a look at the VRAM usage in the beginning (1950MB) and then from 40 seconds onward, and the stutters introduced after the change.
https://www.youtube.com/watch?v=m_JxKWbfVdE
Nai's benchmark results on those cards are also very similar to the GTX 970's.
hey pcper two things.
first:
what about a situation where, in an open world game at 4K res, stuff constantly gets rapidly unloaded and uploaded into VRAM? Is that a relevant scenario if it's something the engine depends on, i.e. non-static assets?
was just thinking.
the second thing:
why on earth do we get this in Germany:
https://www.youtube.com/watch?v=TVoeLy9lCeI
“Live Streaming is not available in your country due to rights issues.
Sorry about that.”
This is a good point; games which have more unique assets and more objects in general will be swapping VRAM more often. Something like the latest Dying Light, which also seems to have pretty high VRAM usage.
Or even Ryse, with how ridiculously high fidelity everything is in that game. I definitely think it needs more per-game and per-configuration testing (SLI 980 vs SLI 970).
I'm not sure about the live streaming issue…that's very odd.
Can you go to twitch.tv/pcper ?
Dear Ryan,
Could you run the test by renaming the game executable?
Eg : from CODAW.exe to ABCD2.exe
I wonder if there will be high variance if there isn’t any driver optimization being used for the game.
Don’t a lot of games have game specific optimizations?
Live streaming on YouTube is disabled in Germany, because YouTube doesn’t care about purchasing music licences from GEMA. As live streams cannot be analyzed by their software filter for GEMA licenced music, YouTube chose this solution.
And testing memory segmentation on a 2gb card with the current games would be way easier.
No surprises here. Running the card in unrealistic gaming scenarios can cause stuttering, but your frame-rate will already be so low that it is not going to matter whether you have a GTX 970 or GTX 980.
If you have SLI, your framerate wouldn't be so unplayably low, yet you would still be pretty badly affected (ceteris paribus of course).
It is not unrealistic, especially with the idea that games will include higher-res assets and textures over the lifetime of this generation…
Perhaps. But do you really want to game at between 20-40FPS even without any stuttering?
G-Sync, 30fps locked, 40fps locked, RTS games, slow-paced 3rd person games.
Yes. There is much to be argued for testing that config I think.
True, if you have GSync and it’s not a fast paced game and you want to game at 4K.
But seriously, the 970 was never designed for such astronomical settings. If you are really looking to game at that level then the AMD cards are a better fit for the price or if you have the cash, GTX 980s.
Same as what I've been thinking this whole time. That's probably why this issue has been in the dark so long; no one is running settings high enough to push over 3.5GB. Someone has had a play and found a problem… playing on settings that are too high for the card makes it unplayable! Well, it would be unplayable anyway! lol
If you’re thinking of doing 4k or triple display, you should be thinking of 980s that will do the job properly.
Instead of turning up resolution, AA, or presets to try to hit the 512MB, vary the texture settings.
You’ll see this gets you into the 512MB.
Start with settings turned up where you want them to get near 3.3-3.4GB usage, but with a lower level of textures than max.
Then turn up textures to max. You’ll see you start using that space.
I disagree, scaling resolution seems like the most normalized variable to increase VRAM usage, with all other things being equal. It makes sense to scale with resolution and keep everything else the same.
My question: How does DSR replicate real 4k results? Is there any other error introduced via the DSR technique that is not totally recapitulated in the FCAT analysis? I don’t know enough about the technology to give an opinion.
DSR will render to 4K in exactly the same way as 4K, but will add an extra pass on that output to perform the gaussian filtering stage to get the final output image. The main difference between this and rendering to 4K for 4K output is that DSR will add an extra small fixed amount of latency (fixed because you always filter the entire frame, every frame), and will add a slight extra fixed load to the GPU to perform the scaling (Maxwell 2 has some extra dedicated hardware to accelerate this).
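For what it's worth, the filter-then-downscale idea can be sketched in a few lines; this is only a toy approximation (the 2x-per-axis factor, the gaussian sigma, and the random test image are stand-ins, not NVIDIA's actual DSR filter):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dsr_like_downscale(rendered, factor=2, sigma=0.8):
    """Toy DSR-style downscale: gaussian-filter a frame rendered at
    factor x the native resolution per axis, then decimate it back to
    native resolution. sigma stands in for the DSR smoothness slider."""
    filtered = gaussian_filter(rendered, sigma=(sigma, sigma, 0))
    return filtered[::factor, ::factor, :]

# Stand-in for a 3840x2160 render, downscaled to a 1920x1080 output.
hi_res = np.random.rand(2160, 3840, 3).astype(np.float32)
native = dsr_like_downscale(hi_res, factor=2)
print(native.shape)  # (1080, 1920, 3)
```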
Very nice tests, thanks for taking the time to do that! For the most part I thought it was quite impartial.
What I find most interesting:
In the BF4 test, the jump happened at the 1.3x DSR for the GTX 970 while things remained mostly linear for the GTX 980. According to the first graph, the 1.3x DSR setting was the first setting to use >3.5Gb VRAM, which coincides perfectly with the partition. Very interesting indeed.
It’s not very obvious though.
The article says: “At 130% that number climbs to 3.58GB (already reaching into both pools of memory)”. The first memory segment should end at 3,584 MB, so did anything spill over into the second segment?
If there is a huge speed cut, you may see an artificial plateau at 3,584MB, and then the card has to really struggle to fill beyond that.
Does that make logical sense? I don't understand it very well myself, but it seems like that would be a plausible explanation of why it stops so closely at that number.
It’s over-blown from a technical point of view- very few people in very few situations will hit that 3.5+ GB. But from an ethical point of view, it is completely valid to get pissed about this if you have a 970. It’s sleazy.
A 3.5 GB card would not have sold as well as a 4 GB card, which is why that 0.5 GB was included. I am not convinced that the marketing team didn't know about the ROPs and L2 cache (sorry, Jeremy- CAYCHE).
Nvidia should be held LEGALLY responsible for false claims (whatever the reason) on the side of the box. There should be an investigation into Nvidia business practices and false claims. If I bought a car that said it's an 8 cylinder with 400 HP, took it home, and found that I actually bought a 6 cylinder with 300 HP, and the car company said that the marketing team didn't know what a cylinder really was… no one would accept that; heads would roll whether I ever actually use those extra 100 ponies or not. I might drive like a granny but I damn sure better get what I paid for.
I’m not sure that nvidia had specified ROPs and L2 cache in any of their ad copy or marketing materials, so I don’t know that there’s a case for false advertising there. These kinds of specs aren’t listed on their website and I’m not sure if they ever made it to the box either. I understand that they went out in technical docs to reviewers.
Regarding your car analogy, the manufacturers do all kinds of tricks to hit certain numbers all the time. They’ll go ahead and put 5W20 oil in there or one-off special tires to get that MPG rating that they post, but you’re not going to see that kind of thing in normal use. My own experience (which was common across the fleet of late 90s Fords) required replacing a defective intake design which knocked 15 hp off the car. This kind of stuff happens all the time in the auto industry.
I believe their web site had the specs there as well.
It would have been difficult to not include the "last" 512 MB. You don't know which memory controller will have a defect, so you would need 8 different boards, each with a different memory chip missing. If you are going to have to put it on, you should make use of it. The problem is that they didn't communicate the actual specifications.
Have to love the closing thoughts
I highlighted Nvidia's apology, sorry, your opinion.
After just 2 games, and being inconclusive in those 2, not even testing SLI or different genres of games, you came to that assessment.
I don't see how you can foresee no game in the future having an issue, even with those GameWorks titles recommending 4GB.
Games with these options or features:
FOV options
PhysX enabled
RPG
DLC Texture Packs
SLI
Multi-monitor
This isn't an examination at all.
I suppose that you have made similar comments at Guru3D and Hardware Canucks, or are you just trolling PCPer?
No, apparently you have.
I think the difference in memory use has something to do with the driver/game. CoD is an Nvidia game and I bet they have tweaks in the driver for the 970 with that game. BF, being an AMD game, maybe doesn't have the same level of driver optimization for the 970.
I do agree with the overall point that if you're getting 25 or fewer fps, you're not going to be playing at those settings. But that brings SLI into the picture. If you're SLI-ing two 970s you might be able to get playable fps at those settings. With SLI stutter plus this stutter, things could get bad quick.
This does make me change my opinion of this card. It still is a really good card for the money, but if you plan to SLI two, maybe not.
Good review
Great article! Thanks.
Just don’t question Nvidia please, everyone knows they would never lie.
The GTX 970 essentially has 4.5GB, the 4GB can be used and additionally the 0.5GB can be used as system memory, essentially you are buying more.
We should be thankful that we had the GTX Titan selling at $1000 for over 6 months, then the GTX 780TI selling at $800 for additional 6 months, until Nvidia gave us the same performance for $550, isn’t that great?
No, the card really only got 3.5GB plus 500MB on the side, which is a total of 4GB, not 4.5GB.
I still don’t understand why everyone is only testing single 970 cards at higher resolutions and then claiming no one would play at those levels anyway so no harm no foul.
No sh*t, the point is that many people found the 970s so cheap that they wanted to get TWO of them to use in sli with a 4k display SPECIFICALLY for the purpose of higher resolution gaming.
If sli introduces higher background frame time numbers so be it.
We can still compare 980 sli frame times before 3.5 GB and up to 4, and then 970 from before 3.5 GB and after and see what the differences are. THAT is the prime use case where people actually chose 970s to sli in, not the single gpu. So why is no one testing that?
Because the “it’s not an issue to worry about” argument goes away?
The real story here is actually the reduced memory bus width from 256-bit to 224-bit, and the fact that the 970 can only access one pool at a time and not both. Thus the effective bus width can only ever be 224-bit or 32-bit.
Please incorporate this information into your 970 articles going forward because this is the biggest deception to come from this fiasco by far.
Perhaps but that’s all reflected in the benchmark performance numbers anyway.
Very few games ATM truly leverage >3GB.
It's mostly cached textures.
When future games come out that have higher-res texture sets, this problem might become more visible.
So there is no point in keeping the lie. Spec should be accurate.
The GTX970 is a card with a split 224bit / 32bit bus
and only one bus can be used at a time, making the total truly usable memory 3.5GB, not 4GB.
Again: 224bit, 3.5GB.. not 256bit 4GB
This will impact game optimizations…
The GTX970 is a card with a split 224bit / 32bit bus
and only one bus can be used at a time, making the total truely usable memory 3.5GB not 4GB..
Again: 224bit, 3.5GB.. not 256bit 4GB
—————————-
The lines above are a LIE.
THE TRUTH IS:
256-bit for the 3.5 GB
32-bit for the last 0.5 GB
but you have more ROPs, which makes the 970 good as it is
No, the full 256 bit are not available for the 3.5 GB partition. Just look at the location of the memory controllers and count the number:
https://pcper.com/files/imagecache/article_min_width/review/2015-01-27/GM204_arch.jpg
This is the point.
On a GTX 970 with under 3.5GB VRAM usage you have a 224-bit, 196GB/s memory subsystem, not the 256-bit, 224GB/s one that was marketed; bad enough marketing, but it seems OK in actual usage.
On a GTX 970 using over 3.5GB of VRAM, bandwidth is worse because the card can only use the 224-bit or the 32-bit segment at any instant, so the effective memory bandwidth is *less than* 196GB/s, with the drop-off dependent on the fraction of VRAM accesses to the slower chunk.
If assets were spread evenly across the whole 4GB and each bit had the same access pattern, the effective memory bandwidth would go down to just under 172GB/s.
Obviously it’s in Nvidia’s interests to try and optimize around this in their driver, so IRL it may not be a big deal for gamers.
OTOH, is it possible to code an application to put all the assets in the slow VRAM segment? If so, that has only 28GB/s bandwidth, so the performance could be diabolical 😉
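If you want to play with that arithmetic yourself, here is a crude sketch; the fully serialized, capacity-split model below is a simplification of my own (the ~172GB/s figure above presumably comes from a more optimistic weighting), and in practice the driver tries to keep hot data in the fast segment anyway:

```python
# Crude model: accesses to the two segments never overlap, so the time
# to move a block of data is the sum of the time spent in each segment.
# The bandwidth figures and the access-split assumption are illustrative.
FAST_BW = 196.0  # GB/s over the 224-bit segment (7 of 8 controllers)
SLOW_BW = 28.0   # GB/s over the 32-bit segment (1 of 8 controllers)

def effective_bw(slow_fraction):
    """Time-weighted (harmonic) mean bandwidth for a given fraction of
    traffic that lands in the slow 0.5GB segment."""
    fast_fraction = 1.0 - slow_fraction
    return 1.0 / (fast_fraction / FAST_BW + slow_fraction / SLOW_BW)

for frac in (0.0, 0.02, 0.05, 0.125):
    print(f"{frac:6.1%} of accesses in slow segment -> {effective_bw(frac):6.1f} GB/s")
```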
If the 970 can't get past 3.5GB in COD but the 980 can, would it be because of the software (the game and the driver)? The game could detect a specific card and hold back to avoid a slowdown in performance, but only if the developers knew about it. The driver is also able to do that. Either way, if that's the case then it means Nvidia knew about this already, and that the disabled ROPs hit performance.
They WILL address it with a new driver. So, yeah…
Apparently PeterS@nvidia has redacted his statement; they are no longer working on a specific driver optimization to address the 970 memory issues:
https://forums.geforce.com/default/topic/803518/geforce-900-series/gtx-970-3-5gb-vram-issue/219/
Interesting. Although I suspect the best game for this type of testing would be Watch_Dogs, because IIRC that game is never really shader bound, but VRAM/bandwidth bound.
Wonder how it would look on a G-Sync monitor…
I don't think the conclusion "doesn't matter, because these framerates are unplayable and nobody would choose these settings" is a good one.
It is possible to find a variety of scenarios in which the framerate is OK and stuttering could still occur due to more than 3.5 GiB of VRAM being in use. In the end the GTX 970 is a high end card, something that non-PC-enthusiasts will rarely buy.
While I do appreciate the article and the amount of work that went into researching this, I feel it's only half done. What you're trying to find out is whether or not the two memory pools of the GTX 970 have a noticeable frame variance impact in games.
So as others have stated, why aren't you looking into specific edge cases that address this specific problem? I do know that finding a good testing scenario (like what game, with which settings, in what part of the game) is really time consuming. And I would also like to acknowledge the fact that these scenarios are not in any way representative of how the average gamer will utilize their GTX 970.
So my ideal scenario would be the following:
Try to increase the VRAM above 3.5GB while keeping the performance at 60fps, and keep the resolution at 1440p or even 1080p if possible. This way any sort of frame variance is more likely to show up in the FCAT graphs.
When such a test scenario is found, I’d like to see this tested also with two GTX 970 in SLI. Since, this would be the best bang for your buck setup to high end gaming.
Unfortunately I cannot contribute to this in any other way (I’m still using GTX 580 3GB in SLI). I hope to see a followup of this article, and going (even) more in depth of this issue.
I would like to add the following: I think comparing against a GTX 980 is also not worth it, and is only making things more complicated. What you know is that when using < 3.5GB the GTX 970 should be able to use the maximum possible memory bandwidth. So it would be good to use that to create a test scenario to get an actual apples to apples comparison.