UPDATE 1/28/15 @ 10:25am ET: NVIDIA has posted in its official GeForce.com forums that it is working on a driver update to help alleviate memory performance issues in the GTX 970 and that it will "help out" those users looking to get a refund or exchange.
UPDATE 1/26/15 @ 1:00pm ET: We have posted a much more detailed analysis and look at the GTX 970 memory system and what is causing the unusual memory divisions. Check it out right here!
UPDATE 1/26/15 @ 12:10am ET: I now have a lot more information on the technical details of the architecture that cause this issue and more information from NVIDIA to explain it. I spoke with SVP of GPU Engineering Jonah Alben on Sunday night to really dive into the questions everyone had. Expect an update here on this page at 10am PT / 1pm ET or so. Bookmark and check back!
UPDATE 1/24/15 @ 11:25pm ET: Apparently there is some concern online that the statement below is not legitimate. I can assure you that the information did come from NVIDIA, though it is not attributable to any specific person – the message was sent through a couple of different PR people and is the result of meetings and multiple NVIDIA employees' input. It is really a message from the company, not any one individual. I have had several 10-20 minute phone calls with NVIDIA about this issue and this statement on Saturday alone, so I know that the information wasn't from a spoofed email, etc. Also, this statement was posted by an employee moderator on the GeForce.com forums about 6 hours ago, further proving that the statement is directly from NVIDIA. I hope this clears up any concerns around the validity of the below information!
Over the past couple of weeks, users of GeForce GTX 970 cards have noticed and started researching a problem with memory allocation in memory-heavy gaming. Essentially, gamers noticed that the GTX 970, with its 4GB of graphics memory, was only ever accessing 3.5GB of that memory. When it did attempt to access the final 500MB of memory, performance seemed to drop dramatically. What started as simply a forum discussion blew up into news that was being reported at tech and gaming sites across the web.
Image source: Lazygamer.net
NVIDIA has finally responded to the widespread online complaints about GeForce GTX 970 cards only utilizing 3.5GB of their 4GB frame buffer. From the horse's mouth:
The GeForce GTX 970 is equipped with 4GB of dedicated graphics memory. However the 970 has a different configuration of SMs than the 980, and fewer crossbar resources to the memory system. To optimally manage memory traffic in this configuration, we segment graphics memory into a 3.5GB section and a 0.5GB section. The GPU has higher priority access to the 3.5GB section. When a game needs less than 3.5GB of video memory per draw command then it will only access the first partition, and 3rd party applications that measure memory usage will report 3.5GB of memory in use on GTX 970, but may report more for GTX 980 if there is more memory used by other commands. When a game requires more than 3.5GB of memory then we use both segments.
We understand there have been some questions about how the GTX 970 will perform when it accesses the 0.5GB memory segment. The best way to test that is to look at game performance. Compare a GTX 980 to a 970 on a game that uses less than 3.5GB. Then turn up the settings so the game needs more than 3.5GB and compare 980 and 970 performance again.
Here’s an example of some performance data:
|  | GTX 980 | GTX 970 |
|---|---|---|
| Shadow of Mordor |  |  |
| <3.5GB setting = 2688x1512 Very High | 72 FPS | 60 FPS |
| >3.5GB setting = 3456x1944 | 55 FPS (-24%) | 45 FPS (-25%) |
| Battlefield 4 |  |  |
| <3.5GB setting = 3840x2160 2xMSAA | 36 FPS | 30 FPS |
| >3.5GB setting = 3840x2160 135% res | 19 FPS (-47%) | 15 FPS (-50%) |
| Call of Duty: Advanced Warfare |  |  |
| <3.5GB setting = 3840x2160 FSMAA T2x, Supersampling off | 82 FPS | 71 FPS |
| >3.5GB setting = 3840x2160 FSMAA T2x, Supersampling on | 48 FPS (-41%) | 40 FPS (-44%) |
On Shadow of Mordor, the drop is about 24% on GTX 980 and 25% on GTX 970, a 1% difference. On Battlefield 4, the drop is 47% on GTX 980 and 50% on GTX 970, a 3% difference. On CoD: AW, the drop is 41% on GTX 980 and 44% on GTX 970, a 3% difference. As you can see, there is very little change in the performance of the GTX 970 relative to GTX 980 on these games when it is using the 0.5GB segment.
So it would appear that the severing of a trio of SMMs to make the GTX 970 different from the GTX 980 was the root cause of the issue. I'm not sure if this is something we have seen before with NVIDIA GPUs that are cut down in the same way, but I have asked NVIDIA for clarification on that. The ratios seemed to fit: 500MB is 1/8th of the 4GB total memory capacity and 2 SMMs would be 1/8th of the total SMM count. (Edit: The ratios in fact do NOT match up...odd.)
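To make the mismatch noted in that edit concrete, here is a quick back-of-the-envelope check. It uses the publicly listed SMM counts (16 on the full GM204 in the GTX 980, 13 enabled on the GTX 970), which are not part of NVIDIA's statement above, so treat this as our own arithmetic rather than anything official:

```cuda
// ratio_check.cu -- host-only arithmetic; compiles with nvcc (or any C++
// compiler, since it contains no device code).
// SMM counts are the publicly listed ones (GTX 980: 16 SMMs, GTX 970: 13
// enabled), not figures taken from NVIDIA's statement above.
#include <cstdio>

int main() {
    const double disabledSmmFraction = (16.0 - 13.0) / 16.0;  // 3/16 = 0.1875
    const double slowSegmentFraction = 0.5 / 4.0;             // 0.5GB of 4GB = 0.125
    printf("Fraction of SMMs disabled on the GTX 970: %.4f\n", disabledSmmFraction);
    printf("Fraction of memory in the 0.5GB segment:  %.4f\n", slowSegmentFraction);
    // 0.1875 vs. 0.125: the two fractions differ, which is why the ratios
    // mentioned above do not actually line up.
    return 0;
}
```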
The full GM204 GPU that is the root cause of this memory issue.
Another theory presented itself as well: is this possibly the reason we do not have a GTX 960 Ti yet? If the patterns of previous generations were followed, a GTX 960 Ti would be a GM204 GPU with fewer cores enabled and additional SMs disconnected to enable a lower price point. If this memory issue were even more substantial, creating larger differentiated "pools" of memory, it could be a problem for performance or driver development. To be clear, we are just guessing on this one, and it may not occur at all. Again, I've asked NVIDIA for some technical clarification.
Requests for information aside, we may never know for sure if this is a bug in the GM204 ASIC or a predetermined characteristic of the design.
The question remains: does NVIDIA's response appease GTX 970 owners? After all, this memory behavior is just one part of a GPU's overall story, and performance testing and analysis already incorporate it. Some users will likely still claim a "bait and switch," but do the benchmarks above, as well as our own results at 4K, make it a less significant issue?
Our own Josh Walrath offers this analysis:
A few days ago, when we were presented with evidence of the 970 not fully utilizing all 4 GB of memory, I theorized that it had to do with the reduction of SMM units. It makes sense from an efficiency standpoint to perhaps "hard code" memory addresses for each SMM. The thinking there is that 4 GB of memory is a huge amount for a video card, and the potential performance gains of a more flexible system would be pretty minimal.
I believe that the memory controller is working as intended and that this is not a bug. When designing a large GPU, there will invariably be compromises made. From all indications, NVIDIA decided to save time, die size, and power by simplifying the memory controller and crossbar setup. These things have a direct impact on time to market and power efficiency. NVIDIA probably figured that losing a couple of percentage points of performance was outweighed by the added complexity, power consumption, and engineering resources it would have taken to gain those few points back.
You know the REAL solution?
First of all, the BIOS decides whether to access the other segment or not. I verified this by running a Quadro driver, which should have absolutely no hidden GTX 970-specific information, and it still gets stuck at 3.5GB.
FIX 1)
Release a BIOS update or modify the driver to allow both segments to be used, no matter what. Most games, like next-gen ports, use VRAM for texture data, and I'm pretty sure the last 0.5GB chunk is FASTER than system RAM.
FIX 2)
Let the desktop DWM/Aero use the last 0.5GB chunk, and give games FULL access to the high-speed 3.5GB chunk. This way we would only get stuck at the ~3750MB mark. Kind of like how laptops with Optimus work. This would by far be the best solution.
I tested FIX 1 with ACU on the Quadro driver. Setting 4xMSAA at 1080p with soft shadows overloaded the VRAM to 4000MB+, and it stayed there when I dropped the MSAA and soft shadows. The spikes in frametimes were pretty much the same, and average performance was about 5% worse. We're clocking 970s to 1550MHz+, so that much loss is okay in exchange for using the 0.5GB segment.
I'm sure neither of these will happen, because NVIDIA is too arrogant. FIX 1 might come later as a BIOS update, as a hack which forces both segments of the 970 to be used. No reviewer will criticize this. Get refunds while you can, and get the GTX 780 6GB.
It would have been smart for Nvidia to lock off the last .5GB just to simplify things for themselves and avoid this whole mess.
That being said, no one buying a 970 should assume that a 25% disabled 28nm planar GPU which relies on memory compression to barely outperform GK110 is going to be good for 4K.
Also, any GM204 will be getting low FPS if you are coming close to 4GB of memory utilization anyway.
The whole GPU industry is going to be boring (including Titan 2 and the 390X) until 16nm FinFET GPUs with HMC on a silicon interposer are here in 2016 with GP100, and 14nm FinFET with Knights Landing.
GPUs haven't been interesting since GK110 came out, and that design was so good that Nvidia even made GK110B and GK210.
Nvidia official response: “Buy a console you effing nerds holy ff shit it doesn’t matter. Get a freaking life”
LOL I’m laughing at you nvidia buyers fucked LOL
Oh dear, fanboy alert. Bet you’re right up there with all the other zit-faced teenagers having no clue about what a computer is other than “the machine that runs Battlefield”.
As if all the REAL-WORLD 970X benchmarks since the initial release became meaningless overnight. Well I’ve got good news and bad news for you, son. The bad news is you’re out of touch with reality and don’t realize how trivial and unsubstantiated your comment is. The good news is that it’s not too late to restore the relationship with your parents, on whom you so heavily depend when it comes to funding your flawlessly running, albeit heavily dated gaming rig.
I’m sorry if my rant causes irreparable damage to your sense of self-esteem, but to be very upfront about this: you had it coming.
As much as you laugh at AMD users with CrossFire?
8 core CPUs with 200W TDPs that are easily outperformed by an Intel 4 core?
512bit bus GPUs that are easily outperformed by 256bit Nvidia ones?
Lol
let me add more
6GB VRAM GPU beaten by 4GB VRAM GPU
7.1 B transistors GPU beaten by 6.2 B transistors GPU
USD 1000 GPU beaten by USD 550 GPU
the list goes on …
/Disclaimer “I personally lean more to NVIDIA in my personal machines”
That said…
I think it's funny when AMD and NVIDIA "fanboys" or whatever come out and point fingers when this kind of thing happens. Something to keep in mind: making GPUs/video cards is complex and difficult; inevitably mistakes will be made from time to time, and it happens to all chip makers.
But I do think it is noteworthy that NVIDIA seems to have come out mostly on its own, and relatively calmly, about this issue, rather than hiding it and then telling review sites "don't mention that."
In any case, there is good reason for PCPer to look into this, which I have no doubt Ryan and the team are going to. I look forward to reading the investigation/review.
+1
On a side note, I’ve grounded Anonymous until further notice. It’s hard these days teaching kids the principles of respectful behaviour.
Well, shit, it's complex and super duper hard, so I guess I'm just out my eight hundred bucks for your mistake…
This is sad on NV's part, but let's be real here: most people would still have bought this card at the same price even if NV had claimed 3.5GB of RAM instead of 4GB.
That doesn't make what they did right, but my point still stands. Now, all you users who think your 970 suddenly sucks, please dump it on the used market at half the cost so I can buy another for SLI, kthx.
Damage already done to both the GTX970 owners and AMD sales.
The slower section of memory would obviously not cause stutter. Stutter is frame to frame variance. Slower access to memory would just slow every frame equally. You read that same data every frame. NVIDIA is placing the lowest priority resources in this section of memory, and as a result the frametime hit overall is only 1-2%. Even then it’s unclear that the performance difference NVIDIA gave as an example was attributable specifically to the memory structure difference. It falls into the noise.
So you should be looking for how much slower overall performance is (relative to 980) in a bunch of workloads between 3.5GB and 3.8GB, not stutter.
Also, as you approach 4GB, you might start to get stutter as Windows shuffles memory around to ensure an allocation never fails and everything is placed optimally based on usage by the application, but this should be the same stutter you see for *every* GPU, including those with a standard memory configuration, like the 980. Compare all results to the 980 so you don’t blindly misattribute anything that goes wrong to this memory layout on the 970.
Also, to correct misinformation, you do get the full 4GB, and the last 512MB is still much faster than system memory.
The Nai benchmark is flawed for purposes of testing this separate memory segment. It is clearly spilling to system memory above 3.5GB, so all that it’s measuring is PCIE bandwidth, not bandwidth of the last 512MB of memory. For whatever reason, CUDA allocations specifically are ignoring this separate segment of memory, and that is likely a driver bug that is easily fixable. Therefore this test does not reflect the memory and performance games see/get when creating resources… They get the full 4GB.
Disabling the 512MB would be worse for performance, since then you're guaranteed to spill to system memory over 3.5GB, which is ass slow. If you did that, then you *would* get exactly the performance seen in Nai's benchmark.
One last thing… Allocating video memory and using video memory are treated differently in Windows. You won't start to use the last 512MB until your total usage in a single buffer of work exceeds 3.5GB… in other words, when it's needed. This means you won't be taxed whatsoever if a game doesn't reference over 3.5GB in a buffer, and any performance tax above 3.5GB is negligible, since that segment only contains low-priority resources anyway. Over a whole frame it's likely in the low single-digit percentage range.
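To see concretely what an over-allocating CUDA test like Nai's ends up measuring, here is a rough sketch of that style of test (illustrative code, not Nai's actual benchmark; the 128MiB chunk size and launch parameters are arbitrary choices). Chunks the driver keeps resident in VRAM should report VRAM-class bandwidth, while chunks that have been demoted to system memory will report something closer to PCIe bandwidth, which is the effect described above:

```cuda
// nai_style_sketch.cu -- illustrative only, not Nai's actual benchmark.
// Allocates the card in 128 MiB chunks, then times a trivial kernel that
// streams through each chunk. Chunks demoted to system memory will show
// roughly PCIe-class bandwidth instead of VRAM-class bandwidth.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void touch(float *p, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t k = i; k < n; k += stride)
        p[k] = p[k] * 1.000001f + 1.0f;       // one read + one write per element
}

int main() {
    const size_t chunkBytes = 128ull << 20;   // 128 MiB per chunk
    const size_t n = chunkBytes / sizeof(float);
    std::vector<float*> chunks;

    // Keep allocating until the runtime refuses; on a 4GB card this stops
    // somewhere below 4GB, depending on what the desktop already holds.
    for (;;) {
        float *p = nullptr;
        if (cudaMalloc((void**)&p, chunkBytes) != cudaSuccess) {
            cudaGetLastError();               // clear the failed-allocation error
            break;
        }
        chunks.push_back(p);
    }
    printf("Allocated %zu chunks (%zu MiB)\n", chunks.size(), chunks.size() * 128);

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);

    for (size_t c = 0; c < chunks.size(); ++c) {
        touch<<<1024, 256>>>(chunks[c], n);   // warm-up / first touch
        cudaEventRecord(t0);
        touch<<<1024, 256>>>(chunks[c], n);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, t0, t1);
        // Factor of 2 because the kernel reads and writes every byte once.
        double gbps = 2.0 * chunkBytes / (ms / 1000.0) / 1e9;
        printf("chunk %3zu: %7.1f GB/s\n", c, gbps);
    }

    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    for (float *p : chunks) cudaFree(p);
    return 0;
}
```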
Please do not confuse people with how things actually work!
Good post.
They need to cut out that 500MB and list 3.5GB on the GTX 970 and call it a day. Or magically fix it with new drivers that put all third-party (desktop) stuff on that 500MB and use the full 3.5GB for gaming… If not, Aero and background stuff will use 250MB-500MB of high-speed memory, which lowers your RAM to 3GB for gaming. At 1440p you need all the RAM you can get.
I saw on another forum that someone used this 'benchmark' on their GTX 570 and it reported a bandwidth of something like 3.2 petabytes/second, and that the 980 had the same issue. So I'm still calling BS on this whole thing. I have yet to see any issue with my 970, and I run at 4K on everything I play.
The bench from Nai was incorrect for its intended purpose (above 3500MB, at least) but was good for one thing: it showed that the card couldn't access that final 512MB… and folks that are running benches are finding issues above 3500MB: stutters, GPU usage drops, etc.
The initial findings may have been incorrect, but it still showed a flaw.
Here, from Nai himself:
That means nothing, and has been discussed quite a bit already, as that came out a while ago, yet testing has confirmed issues above 3500MB.
Great, his tool errored… so why the issues?
Jen-Hsun Huang's new NVIDIA math: 3.5 = 4, and you will pay more for those 4.
Edited so as not to offend the hands that feed! One must never totally trust any website that gets revenue from the companies whose products it reviews.
This problem is just an example of bottom-of-the-bin scrapings for more green in J. H. Huang's fat wallet. Fresh from the cutout bin to you!
I just spent $800 on dual 970s that do about 62 percent better than my dual 670s, even on a faster CPU (4790K vs. 3770K). Review sites claimed a second card would give nearly a 100 percent improvement over a single one. I find 63 percent in testing.
Now this.
After fifteen years and many thousands of dollars and getting screwed out of $800 here, my next purchase will be AMD.
Enjoy multi-GPU that's worse than a single card, ridiculous TDPs, and retail cards that don't perform like press samples.
970 SLI being ~62% better than 670 SLI sounds about right given current drivers. Anandtech's Bench page shows a ~60-75% gap between a single 970 and a single 670 for most games…
A second card in SLI will never give 100% over single except for synthetic or compute type items, and even then something else is likely to reduce that to <100%. Lately Nvidia scaling hasn't been as good as AMD's -- 63% sounds like a reasonable gain (although maybe a little low) for adding a second card, depending on the resolution and everything.
I love my 970 SLI (quieter than the single 7970 they replaced), although I wish NVIDIA had VR SLI up and running like they claimed at launch.
It doesn't matter to compare the 970 against the 980 and do the math.
If the 970 is a 3.5 + 0.5 card, then it is NOT a 4GB card as marketed, but rather something like a dual-GPU SLI setup with one GPU at 3.5GB and the second at 0.5GB.
Marketing an item with one specification while it actually has a different one, without the client's knowledge, constitutes fraud, and the vendor is accountable and liable in pretty much any court of law.
This would be true even if the product in question actually used all 4GB of the framebuffer in a scalable way: as long as the final consumer, as a client, reads that the hardware is using 3.5GB even though 4GB is in use, the vendor is still liable in a court of law. It would be as if someone sold a car claimed to have a 100mph top speed, but while driving it you saw a top speed of 85mph on your display even though you were actually traveling at 100mph.
Informing the client inaccurately, whether supported by ambiguous performance figures relative to another product or not, and whether those figures are true or false, is beside the point; any clarification should in any case be made through the OFFICIAL website where the product is offered with its specifications, with clear and visible explanations of any configuration that might otherwise mislead a client who is not properly instructed about what he is being offered.
The final client pays for a product based on its specifications, and it is NOT up to the client to try to find out through third parties whether what he has bought is what he was promised to receive.
All of this is misleading, while a genuine product should perform as promoted beyond any benefit of the doubt; a vendor choosing the former over the latter is accountable for committing a felony.
Based on that logic, I think you'd have to label all GPUs differently, like how many channels they use for a "384-bit bus" and so on.
It can and does use the 4GB of memory. However, they should have just locked off .5GB and called it 3.5.
I really don't even get why people care so much. It's not like you can get a 970 to use 4GB of RAM and not be lagging anyway! It's only as powerful as a 780, and those can't play anything that fills 4GB at playable framerates either!
I most certainly do with two 970s in SLI. That framebuffer fills up real quick.
And you're not lagging from the fact that the GPU is barely more powerful than a 780?
Where is the Statement ?
Here is a good, if somewhat slow, explanation of what is happening when the last 0.5GB is used. More surprising to me is that the driver seems to realise the 0.5GB is causing framerate/timing issues and then restricts the program to using only the 3.5GB pool. Possibly engineered behaviour by NVIDIA. I have been able to replicate this on my system.
https://www.youtube.com/watch?v=wgRir5JwKyU
Wow, those GPU dies must have been really bad, so a solution was found to work around defective memory lanes or whatever, and what they (the suckers) do not know will not hurt them, unless they find out! Did those marketing folks and green meanies not know about high-definition textures and supersampling, and the extra memory/bandwidth needed?
The fact is that you can't play a game decently at 1080p, 1440p, or above when the amount of VRAM in use exceeds 3584MB.
I invite you to watch this video to better understand the issue and why it's not possible to play a game using more than 3584MB of VRAM. https://www.youtube.com/watch?v=wgRir5JwKyU
They advertise a 4GB dedicated 224GB/s 256-bit video card, and the GTX 970 definitely is not that.
“…into a 3.5GB section(32bit addresses?) and a 0.5GB section(64bit addresses?)”
Video memory addresses performance optimization?
4GB Video memory = 3.5GB section 32bit addresses + 0.5GB section 64bit addresses + 4.0GB system memory section 64bit addresses
8GB Video memory = 4.0GB section 32bit addresses + 4.0GB section 64bit addresses + 8.0GB system memory section 64bit addresses