What we know about Maxwell
Taking an interesting direction, NVIDIA is releasing the first Maxwell parts today in the form of the GeForce GTX 750 Ti.
I'm going to go out on a limb and guess that many of you reading this review would not normally have been as interested in the launch of the GeForce GTX 750 Ti if a specific word hadn't been mentioned in the title: Maxwell. It's true: the launch of the GTX 750 Ti, a mainstream graphics card that will sit at the $149 price point, marks the first public release of the new NVIDIA GPU architecture code-named Maxwell. It is a unique move for the company to start a new design at this particular point in the market, but as you'll see in the changes to the architecture, as well as its limitations, it all makes a certain amount of sense.
For those of you who don't really care about the underlying magic that makes the GTX 750 Ti possible, you can skip this page and jump right to the details of the new card itself, where I detail the product specifications, performance comparisons, expectations, and more.
If you are interested in learning what makes Maxwell tick, keep reading below.
The NVIDIA Maxwell Architecture
When NVIDIA first approached us about the GTX 750 Ti, they were very light on details about the GPU powering it. Even though the company confirmed it was built on Maxwell, it hadn't yet decided whether to do a full architecture deep dive with the press. In the end they landed somewhere between the full detail we are used to getting with a new GPU design and that original, passive stance. It looks like we'll have to wait for an enthusiast-class GPU release to get the full story, but I think the details we have now paint the picture quite clearly.
While designing the Kepler architecture, and then implementing it in the Tegra line in the form of the Tegra K1, NVIDIA's engineering team developed a better sense of how to improve the performance and efficiency of the basic compute design. Kepler was a huge leap forward compared to the likes of Fermi, and Maxwell promises to be equally revolutionary. NVIDIA wanted to both reduce GPU power consumption and find ways to extract more performance from the architecture at the same power levels.
The logic of the GPU design remains similar to Kepler: a Graphics Processing Cluster (GPC) houses Streaming Multiprocessors (SMs) built from a large number of CUDA cores (stream processors).
GM107 Block Diagram
Readers familiar with the look of Kepler GPUs will instantly see changes in the organization of the various blocks of Maxwell. There are more divisions, more groupings, and fewer CUDA cores "per block" than before. As it turns out, this reorganization is part of how NVIDIA was able to improve performance and power efficiency with the new GPU.
The biggest changes are found in each of the new SMs, now called SMMs (the Maxwell designation; the previous Kepler-based SM is referred to as the SMK), which can deliver 35% more processing power per CUDA core when shader bound. NVIDIA has made scheduling on the SMM more intelligent, avoiding stalls better than previous implementations. This also means more software-based work for the CPU to handle, but only by a handful of percent, I am told.
These new SMMs were built to improve performance per watt as well as performance per area, a goal shared by all CPU and GPU designers. NVIDIA addressed it with changes to control logic partitioning, workload balancing, clock gating, compiler-based scheduling, instructions per clock, and quite a bit more.
Maxwell SMM Diagram
Rather than a single block of 192 shaders, the SMM is divided into four distinct blocks that each have a separate instruction buffer, scheduler, and 32 dedicated, non-shared CUDA cores. NVIDIA states that this partitioning simplifies the design and scheduling logic required for Maxwell, saving on area and power. Pairs of these blocks are grouped together and share four texture filtering units and a texture cache, while shared memory is a separate pool available to all four processing blocks of the SMM.
With these changes, the SMM can offer 90% of the compute performance of a Kepler SMK in a smaller die area, which allows NVIDIA to integrate more of them per die. GM107, the first shipping chip based on Maxwell, includes five SMMs (640 CUDA cores) while the GK107 GPU had two SMKs (384 CUDA cores), giving Maxwell a claimed 2.3x shader performance advantage.
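As a sanity check on that 2.3x figure, the core counts and the ~35% per-core improvement mentioned earlier can be combined in a quick back-of-the-envelope estimate (the uniform per-core scaling here is my assumption, not NVIDIA's published methodology):

```python
# Rough estimate of GM107's shader advantage over GK107.
gk107_cores = 384      # 2 SMKs x 192 CUDA cores
gm107_cores = 640      # 5 SMMs x 128 CUDA cores
per_core_gain = 1.35   # ~35% more throughput per core when shader bound

advantage = (gm107_cores / gk107_cores) * per_core_gain
print(f"Estimated shader advantage: {advantage:.2f}x")  # ~2.25x, close to the 2.3x claim
```

The 67% increase in core count does most of the work; the per-core scheduling improvements supply the rest.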
Other than the dramatic changes to the SM, the other very substantial change in this first version of Maxwell is the 2 MB L2 cache. Considering that the Kepler design implemented a 256 KB L2 cache, we are looking at an 8x increase in capacity, which should dramatically reduce the demand on GM107's integrated memory controller. Even with a 128-bit memory interface, then, the GTX 750 Ti should not find DRAM bandwidth to be a bottleneck.
NVIDIA has also improved Maxwell's video capabilities, boosting video encode performance by 2x (users should see even less of a performance hit when recording with ShadowPlay now) and decode performance by 10x.
A new power state called GC5 has been built to reduce the GPU's power usage during light workloads like video playback. API support is the same for Maxwell as it is for Kepler, meaning that DirectX 11.2 is not fully supported.
GM107 – Maxwell's First Implementation
Though you can likely deduce many of the features of GM107 by looking at the data above, there are still some details to share. With a single GPC, five SMM units, and two 64-bit memory controllers, NVIDIA assures us this is the full implementation of GM107.
With 128 CUDA cores per SMM and five SMMs in total, we get 640 cores, a 67% increase over the 384 cores found in the GK107 Kepler GPU used in the GeForce GTX 650. To be fair, the GeForce GTX 650 Ti has 768 CUDA cores, but at nearly twice the TDP. The base clock of 1020 MHz and Boost clock of 1085 MHz are actually quite reserved; with a modest overclock, clocks were easily touching the 1300 MHz level!
Peak theoretical compute performance hits 1.3 TFLOPS (a 60% increase over GK107) even though memory bandwidth remains essentially the same. This, again, is why the inclusion of the 2 MB L2 cache is so critical to keeping the Maxwell architecture fed efficiently.
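That 1.3 TFLOPS figure follows directly from the core count and clock, since each CUDA core can retire one FMA (two floating-point operations) per clock. A quick sketch using the 1020 MHz base clock quoted above:

```python
# Peak single-precision throughput = cores x 2 FLOPs (FMA) x clock.
cuda_cores = 640
base_clock_ghz = 1.020

peak_gflops = cuda_cores * 2 * base_clock_ghz
print(f"Peak: {peak_gflops:.0f} GFLOPS")  # ~1306 GFLOPS, i.e. ~1.3 TFLOPS
```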
GM107 is still built on TSMC's 28nm process technology but increases die size by 25% over GK107 while using 43% more transistors. Considering the 60% compute edge Maxwell has over Kepler in this segment, the modest 25% area increase indicates a big focus from NVIDIA on area efficiency. Add NVIDIA's claim of 2x the performance per watt for Maxwell over Kepler on the same 28nm process, and it's easy to be impressed.
Future Maxwell GPUs?
If you are wondering where the high end products on Maxwell are, you aren't alone. All NVIDIA would tell us for now is that they would arrive "at a later date."
So, obvious first question:
Any improvements in scrypt performance w/ Maxwell? 😉
There is some, but not a whole lot. We are looking at doing some testing today on the cryptocurrency applications, but the lack of optimization could be holding things back.
Thanks Ryan, can’t wait for your results!
Oh, the simple days when one could choose a GPU based on its game performance … I don’t miss them, not one bit.
http://cryptomining-blog.com/
http://cryptomining-blog.com/922-the-new-nvidia-geforce-gtx-750-ti-scrypt-mining-performance/
Not much in the way of graphs or pages of analysis but 265 KH/s and about 300 KH/s overclocked. Of course they were probably limited in the same way Ryan was when overclocking.
Thanks for the link; 265 kH/s at (or below) 75W doesn't seem half bad!
Any chance this card supports HDMI 2.0? Is there anything coming out soon that will?
Just confirmed that the GeForce GTX 750 Ti does NOT have HDMI 2.0. They won't talk about future products though…
Are you comparing it to a plain 650 Ti or a 650 Ti Boost in the article/benchmarks?
This is the non-Boost GPU. The GeForce GTX 650 Ti Boost is EOL, so I didn't think it should get in over the still-available GTX 650 Ti.
Yes, but I was interested since I purchased the 650 Ti Boost, and looking at the non-Boost versus the Boost 650 Ti, it looks like the 750 Ti isn't much of a step over the 650 Ti Boost (from an upgrade and price/performance perspective).
Power consumption is nice to see, but it's not too much of a concern in a 650 Ti or 750 Ti size card; it becomes more interesting when the bigger cards come out.
If they had not EOL'd the 650 Ti Boost, the comparison/benchmark charts comparing it to the 750 Ti would look a little weird, I think.
Maxwell 2nd level high end should be very interesting. I’m hoping for Titan Black performance at $500 and 200 watts.
I have a couple of questions.
Is it absolutely certain that if this card had a DP output it would support G-Sync? Or is that an assumption at this point?
When overclocking, does the memory or GPU clock increase affect performance more?
Finally, do you have the power numbers for the overclock?
Thanks.
G-Sync support is confirmed yes, as long as a DP connection is present.
GPU clock definitely affects the perf more.
Ah, I didn't make a graph of power under the overclock! But power jumped from 184 watts to 202 watts (full system).
Hmm, I find it interesting that it scaled down so well when the Kepler architecture did not (with regard to power consumption). I am really wondering how it scales into enthusiast territory, considering these tweaks improved the mainstream part so much.
“One feature that the GTX 650 Ti card does NOT have is support for SLI which is quite disappointing.”
I believe you meant the “750”
Thanks, fixed!
Ryan, long time viewer here. Great intro to Maxwell. I just wish you guys still included bar graphs, because the line graphs can be hard to compare from one card to another when there is only, say, a ten or twenty percent difference.
I have had this feedback a few times. We are going to integrate that again soon.
http://cdn.pcper.com/files/imagecache/article_max_width/review/2014-02-17/gpuz.png
Ryan, please update this with GPU-Z 0.7.7. It reports the GM107 specs correctly.
http://www.techpowerup.com/downloads/2340/techpowerup-gpu-z-v0-7-7/
Done!
http://www.geforce.com/whats-new/articles/nvidia-geforce-334-89-whql-drivers-released
Would this card be an improvement over my 2 560Tis, or should I wait for the next Maxwell cards?
No, if you are running 560 Ti's in SLI, I would wait.
Would this card be a good improvement over a single 560 Ti card?
Or would I be better off trying to get a second 560 for SLI?
Would this card be an upgrade from my GTX 260?
forgot to post my PC specs:
Operating System: Windows 2.6.1.7601 (Service Pack 1)
CPU Type: Intel® Core™2 Quad CPU @ 2.66GHz
CPU Speed: 2.69 GHz
System Memory: 8.59 GB
Video Card Model: NVIDIA GeForce GTX 260
Video Card Memory: 4.27 GB
Video Card Driver: nvd3dum.dll
Desktop Resolution: 1680×1050
Hard Disk Size: 492.68 GB
Hard Disk Free Space: 235.47 GB (48%)
Download Speed: 1.49 MB/s (11.9 mbps)
Hi, so I'm looking at a $500 to $600 budget, maybe a little more, and was wondering: should I get this card, or should I wait for the AMD R7 265?
Ryan, how do you make those "FPS by Percentile" charts in Excel? I'd like to do the same on my own, using FRAPS.
Thanks
Any CUDA testing, like Blender rendering? It would be nice to see performance improvements from the compute side.
I don't agree with your remarks, Mr. Shrout. In the Skyrim, Metro: Last Light, and BioShock Infinite frame time variance graphs, the 750 Ti's frame latency rises higher than the 260X's, while they are in a clear tie in Crysis 3 and Battlefield 4. You also did not comment on the frame spikes in BioShock.
So what would you say: 570 or 750?
Looking to make a Dell OptiPlex 780 SFF into a Steambox. The system accepts up to 16GB of DDR3 (4x4GB sticks) and is currently running a Q6700 Core 2 Quad (95W). I have been looking at using this card; however, the PSU is 234 watts, and the 12V rail is 17A, so I'm assuming about 204 watts available for the motherboard, RAM, video card, and CPU fan. The only other components hooked in are a laptop Blu-ray drive and a 320GB 5400RPM 2.5" HDD from a Chromebook. I did a live chat with Dell and confirmed that they sell an HD 7750 1GB DDR3 video card that you can order and add. So my question is: would a Q6600 BSEL-modded to 3GHz, 2x4GB sticks of DDR3, and a GTX 750 Ti, along with that Blu-ray drive and HDD, work? If not, I have an E8400 CPU I can use instead, which would reduce the wattage by 30 watts since it's rated at 65W instead of 95W.
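For what it's worth, the commenter's 12V budget can be tallied quickly. A rough sketch (the TDP figures are vendor ratings, and the motherboard/peripheral estimate is an assumption, not a measurement):

```python
# Rough 12V power budget for the proposed OptiPlex 780 SFF build.
rail_watts = 12 * 17  # 17A on the 12V rail -> 204W available

draw = {
    "Q6600 @ 3GHz (stock 95W TDP, may rise with the BSEL mod)": 95,
    "GTX 750 Ti (60W TDP)": 60,
    "motherboard, RAM, fans, drives (estimate)": 35,
}
total = sum(draw.values())
headroom = rail_watts - total
print(f"Estimated draw: {total}W of {rail_watts}W available ({headroom}W headroom)")
```

On these assumptions the Q6600 build sits uncomfortably close to the rail's limit, which is why the E8400's lower 65W rating looks like the safer choice.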