First, Some Background
The rumored GP102 is the first of its kind since Fermi. How big of a change could it be?
NVIDIA's Rumored GP102
- GP100's ideal 1 : 2 : 4 FP64 : FP32 : FP16 ratio is inefficient for gaming
- GP102 either extends GP104's gaming lead or bridges GP104 and GP100
- If GP102 is a bigger GP104, the future is unclear for smaller GPGPU devs
- That is, unless GP100 can be significantly up-clocked for gaming.
- If GP102 matches (or outperforms) GP100 in gaming, and has better than 1 : 32 double-precision performance, then GP100 would be the first time that NVIDIA designed an enterprise-only, high-end GPU.
When GP100 was announced, Josh and I were discussing, internally, how it would make sense in the gaming industry. Recently, an article on WCCFTech cited anonymous sources, which should always be taken with a dash of salt, claiming that NVIDIA was planning a second architecture, GP102, between GP104 and GP100. As I was writing this editorial about it, relating it to our own speculation about the physics of Pascal, VideoCardz claimed to have been contacted by the developers of AIDA64, seemingly on the record, who also cited a GP102 design.
I will retell chunks of the rumor, but also add my opinion to it.
In the last few generations, each architecture had a flagship chip that was released in both gaming and professional SKUs. Neither audience had access to a chip that was larger than the other's largest of that generation. Clock rates and disabled portions varied by specific product, with gaming usually getting the more aggressive performance for slightly better benchmarks. Fermi had GF100/GF110, Kepler had GK110/GK210, and Maxwell had GM200. Each of these was available in Tesla, Quadro, and GeForce cards, especially Titans.
Maxwell was interesting, though. NVIDIA was unable to leave 28nm, which Kepler launched on, so they created a second architecture at that node. To increase performance without having access to more feature density, you need to make your designs bigger, more optimized, or simpler. GM200 was giant and optimized, but, to get the performance levels it achieved, it also needed to be simpler. Something needed to go, and double-precision (FP64) performance was the big omission. NVIDIA was upfront about it at the Titan X launch, and told their GPU compute customers to keep purchasing Kepler if they valued FP64.
GPU manufacturers are now jumping from 28nm, past 20nm, down to 14nm (AMD) and 16nm (NVIDIA). This double-jump in fabrication technology gives them a lot of room to add features, such as more shader cores and other accelerators (video decode, simultaneous multi-projection, etc.). Alternatively, they can produce a smaller chip with the same amount of performance, yielding more from a batch.
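To put rough numbers on "yielding more from a batch", here is a minimal sketch that assumes a simple Poisson defect model, where the chance of a die having zero defects is exp(-area × defect density). The die areas, wafer area, and defect density below are made-up illustrative values, not figures from NVIDIA or any foundry, and the function names are hypothetical.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Expected fraction of defect-free dies under a simple Poisson model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


def good_dies(wafer_area_mm2: float, die_area_mm2: float,
              defects_per_mm2: float) -> float:
    """Expected defect-free dies per wafer (ignores edge loss and scribe lines)."""
    candidates = wafer_area_mm2 / die_area_mm2
    return candidates * poisson_yield(die_area_mm2, defects_per_mm2)


if __name__ == "__main__":
    # All three values are assumptions chosen only for illustration.
    WAFER_AREA = 70_000.0     # roughly a 300 mm wafer, in mm^2
    DEFECT_DENSITY = 0.001    # defects per mm^2
    for area in (300.0, 600.0):
        print(f"{area:.0f} mm^2 die: "
              f"yield {poisson_yield(area, DEFECT_DENSITY):.2f}, "
              f"good dies {good_dies(WAFER_AREA, area, DEFECT_DENSITY):.0f}")
```

Under these assumptions, halving the die area more than doubles the number of good dies, because each wafer holds twice as many candidates and each candidate is also less likely to contain a defect.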
One thing that we knew NVIDIA was planning to add to Pascal is 16-bit support. This will allow developers to trade a reduction in precision for a boost in speed (by pushing two 16-bit calculations through a space that's designed for one 32-bit value), but no specific details were given. 64-bit would also be supported but, historically, it has run at some fraction of 32-bit performance, especially on gaming SKUs.
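As a rough illustration of that trade, the sketch below uses Python's standard `struct` module (its `e` format is IEEE 754 half precision) to show that two FP16 values occupy the same four bytes as one FP32 value, and how much precision is given up. It runs on the CPU and says nothing about how Pascal's hardware actually schedules FP16 work; the helper names are hypothetical.

```python
import struct


def round_to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pack_two_fp16(a: float, b: float) -> bytes:
    """Store two half-precision values back to back."""
    return struct.pack("<ee", a, b)


if __name__ == "__main__":
    # Two FP16 values fit in the space of one FP32 value.
    print(len(pack_two_fp16(1.5, -2.25)), len(struct.pack("<f", 1.5)))

    # The cost is precision: compare the rounding error on 0.1.
    print(abs(round_to_fp16(0.1) - 0.1), abs(round_to_fp32(0.1) - 0.1))

    # FP16 has an 11-bit significand, so not every integer above 2048 survives.
    print(round_to_fp16(2048.0), round_to_fp16(2049.0))
```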
Then they announced that GP100 would have a performance ratio of 4 FP16 : 2 FP32 : 1 FP64.
Okay then… that's a lot of die area that is not being used for single-precision. If you're a researcher or another high-performance computing customer, then this is music to your ears. Otherwise, that's a lot of performance for calculations that your software will basically never make… that is, unless the games industry was about to change in a dramatic way.
It isn't. NVIDIA announced the GeForce GTX 1080 and its GP104 processor.
In terms of performance, this architecture has the same ratio as Maxwell, 32 FP32 : 1 FP64, and, while FP16 is supported, it's just there for compatibility reasons. You don't want to use half-precision. FP32 is the only first-class citizen in GP104. That said, GP100 is twice the size (and transistor count) of GP104, but it only has 50% more shaders. Actual performance may even be less, too, if the bigger chip requires a lower clock rate due to the higher chance of manufacturing defects occurring within each die's boundaries.
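To make the ratios concrete, here is a small sketch that scales a peak FP32 figure by the precision ratios quoted in this article. The 10 TFLOPS input is a hypothetical round number, not a measured or official specification, and GP104's FP16 rate is left out because it is not stated here.

```python
# Throughput of each precision relative to FP32, using the ratios quoted in
# the article (GP100 at 4 FP16 : 2 FP32 : 1 FP64, GP104 at 32 FP32 : 1 FP64).
RELATIVE_RATE = {
    "GP100": {"FP16": 2.0, "FP32": 1.0, "FP64": 1.0 / 2.0},
    "GP104": {"FP32": 1.0, "FP64": 1.0 / 32.0},
}


def peak_tflops(chip: str, fp32_tflops: float) -> dict:
    """Scale a peak FP32 figure by each precision's relative rate."""
    return {precision: fp32_tflops * rate
            for precision, rate in RELATIVE_RATE[chip].items()}


if __name__ == "__main__":
    HYPOTHETICAL_FP32_TFLOPS = 10.0  # assumption, for illustration only
    for chip in RELATIVE_RATE:
        print(chip, peak_tflops(chip, HYPOTHETICAL_FP32_TFLOPS))
```

At the same FP32 rate, the 1 : 2 design would deliver sixteen times the FP64 throughput of the 1 : 32 design, which is the die area that gaming workloads leave idle.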
What This Means
GP100 is the first chip from NVIDIA to reach the ideal 1 : 2 : 4 ratio between 64-bit, 32-bit, and 16-bit calculations. (Workstation Fermi did 1 : 2 in FP64 : FP32, but not FP16.) This push might have been encouraged by Intel's Xeon Phi co-processor, which has secured some supercomputer design wins due to its double-precision performance, which is, likewise, half of its single-precision rate. (As far as I know, AVX-512 doesn't support FP16 instructions.) I can see why NVIDIA would want FP64 to return as a full-speed data type to keep customers away from Intel, with its high-tech fabrication processes and x86-everywhere mindset. GP100 is theoretically faster than Knights Landing, but that doesn't mean anything if your customers already wrote their software, and did so exclusively for an x86, many-threaded architecture. It also pulls their designs away from the needs of gaming.
When GP100 was announced, I saw one of three outcomes:
- Gaming shifts into new features, like 64-bit world coordinates, to use capacity
- GP100's die area waste is still acceptably low, like GF110, for NVIDIA to justify ignoring it
- NVIDIA diverges their gaming and professional designs
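For context on the first outcome, the sketch below shows why 64-bit world coordinates could matter: the gap between adjacent FP32 values grows with distance from the origin. It uses only Python's standard `struct` module, the coordinate values are arbitrary examples, and the helper names are hypothetical.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fp32_spacing(x: float) -> float:
    """Gap between x (rounded to FP32) and the next larger FP32 value.

    Assumes x is positive and finite.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    next_value = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_value - to_fp32(x)


if __name__ == "__main__":
    # Near the origin, FP32 resolves tiny offsets.
    print(fp32_spacing(1.0))
    # Ten million units out, adjacent FP32 values are a whole unit apart.
    print(fp32_spacing(10_000_000.0))
    # A quarter-unit offset is lost in FP32 but kept in FP64.
    print(to_fp32(10_000_000.25), 10_000_000.25)
```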
The first outcome, where gaming suddenly embraces 64-bit computation on the GPU, was incinerated when GP104 was announced. I figured that, due to the double-jump in fabrication nodes, it would be the best time to take the hit (and still show a performance increase) if they knew something was on the horizon. Apparently they don't. This leaves the fight between the last two points. Designing an extra chip would take effort, not to mention alienate enthusiasts who want NVIDIA's “best” chip, but GP100 is quite expensive in ways that might not make sense for home users.
We're now hearing about GP102, which is rumored to be the big gaming chip of this generation. It is said to slide itself between GP100 and GP104 in terms of die area, but we don't know whether it will use HBM2 or GDDR5X memory. Whenever we get a new Titan, and perhaps a GTX 1080 Ti, or whatever they're called, this seems to be the silicon that powers it.
The thing is, GP102 also drives a wedge between gamers and GP100, depending on how it's tuned. As we said earlier, there might not be a large gap between GP104 and GP100 for gaming, which makes me wonder whether the whole other product stack will actually perform under GP100, or around GP100. Either GP100 still has a lot of headroom, and will take the crown on a second generation of Pascal, or GP102 performs equivalently (or better) in gaming scenarios, clearly dividing the market between the two chips.
This is where we stop and ponder what NVIDIA's future product stack will look like. The original Titan (and Titan Black) introduced enthusiasts to a high-performance GPU compute card that also happened to be the best gaming card available. GK110, again, didn't have an ideal 1 : 2 FP64 ratio, but 1 : 3 is pretty close. GP102 could be closer to GK110, at a 1 : 4 or 1 : 8 ratio, but that would be even more effort on NVIDIA's part for a probably low-volume chip. If this is the case, GP102 could be the compromising bridge between the two product categories: better gaming performance with a taste of high-performance compute.
On the other hand, it could be a scaled-up GP104, and have a 1 : 32 ratio. This would be easier to develop, but also cast away anyone looking for a cheap, but GPGPU-friendly middle-ground. Whatever we get should reveal NVIDIA's product strategy going forward.
“Then they announced that GP100 would have a performance ratio of 4 FP16 : 2 FP32 : 1 FP16 ” I think that should be: 1 FP64
“One thing that we knew NVIDIA was planning to add to Pascal is 16-bit support. This will allow developers to trade a boost in speed (by pushing two, 16-bit calculations through a space that’s designed for 32-bit values) for a reduction in precision, but no specific details were given”
“FP16 Arithmetic Support for Faster Deep Learning” (1)
16-bit is there in P100 for deep learning, and there are some other links to more information at the link listed below. GP100 is not meant/tuned for gaming, so GP104 is for gaming, and GP102 may be some sort of middle ground between GP104 and GP100. Maybe something to go up against AMD’s Vega if it comes sooner, before Nvidia gets Volta to market.
I was referring to before GTC16. They announced mixed-precision last year, but didn't elaborate. It's very useful for deep learning, but there might be some gaming situations where FP16 is sufficient. Apparently, NVIDIA doesn't think so, though.
For instance, some audio propagation simulations might be sufficient with FP16 world coordinates? Maybe even processing the audio samples itself (despite being a problem for 24-bit+ systems)?
Mixed-precision 16 bit announced last year still warrants some further investigation for any articles this year, and it was some easy Google-Fu to find the Nvidia posts that explain the reasoning to a greater degree of just what that 16 bit FP is for on the GP100 SKU.
That and reading the Register’s GP100 articles and their sister site The Next Platform, as well as some scientific/HPC computing news sites. The Pascal, and AMD, SKUs developed for the scientific/HPC/Workstation markets have different uses than the derived gaming SKUs have, so GP104 and GP102 will be tuned for gaming, and maybe some other uses different than GP100 with more Deep learning done on GP100. I’m not looking at any gaming uses for a GP100 SKU that is clearly meant for a different market, there may be some uses for 16 bit FP across Nvidia’s different market segments but FP32 appears to be what is needed for PC/laptop gaming, but maybe that 16 bit can come in handy for some other non graphics gaming compute workloads like audio or physics/other.
More looking at, and comparing of, the asynchronous compute improvements in Nvidia’s Pascal micro-architecture needs to be done, and that includes the finer-grained thread scheduling without the requirement of any CUDA code dependencies! Will that fine-grained scheduling via OpenCL/Vulkan and other open APIs be available on Nvidia’s gaming SKUs, and will gaming become even more dependent on specialized middleware from Nvidia to take advantage of the lesser degree of asynchronous compute in Nvidia’s GPU hardware compared to AMD’s GCN ACE units on its Polaris and earlier GCN SKUs?
Will Volta get more and better in hardware Asynchronous compute improvements that will be transparent to any software/APIs so that all code will benefit from whatever Nvidia’s future Asynchronous compute improvements in its hardware can offer compared to AMD’s competing SKUs. Once the Polaris NDAs expire and the full extent of the Polaris “GCN 4/GCN 1.3” Asynchronous compute improvements are fully known and the white papers arrive for the Polaris micro-architecture, will there be more direct Asynchronous compute fully in the GPU’s hardware feature for feature comparisons between Pascal and Polaris, or will there be only Just Benchmarks attempted using benchmarking software that has yet to catch up fully to the new technology in software/APIs/GPU hardware. I suspect that Nvidia is still not up to AMD’s level of Asynchronous compute as fully implemented in hardware yet, even with Pascal, and that CUDA requirement for finer grained thread scheduling is very suspicious.
probably just NVDAs way of saying to AMD we see your paper launch and here is ours…. suck it
Save the I’s
Actually, NVDA has been better this year, (It’s their stock ticker symbol if you weren’t aware), and has more than bought me a 1080 in the last month! 🙂
With one Benjamin going into one founder’s pocket, instead of the shareholders’ equity! Talk about Overcompensation, the early birds get the shaft, and a big case of the CUDAs!
Let the Async-compute wars begin, and AMD’s Async-compute goes back a few GCN generations, while Nvidia crafts some driver updates for its earlier SKUs to force many to spend even more Benjamins on those newer SKUs, and even the newer Pascal SKUs do not have enough fully in the GPU’s hardware async-compute ability, and users must use CUDA to get at any fine grained GPU thread scheduling/context switching functionality on the Pascal based GPUs, so more of that vendor lock-in is necessary!
Just you wait until those APUs on an Interposer hit the HPC/workstation market and the Zen/Cores connected to a Fat GPU die get wired up via an interposer with a much wider and higher effective bandwidth connection fabric, that even Nvlink or PCI 4.0 will not be able to match! AMD should also jump at getting its GPU accelerators integrated with some OpenPower SKUs, using AMD’s off Die/Module interconnect fabric IP.
There will be High End Gaming APUs on an interposer derived from those very HPC/workstation SKUs so Nvidia will have to come up with something else besides NVLink to compete in that marketplace. Just Imagine some future Discrete AMD Card with an APU on an Interposer there instead of simply a GPU connected to HBM! A discrete GPU with some extra CPU cores to go along with the HBM, with the ability to run the whole game/gaming engine, as well as the gaming OS, and with each discrete APU card the user gets extra CPU cores to add to the systems total processing ability for gaming, or other uses.
Nvidia better start looking at some power based Gaming systems, Nvidia not having its own x86 license could get maybe a Power8 4 core processor derivative going for some Nvidia based SOC on an interposer competition, before it gets further behind.
I just built a new system and went with a mid-tier GTX 960 as a holdover for the next great watercooled GPU. The 1080 was announced, and I waited eagerly for the YouTube and PCPER discussions. They were good, but somewhat reserved. Now on to the next Titan. If it will release in a watercooled version by end of 2016, I can wait.
Reserved? How so? Ryan discussed the shit out of the 1080, and did almost every single benchmark known to man. Along with graphs, charts, comments and analysis. Then, Tom Petersen came on and talked about the 1080 and pascal, again.
And you know as well as I do that you won’t end up buying shit. You’re one of those internet people that just talks shit. Admit it, even the 960 was a stretch, and you’re probably going to starve for 3 weeks.
Can we remove this toxin from the comments?
I am in full agreement. But they probably won’t.
Yeah… many of us tend to lean on the side of free speech (and yes, we know we don't need to).
It's a tough balance, because one of the best ways to deal with trolls, in terms of de-escalating the situation, is often to engage civilly with them. A lot of the time, people are aggressive because they want to be heard. In those cases, it can be used to gauge how easily your audience perceives that they can contact you. By the same token, it obviously upsets other readers, so it shouldn't be accepted. Just deleting the comments and silencing them will often exacerbate the issue, though.
Again, it's tough. People are expending effort, however much or however little, to express something. They're delivering it in a package that smells like feces and makes me want to wear medical gloves when handling it, but they spent effort to express it. "Why?" can be a very useful question.
Here's a (horrifically sad) analogy. From what I hear, when missionaries go overseas to horrific warzones, etc., to care for orphans, one of the first things that gets them is the infants crying — they don't. They're completely silent. They know, even at their months-old age, that crying is just a waste of energy, and they don't even bother.
Again, infants. Can't even talk yet.
But, at the same time, it ruins the experience of our other readers, and sometimes even makes them uncomfortable to be here. That's also an issue. It's tough to make a genuinely good policy on it.
((Also, our comment software over-deletes. In cases where a reply is useful, we can't save it when we nuke the parent post. That doesn't matter when all replies are as bad, or just refer to how bad the to-be-deleted comment is, but it can be an issue. We started to edit out vulgarity in place, though.))
(Scott: Nuked for excessive cursing.)
I have expected separate architectures for a while. The 64-bit compute is a good way to segment the market and it just makes sense since those 64-bit resources take a lot of die area and probably consume a bit of extra power. That is a big waste of die area for a consumer GPU that doesn’t need the 64-bit performance.
Being a mid-level graphics card person, I eagerly await info on the GTX 1060, and AMD’s equivalent GPU’s.
Have been very happy with my GTX 960 4GB (Gigabyte 3 fan one). It handles everything I throw at it at 1080p.
Well up until I threw the kitchen sink at it.
Lol, video cards have a tendency to perform poorly under kitchen sink type loads, you need armoured FP64 for that
Bring out the real cards already! Who wants stuff that can only run in 1080p 🙂
Yeah bro, I was in Best Buy the other day looking for a new video card for my rig and all they had was shit for 1080 gaming. I just bought a 1440 monitor over Christmas! So I waddled back to my Honda Civic and farted out of the parking lot.
Nvidia’s decision to release GP102 vs GP100 will entirely depend upon Vega’s performance.
Good write-up, with one minor correction: GK210 was a tweak of GK110 for enterprise only. GP102 most likely means HBM2 won’t make it to the desktop market for Nvidia, in favor of the much more available and drop-in convenient GDDR5X, but we shall see!!!!
It actually makes sense.
GP102 is the successor of GM200, while GP100 is the successor of GK110.
They managed to survive Maxwell with DP compute; it shouldn't be a profit issue for them.