TU102 and TU104 Specifications, Overclocking, and Multi-GPU
In an unusual move, NVIDIA is launching the Turing architecture with two different GPUs simultaneously.
Representing the highest-end offerings we are likely to see from Turing, the TU102 and TU104 are also among the largest GPUs, in both silicon area and transistor count, that we've ever seen.
| | RTX 2080 Ti | Quadro RTX 6000 | GTX 1080 Ti | RTX 2080 | Quadro RTX 5000 | GTX 1080 | TITAN V | RX Vega 64 (Air) |
|---|---|---|---|---|---|---|---|---|
| GPU | TU102 | TU102 | GP102 | TU104 | TU104 | GP104 | GV100 | Vega 64 |
| GPU Cores | 4352 | 4608 | 3584 | 2944 | 3072 | 2560 | 5120 | 4096 |
| Base Clock | 1350 MHz | 1455 MHz | 1408 MHz | 1515 MHz | 1620 MHz | 1607 MHz | 1200 MHz | 1247 MHz |
| Boost Clock | 1545 MHz / 1635 MHz (FE) | 1770 MHz | 1582 MHz | 1710 MHz / 1800 MHz (FE) | 1820 MHz | 1733 MHz | 1455 MHz | 1546 MHz |
| Texture Units | 272 | 288 | 224 | 184 | 192 | 160 | 320 | 256 |
| ROP Units | 88 | 96 | 88 | 64 | 64 | 64 | 96 | 64 |
| Tensor Cores | 544 | 576 | — | 368 | 384 | — | 640 | — |
| Ray Tracing Speed | 10 GRays/s | 10 GRays/s | — | 8 GRays/s | 8 GRays/s | — | — | — |
| Memory | 11 GB | 24 GB | 11 GB | 8 GB | 16 GB | 8 GB | 12 GB | 8 GB |
| Memory Clock | 14000 MHz | 14000 MHz | 11000 MHz | 14000 MHz | 14000 MHz | 10000 MHz | 1700 MHz | 1890 MHz |
| Memory Interface | 352-bit G6 | 384-bit G6 | 352-bit G5X | 256-bit G6 | 256-bit G6 | 256-bit G5X | 3072-bit HBM2 | 2048-bit HBM2 |
| Memory Bandwidth | 616 GB/s | 672 GB/s | 484 GB/s | 448 GB/s | 448 GB/s | 320 GB/s | 653 GB/s | 484 GB/s |
| TDP | 250 W / 260 W (FE) | 260 W | 250 W | 215 W / 225 W (FE) | 230 W | 180 W | 250 W | 292 W |
| Peak Compute (FP32) | 13.4 TFLOPS / 14.2 TFLOPS (FE) | 16.3 TFLOPS | 10.6 TFLOPS | 10.0 TFLOPS / 10.6 TFLOPS (FE) | 11.2 TFLOPS | 8.2 TFLOPS | 14.9 TFLOPS | 13.7 TFLOPS |
| Transistor Count | 18.6 B | 18.6 B | 12.0 B | 13.6 B | 13.6 B | 7.2 B | 21.0 B | 12.5 B |
| Process Tech | 12 nm | 12 nm | 16 nm | 12 nm | 12 nm | 16 nm | 12 nm | 14 nm |
| MSRP (current) | $1,200 (FE) / $1,000 | $6,300 | $699 | $800 (FE) / $700 | $2,300 | $549 | $2,999 | $499 |
At 4352, the RTX 2080 Ti features just over 20% more CUDA cores than the previous-generation GTX 1080 Ti. Similarly, the RTX 2080 features 15% more CUDA cores than the GTX 1080.
Clock speeds, however, appear mostly comparable generation to generation, with the base clocks of the RTX cards actually coming in a bit lower than those of the Pascal-based GTX 10-series GPUs.
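For readers who want to sanity-check the table, the peak FP32 and memory bandwidth figures fall straight out of the core counts, clocks, and bus widths. A quick sketch of that arithmetic, using the RTX 2080 Ti's reference numbers (2 FLOPs per FMA is the standard convention behind these peak figures):

```cpp
#include <cstdio>

int main() {
    // RTX 2080 Ti reference figures from the table above.
    const double cuda_cores = 4352.0;
    const double boost_ghz  = 1.545;   // reference boost clock, GHz
    const double bus_bits   = 352.0;   // GDDR6 memory bus width
    const double gbps_pin   = 14.0;    // effective data rate per pin (14 Gbps)

    // Each CUDA core retires one FMA (2 FLOPs) per clock at peak.
    const double tflops = cuda_cores * 2.0 * boost_ghz / 1000.0;

    // Bandwidth = bus width in bytes x effective data rate.
    const double gb_per_s = (bus_bits / 8.0) * gbps_pin;

    printf("Peak FP32       : %.1f TFLOPS\n", tflops);   // ~13.4 TFLOPS
    printf("Memory bandwidth: %.0f GB/s\n", gb_per_s);   // 616 GB/s
    return 0;
}
```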
As usual, the RTX 2080 Ti does not represent the fully enabled TU102 die, leaving room for a potential TITAN RTX in the future.
One of the most bizarre differences in Turing specifications is the needed separation between "Founders Edition" and "Reference" specifications. Since NVIDIA is, for the first time, selling its Founders Edition graphics cards overclocked out of the box, it will be interesting to see which third-party designs, if any, run at the reference clock speeds.
NVIDIA Scanner
Speaking of overclocking, one of the Turing features I'm personally most excited about is NVIDIA Scanner.
Essentially, NVIDIA Scanner will be built into programs like EVGA Precision X and MSI Afterburner, allowing automated overclocking of RTX-based graphics cards.
With one click, users can start the scanner, which will then begin applying higher frequencies, testing stability, and adjusting voltages as necessary until it reaches what NVIDIA has determined is the highest stable frequency at the lowest possible voltage.
The test load NVIDIA runs is math-based rather than graphical. This NVIDIA-tuned workload is meant to ensure the highest level of stability across the broadest range of applications.
If you've ever had an overclock that you thought was stable only to crash on a new game title, you'll understand the problem that NVIDIA is trying to solve here.
Hardcore enthusiasts need not worry: the same level of manual overclocking support seen with Pascal will also be available on Turing, for users who aren't happy with the result from NVIDIA Scanner or who want to tune the card entirely themselves.
While NVIDIA is launching this feature with the RTX GPUs, they are expected to (eventually) bring NVIDIA Scanner support to older GPUs, such as the Pascal-based GTX 10-series.
NVLink
The NVLink interface now handles Multi-GPU (SLI) support on Turing. While NVIDIA has used NVLink in enterprise-level products, such as the Volta-based V100, for some time, Turing marks the first appearance of NVLink on a consumer graphics card.
NVLink is capable of a massive 50 GB/s of bandwidth with the single-link connection found on the RTX 2080, and 100 GB/s with the dual-link connection on the RTX 2080 Ti. This additional bandwidth won't change your current SLI experience at all, but it is necessary to achieve proper SLI performance at resolutions such as 8K.
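As a rough back-of-the-envelope illustration of why 8K stresses bridge bandwidth, assume a 7680×4320 framebuffer at 4 bytes per pixel (our assumption, not an NVIDIA figure): moving one finished frame between cards already consumes a measurable slice of a 60 Hz frame budget, even over NVLink.

```cpp
#include <cstdio>

int main() {
    // Assumption for illustration: an 8K frame is 7680 x 4320 pixels
    // at 4 bytes per pixel (no HDR, no auxiliary buffers).
    const double frame_gb = 7680.0 * 4320.0 * 4.0 / 1e9;   // ~0.133 GB

    const double nvlink_x1 = 50.0;    // RTX 2080, single link (GB/s)
    const double nvlink_x2 = 100.0;   // RTX 2080 Ti, dual link (GB/s)

    printf("8K frame size        : %.0f MB\n", frame_gb * 1000.0);
    printf("Copy time @ 50 GB/s  : %.2f ms\n", frame_gb / nvlink_x1 * 1e3);
    printf("Copy time @ 100 GB/s : %.2f ms\n", frame_gb / nvlink_x2 * 1e3);
    printf("Frame budget at 60 Hz: %.2f ms\n", 1000.0 / 60.0);
    return 0;
}
```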
With a new interface, of course, comes new bridges. NVLink bridges are set to be available alongside the retail launch of the RTX 2080 and 2080 Ti on September 20th from a wide array of partners and NVIDIA themselves.
Please focus your 2080 review on the comparison between it and the 1080 Ti, not the 1080, as the 2070 is more than likely the true replacement for the 1080. Additionally, please include 4K non-HDR results in your comparison. I have my 1080 Ti running at 2,139/12,000 under water and am very curious to see if the 2080 can match my card in conventional games at 4K non-HDR. Looking forward to your results.
There’s only one metric by which the 20-series should be compared to the 10-series: PRICE.
Also, this deep-learning AA stuff should be DISABLED for raw benchmarking. If NVIDIA wants to sell us on RTX, they can’t pretend that we only play 7 games they have released profiles for.
Cherry-picked BS won’t be tolerated. Don’t pull a Tom’s Hardware “just buy it” move.
Look, a dumb@$$ AMD fanboy trying to dictate the terms of a review.
If DLSS is AA and not upscaling, it should be turned on and compared to AMD running comparable-quality AA. Sorry if the numbers will look bad for Vega.
So, where exactly did the other guy mention or even imply AMD fanboyism?
Calling out a company for BS tactics is exactly just that: calling out a company for BS tactics.
Considering I only have Intel/NVIDIA setups in my house, I’m hardly an AMD fanboy. BS is BS, no matter who’s selling it. Raw performance for the dollar is the only real factor that matters. NVIDIA is pushing ray tracing (very little support) and upscaling tricks (also minimal support) in order to distract from the fact that they are offering – at best – 30% more performance for 80% more money. Like I said, BS.
Agree. The RTX 2080 is ~20% more expensive while providing only a ~5% performance bump compared to the GTX 1080 Ti.
I suspect nVidia’s goal is to make the GTX 1080 Ti look cheap when it’s not.
The World Health Organization is right: video games are an addiction, like tobacco, casinos, etc.
If only price should be compared, go get a free GPU from a trash recycler. Because it’s free, the price is zero, and the performance per dollar is, for all intents and purposes, infinite. Price, performance, and price/performance all matter.
Test all of the new stuff, but it’s probably not a good idea to buy based on future technologies. AMD users who counted on DX12 giving them a big jump later, à la FineWine, learned this the hard way, and now Nvidia users should learn from that and not make the same mistake.
Conversely, it’s idiotic to dismiss DLSS/ray tracing when the reviews aren’t even out yet.
I would hope that any competent reviewer would run benchmarks for both supported and unsupported games, and then show the benchmarks for supported games with it on and off.
Nice overview of the internals, but how about some actual simple benchmark comparisons with previous gen cards?
Don’t you think Ken would have included a performance evaluation if he could? They’re under embargo and were only allowed to talk about the architecture. If you check out the other major websites it’s the same situation everywhere.
What everyone is waiting for is to see where they end up, for example here:
https://www.videocardbenchmark.net/high_end_gpus.html
Are the AIB vendors going to lock down the Nvidia OC Scanner feature to only their own cards? At launch, Precision X’s scanner worked with my 1080 FE, only to have later versions say that my card was not supported as it was not “EVGA”.
Did the MSI tool have this restriction?
The previous Precision X scanner was a feature that EVGA implemented, and it only worked with EVGA cards, as you noticed.
NVIDIA Scanner will work with all GPUs, no matter the vendor, and can be implemented in any of the NVAPI applications like Afterburner, as well as software from ASUS, Gigabyte, etc.
“The NVLink interface now handles Multi-GPU (SLI)”
I would not call NVLink “SLI” any more than I would call AMD’s Infinity Fabric/xGMI “CF”, as there is more to NVLink than driver-managed SLI-style multi-GPU. NVLink has hardware-based cache-coherency protocol capabilities for Nvidia’s GPUs (and for POWER9-to-GPU links as well), and the same is true of AMD’s Infinity Fabric/xGMI interface, which is supported on both Zen and Vega. Both NVLink and Infinity Fabric offer more direct processor (CPU-to-GPU and GPU-to-GPU) cache-to-cache coherency signaling than any SLI/CF driver-managed multi-GPU could ever hope to achieve.
I think that both NVLink and Infinity Fabric will allow multiple physical GPUs to appear more like a single, larger logical GPU to drivers and software. This IP also has potential for the modular, multi-die GPU designs that both AMD and Nvidia are researching for future products.
Also, under the DX12/Vulkan driver model the GPU drivers are simplified and closer to the metal, with multi-GPU load balancing handed over to the game/engine developers; SLI/CF are deprecated and are not going to be used for DX12/Vulkan gaming. Both DX12 and Vulkan have explicit multi-adapter support in their respective APIs, managed by the graphics API and the game/engine software that uses it.
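To make the explicit multi-GPU point concrete: under DX12 a linked NVLink/SLI pair is exposed to the application as one logical device with multiple nodes, and it is the engine, not the driver, that decides which GPU does what. A minimal Windows-only sketch (error handling trimmed) that simply reports the node count:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
#include <cstdio>

#pragma comment(lib, "d3d12.lib")

using Microsoft::WRL::ComPtr;

int main() {
    // Create a D3D12 device on the default adapter.
    ComPtr<ID3D12Device> device;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device)))) {
        printf("No D3D12-capable adapter found.\n");
        return 1;
    }

    // With NVLink/SLI enabled, linked GPUs appear as multiple "nodes"
    // behind this one logical device; the application targets them
    // explicitly via node masks on queues, heaps, and command lists.
    printf("Physical GPU nodes behind this device: %u\n",
           device->GetNodeCount());
    return 0;
}
```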
You are also missing the integer (INT32) performance on that chart, where you only list FP32. It looks like Nvidia has begun to release more whitepapers, and there are probably some patent filings to go over to get an idea of just what Nvidia has implemented in its RT core hardware. The Tensor Core AI-based denoising needs a deep dive as well: I think Nvidia trains the denoising algorithm on its massive Volta clusters and then loads the trained network onto Turing’s tensor cores, so that process deserves a deep dive too.
I’d expect continuous refinement of the denoising AI over time, in addition to the DLSS AI-based anti-aliasing and other algorithm training. Tensor-core-based AI sound processing, and even compression and physics/other AIs, are also possible once there are hardware tensor cores to speed things up over software-only AI solutions.
Also for reference is Jeffrey A. Mahovsky’s thesis(1) on reduced-precision bounding volume hierarchies (BVH). He is often cited in many newer papers on the subject.
(1) Jeffrey A. Mahovsky, “Ray Tracing with Reduced-Precision Bounding Volume Hierarchies,” PhD thesis, The University of Calgary.
https://pages.cpsc.ucalgary.ca/~brosz/theses/PhD%20Thesis%20-%202005%20-%20Jeffrey%20Mahovsky%20-%20Ray%20Tracing%20with%20Reduced-Precision%20Bounding%20Volume%20Hierarchies.pdf
How about a direct link to the Nvidia whitepaper if possible, as there is a lot of material to be covered?
TechReport’s article explains BVH in an ELI5 manner in their writeup, and they do have a copy of the Nvidia whitepaper, so if Nvidia has published it on their website a link to it would be helpful.
and here it is(1):
(1) “NVIDIA Turing GPU Architecture: Graphics Reinvented” [that phrase is marketing’s dirty hands right there, but the whitepaper itself is very informative, as Nvidia’s whitepapers usually are]
https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf
Anyone have any idea (or educated guess) whether it will be possible to use ray tracing and DLSS at the same time? I mean, they both use tensor cores, so would they be competing for resources? Will there be enough tensor cores to do both?
Ray tracing uses the RT cores. It only uses the tensor cores for denoising, which happens at a different “stage”; Nvidia put out a chart showing when each core is active. But the answer is yes.
There is a link to the Nvidia Turing whitepaper right above; why not go read that first and then ask questions? That’s where all the online tech “journalists” got their material for their articles on Turing.
If you want better explanations, go over to TechPowerUp’s and TechReport’s articles on Turing, as they give a better ELI5 treatment. Don’t bother with Anandtech’s article, as you will spend more time swatting at the annoying auto-play ads than you will spend trying to read!
And teach your children to go over to the local college library and read the proper academic and professional trade journals that are usually paywalled online. Most colleges with computer science departments have subscriptions to the online academic and professional trade journals, which can be accessed via the college library’s website if you use the library’s available PCs/terminals or maybe even the Wi-Fi. LexisNexis is a million times better than Google.
P.S. Some college libraries are student-only, but if the library is an official government document depository (Federal Depository Library Program, FDLP) member, then it has to be open to the public by law. The local state university/junior college libraries are the most open, but in large metropolitan areas (the Northeast mostly) the homeless have ruined public access to many private universities’ libraries!
But ray tracing is done on Turing’s RT cores and the AI is done via the trained network running on the Turing tensor cores, which implies some possible concurrent ray tracing (on the RT cores) and denoising (on the tensor cores), as they are different sets of functional blocks on the GPU. The tensor cores can be used for all sorts of AI-based processing, including DLSS, and even audio processing can be done on them. Tensor cores are just hardware-based matrix math units anyway!
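To put that “hardware matrix math unit” line in concrete terms, a single Volta/Turing tensor core operation is essentially D = A×B + C on small (4×4) matrices, with FP16 inputs and FP32 accumulation. Here is a scalar C++ sketch of just that one operation, purely to show the math being accelerated; the hardware does a whole tile like this in one instruction.

```cpp
#include <cstdio>

// Scalar model of the tensor-core primitive D = A*B + C on 4x4 tiles.
// Hardware uses FP16 inputs with FP32 accumulation; everything here is
// plain float purely for illustration.
void mma_4x4(const float A[4][4], const float B[4][4],
             const float C[4][4], float D[4][4]) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            float acc = C[i][j];
            for (int k = 0; k < 4; ++k)
                acc += A[i][k] * B[k][j];
            D[i][j] = acc;
        }
}

int main() {
    float A[4][4], B[4][4], C[4][4], D[4][4];
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            A[i][j] = (i == j) ? 2.0f : 0.0f;   // A = 2 * identity
            B[i][j] = i * 4.0f + j;             // simple ramp
            C[i][j] = 1.0f;                     // bias
        }
    mma_4x4(A, B, C, D);
    printf("D[1][2] = %.1f\n", D[1][2]);        // 2*6 + 1 = 13.0
    return 0;
}
```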
One of the best college electives I ever took was a 1-credit-hour-per-semester library science class. That curriculum was offered over a few semesters, with different library science research and categorization methods taught in each 1-credit-hour LS101/LS102 course, and it made my research process so much more productive.
Right. No college for my kids; we don’t live in the US or any other country that has college, for that matter.
About ray tracing: I was thinking that since it uses tensor cores for AI acceleration, and DLSS uses them to decode and apply hints for upscaling content (it was stated that DLSS runs purely on the tensor cores and that the CUDA cores send the completed picture to the tensor cores to be “upgraded” and then back for further processing down the pipeline). We are all reading about how compute-demanding RT is, and tensor cores are used for (I think, correct me if I’m wrong here) ray collision detection (or is that done in the RT cores?) and again for AI-accelerated denoising, so I supposed the tensor cores would be very busy and might not like the contention for resources. Unless DLSS is actually very light on workload, or somehow they are working at different times? (I wonder how that would work with consecutive frames going through the pipeline.)
Several questions for your review:
RTX – is ray tracing performance independent of resolution? I can’t understand why the number of rays traced would change with resolution; it seems like it should be based on the number of lights and the number of objects.
Is RTX just on/off or will we have 3 or 4 major features you can toggle? Will there be low, medium, high settings?
DLSS – many people are claiming the speedup comes from rendering at a lower resolution and upscaling. If this is the case, PLEASE make sure to compare TRUE 1080p to TRUE 1080p and TRUE 4K to TRUE 4K. Don’t fall for marketing BS. HOWEVER, IF this really is just a superior form of AA with no overhead cost, please make sure to say that LOUDLY to shut up all the haters.
Ray tracing is done for every pixel on the screen, so it is very dependent on resolution. From every light source a ray is traced to every pixel, unless it is a directional light, in which case only the subset of pixels affected by that light source is traced.
Then there are all the reflections from the pixels that are lit by the light source, so the complexity rises dramatically.
RTX will have options for the quality of shadows, lighting, reflections, and so on, based on how many rays are traced and then extrapolated, but there is a lower limit on how many rays must actually be traced, because if not enough are traced, the AI that extrapolates them will fail.
DLSS – as far as Nvidia explained it, the frame is rendered at the target resolution and then AI is used to apply precalculated cues for how textures and objects should look in the “perfect world.” So 4K with TAA should be compared with 4K with DLSS, just as Nvidia is doing in their slides.
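To put rough numbers on that resolution dependence: primary rays scale with pixel count, so 4K casts four times as many as 1080p before any bounces are counted. A quick illustrative calculation, where the samples-per-pixel and rays-per-hit values are arbitrary assumptions:

```cpp
#include <cstdio>

// Rays per frame = width * height * samples per pixel * rays spawned
// per hit (shadow/reflection rays multiply the primary count).
static double rays_per_frame(int w, int h, int spp, int rays_per_hit) {
    return static_cast<double>(w) * h * spp * rays_per_hit;
}

int main() {
    const int spp = 1;            // assumed samples per pixel
    const int rays_per_hit = 3;   // assumed: primary + shadow + reflection

    printf("1080p: %.1f M rays/frame\n",
           rays_per_frame(1920, 1080, spp, rays_per_hit) / 1e6);
    printf("4K   : %.1f M rays/frame\n",
           rays_per_frame(3840, 2160, spp, rays_per_hit) / 1e6);
    return 0;
}
```

Even under these modest assumptions, 4K at 60 fps already works out to roughly 1.5 billion rays per second, which is why the GRays/s figures in the spec table matter.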
People clearly don’t understand that there is still an NDA in place, and all the questions you are asking cannot be answered without severe ramifications until the day everyone and their mother drops full benchmark articles/videos at the same time, like every other release. I’m sure a good portion of those are already made or written and are just waiting for whatever day the NDA lifts.
This is true, and some other websites have stated that fact. Really, the FTC should require that all NDA deadlines be published in advance by any device/processor/whatever makers, so consumers can know when the data will be available to make an educated purchasing decision!
That Tom’s Hardware USA “Just Buy It” nonsense should have earned Tom’s Hardware USA a big fat fine! But that is the nature of the unregulated online “press” that’s not held to the same standards as the print and over-the-airwaves TV industries. Online you can find the same hucksters, grifters, and snake oil salesmen that were properly regulated out of the print/TV media by the FCC/FTC decades ago.
Most of the “questions” I am seeing seem to be more about what people want to see in the review of the card, not demands for immediate answers now.