Today at the AMD Capsaicin & Cream event at GDC 2017, Raja Koduri, Senior VP of the Radeon Technologies Group, officially revealed the branding that AMD will use for its next-generation GPU products.
While we usually see final product branding deviate from architectural code names (e.g. Polaris becoming the Radeon RX 460, 470 and 480), this time AMD has decided to embrace the code name as the retail name for upcoming graphics cards featuring the new GPU: Radeon RX Vega.
However, we didn't just get a name for Vega-based GPUs. Raja also went into some further detail and showed some examples of technologies found in Vega.
First off is the High-Bandwidth Cache Controller found in Vega products. We covered this technology during our Vega architecture preview last month at CES, but today we finally saw a demo of this technology in action.
Essentially, the High-Bandwidth Cache Controller (HBCC) allows Vega GPUs to address all available memory in the system (including things like NVMe SSDs, system DRAM, and network storage). AMD claims that by using the already fast memory available in your PC to augment onboard GPU memory (such as HBM2), it will be able to offer less expensive graphics cards that nonetheless provide access to much more memory than current graphics cards.
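To get an intuition for why this can work, here is a toy model of the idea: fast on-board memory acting as a page cache over a much larger backing store. This is a hypothetical page-level LRU sketch; AMD has not disclosed the HBCC's actual replacement policy.

```python
from collections import OrderedDict

class HBCCModel:
    """Toy model: fast on-board memory acts as an LRU page cache
    over a much larger backing store (system DRAM / NVMe)."""
    def __init__(self, vram_pages):
        self.vram_pages = vram_pages      # capacity of fast memory, in pages
        self.resident = OrderedDict()     # pages currently in fast memory
        self.hits = self.misses = 0

    def access(self, page):
        if page in self.resident:
            self.resident.move_to_end(page)    # refresh recency
            self.hits += 1
            return "hit"                       # served from fast memory
        self.misses += 1                       # page-in from slow memory
        if len(self.resident) >= self.vram_pages:
            self.resident.popitem(last=False)  # evict least-recently-used page
        self.resident[page] = True
        return "miss"

# A working set that fits in fast memory hits every time after warm-up.
hbcc = HBCCModel(vram_pages=4)
for page in [0, 1, 2, 3, 0, 1, 2, 3]:
    hbcc.access(page)
print(hbcc.hits, hbcc.misses)   # -> 4 4
```

The interesting case in the Deus Ex demo is the opposite one: a working set larger than VRAM, where a good eviction policy keeps the hot pages resident so most accesses still hit the fast tier.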
The demo that they showed on stage featured Deus Ex: Mankind Divided running on a system with a Ryzen CPU and a Vega GPU limited to 2GB of VRAM. By turning HBCC on, they were able to show a 50% increase in average FPS and a 100% increase in minimum FPS.
While we probably won't actually see a Vega product with such a small VRAM implementation, it was impressive to see how HBCC was able to dramatically improve the playability of a 2GB GPU on a game that has no special optimizations to take advantage of the High-Bandwidth Cache.
The other impressive demo running on Vega at the Capsaicin & Cream event centered around what AMD is calling Rapid Packed Math.
Rapid Packed Math is an implementation of something we have been hearing and theorizing a lot about lately: the use of FP16 shaders for some graphics effects in games. By using half-precision FP16 shaders instead of the current standard FP32 shaders, developers are able to get more performance out of the same GPU cores. Specifically, Rapid Packed Math allows developers to run half-precision FP16 operations at up to 2X the rate of standard-precision FP32 operations.
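The "packed" part of the name refers to two FP16 values fitting in the space of one 32-bit register lane. A quick sketch of the packing itself, using Python's standard `struct` module (this illustrates the bit-level layout only, not GPU execution):

```python
import struct

# Two half-precision values packed into the space of one 32-bit word.
a, b = 1.5, -2.25
packed = struct.pack("<ee", a, b)    # "e" = IEEE 754 half-precision float

# The pair occupies exactly as many bytes as a single FP32 value.
assert len(packed) == struct.calcsize("<f") == 4

# Unpacking recovers both operands: one 32-bit lane carrying two FP16
# values is where the doubled FP16 rate comes from.
lo, hi = struct.unpack("<ee", packed)
print(lo, hi)   # -> 1.5 -2.25
```

The 2X figure is a peak rate: it requires the compiler or developer to find pairs of compatible FP16 operations to co-issue, which is why real-world gains depend on the workload.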
While the lower precision of FP16 shaders won't be appropriate for all GPU effects, AMD was showing a comparison of their TressFX hair rendering technology running on both standard- and half-precision shaders. As you might expect, AMD was able to render twice the number of hair strands per second, making for a much more fluid experience.
Just like we saw in the lead-up to the Polaris GPU launch, AMD seems to be releasing a steady stream of information on Vega. Now that we have the official branding for Vega, we eagerly await getting our hands on these new high-end GPUs from AMD.
nice
The piecemeal manner in which these “details” are being released is becoming absurd.
BS rumors/conjecture and demos in a controlled environment.
Time machine to May/June 2017 please….
Put your mind at ease, go buy Nvidia.
Until they release the specs and Vega is only 10% slower than the GTX 1080 but $200 cheaper, then you’d wonder why you didn’t wait the 2 months and could’ve saved 2 Benjamins.
Hoping that a similar situation to what we’re seeing now with Zen and Intel takes place with Vega and Nvidia.
The waiting game is growing bothersome however, and I was never one with an affinity for patience. aargh
Whoa, a post out of nowhere from the younger, healthier version of Ryan.
That live event was so lame. No real information, just hype for the stock brokers.
“Rapid Packed Math is an implementation of something we have been hearing and theorizing a lot about lately: the use of FP16 shaders for some graphics effects in games. By using half-precision FP16 shaders instead of the current standard FP32 shaders, developers are able to get more performance out of the same GPU cores. Specifically, Rapid Packed Math allows developers to run half-precision FP16 operations at up to 2X the rate of standard-precision FP32 operations.”
Not exactly. Rapid Packed Math packs 2 FP16 values into one 32-bit shader core, and that is how more half-precision FP16 operations are made available for effects that make use of FP16. As this Anandtech article states about Vega’s packed FP16 math:
“With their latest architecture, AMD is now able to handle a pair of FP16 operations inside a single FP32 ALU. This is similar to what NVIDIA has done with their high-end Pascal GP100 GPU (and Tegra X1 SoC), which allows for potentially massive improvements in FP16 throughput. If a pair of instructions are compatible – and by compatible, vendors usually mean instruction-type identical – then those instructions can be packed together on a single FP32 ALU, increasing the number of lower-precision operations that can be performed in a single clock cycle. This is an extension of AMD’s FP16 support in GCN 1.2 & GCN 4, where the company supported FP16 data types for the memory/register space savings, but FP16 operations themselves were processed no faster than FP32 operations.”(1)
(1) “The AMD Vega GPU Architecture Teaser: Higher IPC, Tiling, & More, Coming in H1’2017”
[see article page 2 under the sub heading: “Vega’s NCU: Packed Math, Higher IPC, & Higher Clocks”]
http://www.anandtech.com/show/11002/the-amd-vega-gpu-architecture-teaser
It’s actually 2 FP16 values done inside of one 32-bit shader without needing any extra dedicated FP16-only shaders. Under the pre-Vega GPU micro-arch, FP16 operations could be done in the lower half of a 32-bit shader core, but the upper half of the FP32 shader went unused for any FP16 workloads, resulting in less FP16 efficiency (half of the FP32 shader unused for any FP16 math).
“By turning HBCC on, they were able to show a 50% increase in average FPS, and a 100% increase in minimum FPS.”
Am I in heaven?
No, not in heaven. Those were devil’s tricks. AMD realized that selling us graphics cards with large amounts of VRAM makes them future-proof at their expense.
Remember, those are percentage increases, and those don’t always scale with absolute baseline values. Without the absolute values, it doesn’t tell us much about effective performance. For example: increasing from a minimum of 5 FPS to 10 FPS is a 100% increase, but that remains pretty awful, while a more realistic increase from 55 FPS to 60 FPS is not nearly as dramatic.
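The commenter's caveat is easy to check with a couple of hypothetical FPS figures (AMD did not publish the absolute numbers from the demo):

```python
def pct_increase(before, after):
    """Percentage increase from one FPS value to another."""
    return (after - before) / before * 100

# Both of these are "100% better minimums", but only one is playable:
print(pct_increase(5, 10))    # -> 100.0 (10 FPS is still a slideshow)
print(pct_increase(30, 60))   # -> 100.0 (60 FPS is genuinely smooth)

# A realistic bump from 55 to 60 FPS is a far smaller percentage:
print(pct_increase(55, 60))   # -> ~9.1
```

Without the baseline, a "100% increase in minimum FPS" could describe either of the first two scenarios.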
Do you have the habit of running GPUs with parts or most of their on-board memory removed? If so, this might be for you!
Others may just have experienced violent flashbacks to i740 graphics and accompanying symptoms of PTSD.
It will be nice to have for mobile GPUs, which have less video memory available. And all AMD is doing is using a faster memory (HBM/other memory) as a last level of cache above a lower level of DRAM or system DRAM.
So just like any processor’s L1, L2, and L3 (if used at all) cache levels make use of some faster cache memory above a regular level of slower/larger system memory, this HBC allows a smaller level of faster cache memory to hide the latencies of the larger, slower level of DRAM/system memory from the processor (CPU, GPU, or other processor).
If the caching algorithms used by Vega’s HBCC can let the GPU feed mostly from HBM2 and not from slower system DRAM, then there will be performance improvements to be had with any size of video memory! As far as swapping textures from slower DRAM/system memory into the faster cache memory in the background goes, caching algorithms and cache memory improve performance with diminishing returns as the cache size grows beyond a certain point. It’s just that the larger the cache, the better the chance that the needed data will reside in the cache instead of in the slower DRAM/system memory.
The HBCC, or any cache controller, can, if the caching algorithms allow for a high cache hit rate (say 99%), definitely hide the majority of any latency/bandwidth issues of a much larger pool of slower DRAM/system memory.
Any GPU/other processor can have plenty of work staged up in the processor’s L2 cache and L1 caches above to keep the GPU’s shaders working while the processor’s cache/memory controller works outstanding memory requests in the background to get any needed data out of the slower levels of storage or system DRAM up to the faster cache memory levels that the processor works from the majority of the time! And this can be done without the processor ever having to work directly from any slower system storage or system DRAM. This is classic cache level latency hiding and it’s why cache memory is used in the first place on processors of any type to improve performance and processor execution resources utilization rates.
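The latency-hiding argument in the comment above is the standard average-memory-access-time calculation. A quick sketch with made-up latencies (the nanosecond figures are illustrative, not measured Vega numbers):

```python
def amat(hit_rate, hit_ns, miss_penalty_ns):
    """Average memory access time: hits are served fast,
    misses pay the full penalty of the slower tier."""
    return hit_rate * hit_ns + (1 - hit_rate) * (hit_ns + miss_penalty_ns)

# Hypothetical latencies: ~100 ns to the fast tier (HBM2),
# ~10,000 ns to page data in from system DRAM or NVMe.
print(amat(0.99, 100, 10_000))   # ~200 ns: a 99% hit rate nearly hides the slow tier
print(amat(0.50, 100, 10_000))   # ~5,100 ns: a poor hit rate fully exposes it
```

This is why the hit rate, not the raw speed of the backing store, dominates: at 99% hits the average access is only about twice the fast-tier latency even though the slow tier is 100x slower.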
Isn’t the mobile market the exact opposite with gems like the 4GB GTX 950m?
Frankly, this looks like something Fiji may have benefited from because of the use of HBM and its size limitations, but unless you’re actively trying to screw over customers (and increase your margins) by selling GPUs with marginal on-board memory, I find it hard to zero in on the target market in the age of GDDR5(X) and HBM2 in the consumer space.
Terrific post. You verbalised a difficult matter very well.
I have struggled with putting that essence into words: that you can’t just look at the speed and lag of individual cache resources in the cache pool. HBCC done right ensures the processor is almost always working from high-level cache.
Ken writing for PCPer o/
Provided they grant you the ability to limit which high-speed storage devices it uses for writes, this isn’t a bad thing.
I would *not* want it writing to an NVMe SSD, if I played a lot of games, and wanted to see the SSD last 10 years.
Now you will want 32+GB of RAM; glad I am already there.
It’s clear that AMD will not release a high-end card to compete with Pascal; when Vega is released it will be a year late, with Volta waiting in the wings. Vega will need to be very, very fast or very cheap, or AMD will fall comically far behind Nvidia in market share.
If you think prices are high now, how bad will they be if Vega is relegated to the mid-to-low range segment when Volta launches?
You know, I like their naming it Vega.
So often, manufacturers have fantastic code names for particular products (not just talking about the tech industry here). And then they come out with the product announcement proper and they’ve called it some boring old crap name, like XYZ123.
It’s refreshing for someone to keep the code name and use it for their product.
Looks good, but too late.
Not sure about this HBCC thing though. Seems like an excuse to put less memory on the GPU card and raid the rest of my system for the rest.
They can get knotted, my RAM is MY RAM and I’M using that. AMD can go put a proper amount of memory on their GPUs and use it for graphics like they’re supposed to.