High Bandwidth Cache
AMD has released a small amount of information about its Vega GPU, in particular about the memory system, primitive shaders and the tile-based renderer.
Apart from AMD’s other new architecture due out in 2017, its Zen CPU design, no other product has had as much build-up and excitement surrounding it as its Vega GPU architecture. After the world learned that Polaris would be a mainstream-only design, released as the Radeon RX 480, the focus for enthusiasts shifted straight to Vega. It has been on the public-facing roadmaps for years and signifies the company’s return to the world of high-end GPUs, something it has been missing since the release of the Fury X in mid-2015.
Let’s be clear: today does not mark the release of the Vega GPU or products based on Vega. In reality, we don’t even know enough to make highly educated guesses about performance without more details on the specific implementations. That being said, the information released by AMD today is interesting and shows that Vega will be much more than simply an increase in shader count over Polaris. It reminds me a lot of the build-up to the Fiji GPU release, when information and speculation flourished about how HBM would affect power consumption, form factor and performance. What we can hope for, and what AMD’s goal needs to be, is a cleaner and more consistent product release than the Fury X launch turned out to be.
The Design Goals
AMD began its discussion about Vega last month by talking about the changes in the world of GPUs and how data sets and workloads have evolved over the last decade. No longer are GPUs only concerned with games; they must also address professional, enterprise and scientific workloads. Even more interestingly, just as we have discussed the growing gap between CPU performance and CPU memory bandwidth, AMD posits that the gap between GPU performance and memory capacity is a significant hurdle limiting performance and expansion. Game installs, professional graphics sets, and compute data sets continue to skyrocket. Game installs are now regularly over 50GB, and compute workloads can exceed petabytes. Even as we saw GPU memory capacities increase from megabytes to gigabytes, reaching as high as 12GB on high-end consumer products, AMD thinks there should be more.
Coming from a company that chose to release a high-end product limited to 4GB of memory in 2015, it’s a noteworthy statement.
The High Bandwidth Cache
Bold enough to claim a direct nomenclature change, Vega 10 will feature an HBM2-based high bandwidth cache (HBC) along with a new memory hierarchy to put it to use. This HBC will be a collection of memory on the GPU package, just like we saw on Fiji with the first HBM implementation, and will be measured in gigabytes. Why AMD is moving to calling it a cache will be covered below. (But can’t we all get behind the removal of the term “frame buffer”?) Interestingly, this HBC doesn’t have to be HBM2; in fact, I was told that you can expect to see other memory systems on lower cost products going forward, so cards that integrate this new memory topology with GDDR5X or some equivalent seem assured.
Think of the HBM2, and the cache itself, as the current active working set for a much larger addressable space of memory.
HBM2 (high bandwidth memory 2) offers more than twice the bandwidth per pin of the first generation along with 8x the capacity per stack. This means that even as the HBM2 implementation continues to offer significant footprint advantages over GDDR memory, AMD will no longer be forced to sacrifice capacity compared to other memory technologies. Cards with 16GB of HBM2 acting as the HBC (again, high bandwidth cache) seem to be the most likely day-one configuration.
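For a rough sense of what those multipliers mean, here is a back-of-envelope comparison using the per-stack figures from the JEDEC HBM specifications (1 Gbps per pin and 1GB per stack for HBM1, 2 Gbps per pin and 8GB per 8-Hi stack for HBM2). These are my assumed reference points, not numbers AMD has confirmed for Vega.

    # Back-of-envelope HBM1 vs. HBM2 comparison; per-stack figures assumed
    # from the JEDEC specs, not taken from AMD's Vega announcement.
    def stack_bandwidth(pin_rate_gbps, bus_width_bits=1024):
        """Peak bandwidth of one HBM stack in GB/s."""
        return pin_rate_gbps * bus_width_bits / 8

    stacks = {"HBM1": (1.0, 1), "HBM2": (2.0, 8)}  # (Gbps per pin, GB per stack)

    for name, (pin_rate, capacity) in stacks.items():
        print(f"{name}: {stack_bandwidth(pin_rate):.0f} GB/s and {capacity} GB per stack")

    # Two HBM2 stacks would give ~512 GB/s and 16 GB, matching Fiji's total
    # bandwidth (four HBM1 stacks) while quadrupling its 4 GB capacity.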
To manage the new memory hierarchy, AMD has built a high bandwidth cache controller (HBCC). Fundamental changes had to be made to how the GPU handles data flow and scheduling, which directly impacted the architecture of the chip and the data paths through it. The controller, likely a part of the silicon itself, is responsible for talking to the cache and any other memory systems available to it. AMD is still being very vague about what these other options will be, and different cards built for different markets will likely have different configurations. In its diagram examples AMD lists NVRAM (essentially flash), network storage and primary system memory. All three have very different latency, capacity and bandwidth characteristics that could be balanced to provide the best possible experience for a particular workload.
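To make that balancing act concrete, here is a rough sketch of the tiers such a controller might juggle. The capacity, bandwidth and latency figures are illustrative ballpark values I have chosen, not anything AMD has published.

    # Illustrative (not AMD-published) characteristics of the backing stores
    # an HBC controller might manage; only the rough orders of magnitude matter.
    tiers = [
        # name               capacity          bandwidth     access latency
        ("HBM2 cache (HBC)", "8-16 GB",        "~500 GB/s",  "hundreds of ns"),
        ("System DRAM",      "tens of GB",     "~16 GB/s",   "~1-2 us over PCIe"),
        ("On-board NVRAM",   "hundreds of GB", "~3-16 GB/s", "~10-100 us"),
        ("Network storage",  "PB scale",       "~1-10 GB/s", "milliseconds"),
    ]

    for name, capacity, bandwidth, latency in tiers:
        print(f"{name:18} {capacity:17} {bandwidth:12} {latency}")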
The new system has a total addressable memory space of 512TB, a 49-bit address space, similar to the 48-bit (256TB) virtual address space of x86-64. That leaves a lot of room for growth in GPU memory, even when you start to get into massive network storage configurations.
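The arithmetic behind those figures is simple to verify:

    # Sanity check on the address-space figures quoted above.
    TB = 2**40  # binary terabytes
    print(2**49 // TB)  # 512 -> a 49-bit address space covers 512 TB
    print(2**48 // TB)  # 256 -> the 48-bit x86-64 virtual space covers 256 TB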
If you think back to what AMD announced in the middle of last year with the SSG product line, it was a graphics card with an SSD on board, increasing the memory capacity of the platform. At the time the implementation was crude, requiring applications to access the SSD storage through the Windows storage layers despite it being on board with the GPU itself. With Vega 10 this won’t be necessary: according to one conversation I’ve had, the HBC controller itself can communicate with flash memory through NVMe over a full x16 lanes of PCI Express 3.0. (There is a lot to debate here: will AMD have its own SSD controller as part of the HBCC, or will it use a third party’s? If it goes its own route, what expertise does it have to outperform the current NVMe options on the market, which are limited to PCIe 3.0 x4? What if AMD utilized something like Intel’s 3D XPoint / Optane?) A high-end consumer Vega 10 GPU with 8-16GB of HBM2 and a 128GB SSD on board opens a totally new set of doors for how games and applications can be written and optimized.
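On the x4-versus-x16 question, the raw link-rate math shows why a wider interface is interesting. These are theoretical peaks before protocol overhead, not measured figures for any Vega product.

    # Peak PCIe 3.0 throughput: 8 GT/s per lane with 128b/130b encoding.
    def pcie3_peak_gb_s(lanes):
        return 8.0 * (128 / 130) * lanes / 8  # GB/s before protocol overhead

    print(f"PCIe 3.0 x4 : {pcie3_peak_gb_s(4):.1f} GB/s")   # ~3.9 GB/s, a typical NVMe SSD link
    print(f"PCIe 3.0 x16: {pcie3_peak_gb_s(16):.1f} GB/s")  # ~15.8 GB/s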
The controller is built to have very fine-grained data movement capabilities, allowing it to move memory between the different layers (cache and other storage options) in small chunks, improving the efficiency of the movement. Though AMD isn’t talking about performance advantages yet, it did show a couple of examples of current-generation games that allocate a lot of memory on high-end graphics cards (8GB+) but rarely touch more than half of it over any reasonable window of time. The implication is that with an HBC and a controller managing this memory dynamically in hardware, the GPU could effectively offer a perceived near-infinite memory allocation area and handle movement between the SSD and the HBC behind the scenes.
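Conceptually this is just demand paging applied to video memory. Here is a minimal sketch of the idea, assuming page-granularity residency tracking with LRU eviction; AMD has not disclosed the actual page size or replacement policy, so every number and name here is hypothetical.

    from collections import OrderedDict

    PAGE = 64 * 1024  # assumed 64KB movement granularity; AMD hasn't said

    class HBCSim:
        """Toy model of a cache-like HBC in front of a larger backing store."""
        def __init__(self, capacity_bytes):
            self.capacity_pages = capacity_bytes // PAGE
            self.resident = OrderedDict()  # page number -> True, ordered by recency

        def access(self, page):
            if page in self.resident:              # hit: data already in HBM2
                self.resident.move_to_end(page)
                return "hit"
            if len(self.resident) >= self.capacity_pages:
                self.resident.popitem(last=False)  # evict LRU page back to SSD/DRAM
            self.resident[page] = True             # pull the page into the HBC
            return "miss"

    # An 8GB allocation in front of a 4GB HBC: if the game only ever touches
    # half of what it allocated, everything it uses fits after the cold misses.
    hbc = HBCSim(capacity_bytes=4 * 2**30)
    touched = range(0, 4 * 2**30 // PAGE)          # the "hot" half of the allocation
    misses = sum(hbc.access(p) == "miss" for p in touched)
    print(misses, "cold misses; subsequent accesses to the hot set all hit")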
This might sound familiar to people following the HSA (Heterogeneous System Architecture) goal of creating a truly unified address space across all memories and all processors in a computer. AMD said, though, that we still can’t get there, as legacy software simply isn’t built with that in mind.
First. It’s amazing how much more efficient they made this.
Waiting for more info thanks pcper for the update!
So all we get is buzzwords and no actual details on any products? So for all we know, Vega could be several months away? And we still have no real clue where the performance is going to be. Will we see something that could push NVIDIA and the Titan XP, or will it simply be a competitor to the GTX 1080, hopefully with lower prices? Who knows??
CES sure has been a huge letdown when it comes to juicy hardware…. And nothing on Zen/Ryzen?
>I’ve waited for almost three years..for THIS…
FFFFFFFFFFFFFFFFFFFFFUUUUUUUUUUUUUUUUUUUUUUUUUUUUU~
They launched Polaris at CES, and gamers didn’t get cards for sale until JUNE.
Waiting for AMD fanboys to be amazed when Vega doesn’t ship each month until then.
“The potential for this kind of memory system is substantial though I would wager the impact on enthusiast gaming will be minimal out of the gate and for a couple of generations”
Yes. All the jigglebits in the verse cannot save you from the nearly non-existent memory utilization in games.
So AMD, you had the chance, the opportunity to make people believe in you again (especially after the New Horizon event with the Ryzen CPU), and you went and fucked it up. What was the point of having a countdown, getting people hyped up for a GPU, then nothing, nada? 6 short videos about the products but no actual product.
Another reason for Nvidia to hike their prices up again. Thanks AMD.
you don’t have to buy a nvidia
All the hype and this?
Smells like a last-minute decision to respin the silicon.
AMD should focus on launching products rather than wasting time talking about them. Zen is overdue and Vega won’t matter until there is a Zen platform to run it on. On the upside, Intel flopped once again with another lame product release.
Well this is a letdown; I was sort of expecting a bit more from AMD to be honest. All we get are some videos that show a whole lot of nothing. Will Nvidia have Volta out before this, or will the 1080 Ti (if it exists) be enough?
Nvidia refused to sponsor this piece ?
Or it's not part of our CES coverage perhaps?
AMD always delivering on the HYPE but not much else!
Is that why even now my Fury X outperforms a GTX 1070 in Battlefield 1? Because Nvidia can’t make a proper DirectX 12 card that doesn’t take a performance dump when DX12 is used. Hype delivered IMO.
It’s a shame they cost so much. I know they cost a ton to make, and that’s the only place the Fury X really lost out. If we could just click our fingers and turn all games from DX11 to Vulkan or DX12, #amd would be seriously clouting nvidiot
I wonder if any of these memory advancements will result in better performance; Fiji’s HBM1 did nothing to improve performance. Volta will be waiting in the wings for this card. I hope AMD’s big bet pays off.
Techpowerup appears to be saying that “High Bandwidth Memory Cache isn’t the same as the HBM2 memory stacks.” (1)
But looking at the Vega die shots, I only see the Vega die and 2 HBM2 die stacks. Could this High Bandwidth Memory Cache actually be some eDRAM/other memory on the GPU’s die that is managed by the High Bandwidth Cache Controller (HBCC)? Could the cache memory actually be etched into the interposer’s silicon itself, with the interposer being an active design (with cache memory etched into it) instead of just a passive design with only traces etched into it?
Techpowerup is stating:
“It begins with a fast cache memory that sits at a level above the traditional L2 cache, one that is sufficiently large and has extremely low latency. This cache is a separate silicon die that sits on the interposer, the silicon substrate that connects the GPU die to the memory stacks. AMD is calling this the High Bandwidth Memory Cache (HBMC). The GPU’s conventional memory controllers won’t interface with this cache since a dedicated High Bandwidth Cache Controller (HBCC) on the main GPU die handles it. High Bandwidth Memory Cache isn’t the same as the HBM2 memory stacks. ” (1)
(1) “AMD Radeon Vega GPU Architecture”, Techpowerup [see page 2 of the article]: https://www.techpowerup.com/reviews/AMD/Radeon_Vega_GPU_Architecture/
I had expected Vega to essentially be a larger design, but otherwise very similar to Polaris. I guess it is going to be a much more massive redesign. Not surprising that it isn’t available yet. It is unclear what off-package links are going to be available. It will be interesting to have XPoint connected to such a device. The low latency and byte addressability could make it look like you have huge amounts of memory directly attached to the GPU for HPC. I don’t really know what the current state of these systems is. I know they were adding virtual-memory-type systems to GPUs quite a while ago to swap out to system memory, but I don’t know how much that is being utilized.
Intel’s brand of XPoint (Optane) is nowhere near as fast as Intel’s marketing claimed. So not much more performance can be had currently relative to SLC NAND and proper latency hiding by a CPU/GPU processor’s cache/memory subsystems. I do see the need for maybe a GPU having some NVM made up of at least an SSD with 32GB of XPoint and the rest SLC NAND for that on-GPU/PCIe card direct SSD drive Radeon SKU that AMD is making. Micron will have their own QuantX brand of XPoint, so at least there will be competition to provide for AMD’s XPoint needs.
Really I’d like to see JEDEC and AMD/Nvidia and their associated HBM2 memory partners trying to get an NVM/XPoint addition to the JEDEC HBM/HBM2 standard, with an XPoint NVM die added to the HBM/HBM2 die stack so there is some NVM/XPoint memory right there on the HBM2/newer HBM# stacks. That would be great for graphics and large textures stored in the on-stack NVM for gaming and other graphics workloads, and even compute workloads. XPoint durability is going to have to be very high for it to be used on the HBM stacks and provide service for the life of the device, so XPoint will have to be in use for a while before that question can be answered!
hi, will it have full bandwidth accessibility over Thunderbolt 3 using x8 PCIe 3?
vega would easily handle 4k open world gaming in monitor or VR with fast loading times
I had really fast loading time on ES: Skyrim with 2 rx 390. thanks.
AMD is getting attacked by Intel and Nvidia, but they have a generalist’s hand because they make GPUs, APUs and CPUs, so they can innovate again like they did with AMD64, the next-gen consoles and Solid State Computing, that is, direct GPU-CPU communication. Of course they have to talk to motherboard manufacturers to make AM4 like that.
If, and it’s a big if, the rumor of Intel licensing AMD tech for their iGPU comes true, then HSA will get a huge push.
That will never happen, Intel will not be getting any bleeding edge GPU IP from AMD! It will more than likely be Intel licensing the same OLD Very Basic GPU IP from AMD that Intel used to get from Nvidia, as AMD and Nvidia control options for some of the same types of basic GPU IP that Intel needs to keep licensing from either Nvidia or AMD to keep from getting sued!
There is a large pool of FRAND types of IP that both Nvidia and AMD both have the rights to that Intel needs to license in order to stay legal with Intel’s GPU designs! So Intel can get that from either Nvidia or AMD, But that DOES NOT include any of Nvidia’s or AMD’s bleeding edge IP of the last 5 or 10 years!
Ryan are you actually serious with this? This article is full to the brim with horrendous grammatical errors, like this one:
“Why the move to calling it a cache will be covered below”
Or this word salad:
“Fundamental changes had to be made to how the GPU handles data flow, scheduling and directly impacted the architecture of the chip and data paths through it”
This is JUST from the first page. The second page gets worse. I actually found this so cringe-worthy that I could not continue reading it. Where is your editor?? This article needs some extensive corrections. You also have the same issue with your QC SD 835 article. I know that there is a big drive in the media to publish first, but an article in this state should never have been published.