High Bandwidth Cache
AMD has released a small bit of information about its Vega GPU, in particular about the memory system, primitive shaders and tile renderer.
Apart from AMD’s other new architecture due out in 2017, its Zen CPU design, no other product has had as much build-up and excitement surrounding it as its Vega GPU architecture. After the world learned that Polaris would be a mainstream-only design, released as the Radeon RX 480, enthusiast attention shifted straight to Vega. It has been on public-facing roadmaps for years and signifies the company’s return to the world of high-end GPUs, something it has been missing since the release of the Fury X in mid-2015.
Let’s be clear: today does not mark the release of the Vega GPU or products based on Vega. In reality, we don’t even know enough to make highly educated guesses about the performance without more details on the specific implementations. That being said, the information released by AMD today is interesting and shows that Vega will be much more than simply an increase in shader count over Polaris. It reminds me a lot of the build to the Fiji GPU release, when the information and speculation about how HBM would affect power consumption, form factor and performance flourished. What we can hope for, and what AMD’s goal needs to be, is a cleaner and more consistent product release than how the Fury X turned out.
The Design Goals
AMD began its discussion about Vega last month by talking about the changes in the world of GPUs and how data sets and workloads have evolved over the last decade. No longer are GPUs only worried about games; instead they must address professional, enterprise and scientific workloads. Even more interestingly, just as we have discussed the growing gap between CPU performance and CPU memory bandwidth, AMD posits that the gap between memory capacity and GPU performance is a significant hurdle and limiter to performance and expansion. Game installs, professional graphics sets, and compute data sets continue to skyrocket. Game installs are now regularly over 50GB, and compute workloads can exceed petabytes. Even as we saw GPU memory capacities increase from megabytes to gigabytes, reaching as high as 12GB in high-end consumer products, AMD thinks there should be more.
Coming from a company that chose to release a high-end product limited to 4GB of memory in 2015, it’s a noteworthy statement.
The High Bandwidth Cache
Bold enough to claim a direct nomenclature change, Vega 10 will feature an HBM2-based high bandwidth cache (HBC) along with a new memory hierarchy to bring it into play. This HBC will be a collection of memory on the GPU package, just as we saw on Fiji with the first HBM implementation, and will be measured in gigabytes. Why the move to calling it a cache will be covered below. (But can’t we all get behind the removal of the term “frame buffer”?) Interestingly, this HBC doesn’t have to be HBM2; in fact, I was told that you can expect to see other memory systems on lower-cost products going forward, and cards that integrate this new memory topology with GDDR5X or some equivalent seem assured.
Think of the HBM2, the cache itself, as the current active working set for a much larger addressable space of memory.
HBM2 (high bandwidth memory 2) offers more than twice the bandwidth per pin of the first generation along with 8x the capacity per stack. This means that even as HBM2 continues to offer a significant footprint advantage over GDDR memory, AMD will no longer be forced to sacrifice capacity compared to other memory technologies. Cards with 16GB of HBM2 HBC (again, high bandwidth cache) seem the most likely day-one configuration.
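The per-stack arithmetic behind those claims can be sketched quickly. The figures below are published JEDEC per-stack numbers for each generation, not anything AMD disclosed about Vega specifically:

```python
# Rough per-stack comparison of HBM generations (JEDEC figures; AMD has not
# confirmed which stack configuration Vega 10 will actually use).
hbm1 = {"capacity_GB": 1, "bandwidth_GBps": 128}  # 4-Hi stack, 1 Gbps/pin
hbm2 = {"capacity_GB": 8, "bandwidth_GBps": 256}  # 8-Hi stack, 2 Gbps/pin

print(hbm2["capacity_GB"] / hbm1["capacity_GB"])       # 8x capacity per stack
print(hbm2["bandwidth_GBps"] / hbm1["bandwidth_GBps"]) # 2x bandwidth per pin

# Two HBM2 stacks would be enough for a 16GB card with ~512 GB/s of peak bandwidth.
print(2 * hbm2["capacity_GB"], 2 * hbm2["bandwidth_GBps"])
```

That last line shows why a 16GB configuration is plausible with only two stacks, where Fiji needed four HBM1 stacks just to reach 4GB.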
To manage the new memory hierarchy, AMD has built an HBC controller. Fundamental changes had to be made to how the GPU handles data flow and scheduling, which directly impacted the architecture of the chip and the data paths through it. The controller, likely part of the silicon itself, is responsible for talking to the cache and any other memory systems available to it. AMD is still being very vague about what those other options will be, and different cards built for different markets will likely have different configurations. In its diagram examples AMD lists NVRAM (essentially flash), network storage and primary system memory. All three have very different latency, capacity and bandwidth characteristics that could be balanced to provide the best possible experience for a particular workload.
The new system has a total addressable memory space of 512TB, a 49-bit address space, comparable to the 48-bit (256TB) address space of x86-64. That leaves a lot of room for growth in GPU memory, even when you start to get into massive network storage configurations.
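The address-space figures are worth sanity-checking, since the capacities scale as powers of two:

```python
# One extra address bit doubles the addressable space.
TB = 2**40  # bytes in a terabyte (binary)

vega_hbcc_bits = 49  # Vega's stated HBCC address width
x86_64_bits = 48     # canonical x86-64 virtual address width

print(2**vega_hbcc_bits // TB)  # 512 TB addressable by the HBC controller
print(2**x86_64_bits // TB)     # 256 TB in the x86-64 virtual address space
```

So the GPU's addressable space is exactly double what a current x86-64 process can address virtually.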
If you think back to what AMD announced in the middle of last year with the SSG product line, it was a graphics card with an SSD on board, increasing the memory capacity of the platform. At the time the implementation was crude, requiring applications to access the SSD storage through the Windows storage layers despite it being on board with the GPU itself. With Vega 10 this won’t be necessary: according to one conversation I’ve had, the HBC controller itself can communicate with flash memory over NVMe through a full x16 lanes of PCI Express 3.0. (There is a lot to debate here: will AMD have its own SSD controller as part of the HBC, or will it use a third party? If it goes its own route, what expertise does it have to outperform the current NVMe options on the market, which are limited to PCIe 3.0 x4? What if AMD utilized something like Intel 3D XPoint / Optane?) A high-end consumer Vega 10 GPU with 8-16GB of HBM2 and a 128GB SSD opens a totally new set of doors for how games and applications can be written and optimized.
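For context on those lane counts, a quick calculation from the PCI Express 3.0 spec (8 GT/s per lane with 128b/130b encoding) shows how much headroom an x16 link to on-card flash would have over today's x4 NVMe drives:

```python
# Peak PCIe 3.0 throughput per lane: 8 GT/s with 128b/130b encoding overhead.
per_lane_GBps = 8e9 * (128 / 130) / 8 / 1e9  # ~0.985 GB/s per lane, per direction

print(round(4 * per_lane_GBps, 2))   # x4 link: what current NVMe SSDs are limited to
print(round(16 * per_lane_GBps, 2))  # x16 link: the width reportedly available to the HBCC
```

Roughly 3.94 GB/s versus 15.75 GB/s of raw link bandwidth, so the x16 path could feed flash at four times the rate any current consumer NVMe drive can deliver.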
The controller is built to have very fine-grained data movement capabilities, allowing it to move memory between the different layers (cache and other storage options) in small chunks, improving the efficiency of the movement. Though AMD isn’t talking about performance advantages, it did show a couple of examples of current-generation games that allocate a lot of memory on high-end graphics cards (8GB+) but rarely access more than half of it over any reasonable window of time. The implication is that with an HBC and a controller managing this memory dynamically in hardware, the GPU could effectively offer a perceived infinite memory allocation area, handling movement between the SSD and the HBC behind the scenes.
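The behavior being described is essentially that of a hardware-managed page cache. As a purely illustrative sketch, and emphatically not AMD's actual design (the class, page granularity, and eviction policy here are all assumptions), a tiny LRU cache captures the idea of the HBC holding the hot working set while colder pages live in larger, slower storage:

```python
from collections import OrderedDict

# Illustrative sketch only: an LRU page cache standing in for the idea of the
# HBC keeping recently-used pages resident while the controller silently
# fetches misses from (and evicts cold pages to) a larger backing store.
class PageCache:
    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()  # page id -> data, ordered by recency

    def access(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # hot page stays resident
            return "hit"
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)   # evict the coldest page
        self.pages[page_id] = object()       # "fetch" from backing store
        return "miss"

cache = PageCache(capacity_pages=2)
print([cache.access(p) for p in [0, 1, 0, 2, 1]])
# ['miss', 'miss', 'hit', 'miss', 'miss']
```

The application in this model simply "allocates" as many pages as it likes; only the actively-touched subset occupies the fast tier, which is exactly the game-allocation behavior AMD's examples pointed at.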
This might sound familiar to anyone following the HSA (Heterogeneous System Architecture) goal of creating a truly unified address space between all memories and all processors in a computer. AMD said, though, that we still can’t get there, as legacy software simply isn’t built with that in mind.
“With Vega GPU architecture AMD is aiming to reinvent and the geometry pipeline.”
I hate to be the grammar police but you added an and.
All things eventually have to come to an end.
Where is the Nvidia banner? They should be happy to pay all your expenses for you to make an AMD news… When you sold your soul to the devil… Don’t worry, you will get your free 1080 Ti… How can you even accept this in the first place? Does the word “independent” mean anything to you?
here you go!
PC Perspective's CES 2017 coverage is sponsored by NVIDIA.
Follow all of our coverage of the show at https://pcper.com/ces!
Nice analysis, thank you =)
Vega is looking pretty hot!
Great coverage. What people don’t seem to realize is that it is silly to have expected a card launch this early; it would have given Nvidia more time to respond. This IS news. The architectural improvements look great. I’m not a tech expert, but it looks like AMD put a lot of effort into improving their architecture, and that will benefit them in the future as they try to future-proof their cards.
It seems like this is the natural progression from middleware texture-streaming technology like Granite from Graphine Software, which allows streaming very high quality textures into the GPU from various storages using highly optimized algorithms for those storage backends and for figuring out what is visible on the screen and what the user is mostly looking at. What it primarily builds up to is better VR quality and experience in the future. I would expect the end of 2017 and the start of 2018 to be the coming boom of the VR/AR industry.
All of this reminds me of what Nvidia did with their 10-series GTX cards. Except AMD has added features that Nvidia hasn’t, yet.
Interesting none the less 🙂
GDDR5X vs HBM2: one is VRAM, the other is classic desktop RAM but stacked. Temperature on HBM2 is gonna be a bitch; with GDDR5X I suspect there won’t be any issue. On paper it should be a bomb, but has anyone ever researched HBM? They should. I’ll give you an example: I have a 6- and an 8-pin on my GPU, and I had to cut a wire on the 6-pin (as per the standard). It’s all well and good to strut, but if your allies don’t follow the standard it’s all for nothing. You get overheating issues and you search until you find that the maker didn’t follow the standard. So I’ll wait before I cheer. AMD has had a ton of issues in the past with respecting standards and making their partner makers respect the standard.
I’m just wondering with all this memory stuff – how long til developers are coding for it, or will the drivers have to handle all this overhead?
Reminds me of FX chips. Those could have been great if the industry started coding for raw cores. Instead they kept to the known way & AMD was left on the roadside with new tech that no one was fully utilizing.
Sadly, I keep seeing a future where AMD is gone as a company before their tech is being utilized to its fullest.
Or am I wrong on all this?
The high bandwidth cache controller should not require developers to explicitly code for it at all. It should be completely transparent.
Belatedly, a good article, Ryan.
I agree the local SSD RAID 0 as ~unlimited VRAM is exciting and worth dwelling on. Few others have.
If, as you say, 16 PCIe 3.0 lanes are available for the interconnect with the GPU, and given that even now fairly new single SSDs push the boundaries of 4 lanes, with speeds of ~3.5GB/s, then incomprehensibly fast RAID storage is possible as a vast, virtual VRAM memory pool.
It’s slower, but avoids the fetters of the PC bus via its direct link to the GPU.
It’s a new deal for coders.