Tech ARP had a chance to talk with AMD's Jeffrey Cheng about the new Vega GPU memory architecture. He provided some interesting details, such as the fact that the new architecture can handle up to 512 TB of addressable memory. With such a large pool it would be possible to store data sets in HBM2 memory to be passed to the GPU, as opposed to sitting in general system memory. Utilizing the memory present on the GPU could also reduce costs and energy consumption, not to mention that it will perform far more quickly. Pop by to watch the video to see how he feels this could change the way games and software are programmed.
"Want to learn more about the AMD Vega memory architecture? Join our Q&A session with AMD Senior Fellow Jeffrey Cheng at the AMD Tech Summit!"
Here is some more Tech News from around the web:
- Mozilla kills Firefox OS as it backs away from IoT ambition @ The Inquirer
- Microsoft tells OEMs that the secret to Windows 10 success is to be more 'cool' @ The Register
- Microsoft Introduces GVFS (Git Virtual File System) @ Slashdot
- 'Webroot made my PCs s*** the bed' – AV update borks biz machines hard @ The Register
- Ubiquiti Amplifi HD Mesh Wi-Fi Router System @ Custom PC Review
I see the real reason for this not so much in graphics cards, but in consumer and server APUs. If Raven Ridge has 2GB of HBM2 but that’s automatically treated as a cache, it’s the first no-compromise desktop APU.
And on the server side, leaks suggest we’ll be seeing something like 16 CPU cores coupled with 2048 shaders and 8GB of HBM2 in Snowy Owl. If programmers can just throw HPC workloads at it without worrying about memory management, it could have a bright future. Memory management is the hardest thing to get right in that kind of work.
That sounds wonderful! Sources?
Here’s some info on AMD’s exascale APU system!
“AMD’s Exascale Strategy Hinges on Heterogeneity”
https://www.hpcwire.com/2015/07/29/amds-exascale-strategy-hinges-on-heterogeneity/
Here is a quote from Top500 (supercomputers).
“A subsequent IEEE report authored by the AMD’s FastForward principle investigators and others at the company, described the research in some detail. As described, the EHP would combine a CPU and GPU, the latter providing most of the FLOPS. Each APU would provide at least 10 peak teraflops of compute, which would require a system with 100,000 such processors to reach a peak exaflop.”(1)
(1)
“Pondering AMD’s Ambitions for High-Performance APUs”
https://www.top500.org/news/pondering-amds-ambitions-for-high-performance-apus/
Here is an article about FPGA compute on the HBM/HBM2 die stacks, a patent filing from AMD.
“AMD patent filing hints at FPGA plans in the pipeline”
http://www.theregister.co.uk/2015/08/11/amd_patent_filing_hints_at_fpga_plans_in_the_pipeline/
16 cores is more than likely a mid-range workstation APU/interposer SKU with only 2048 shaders! There will be HPC APUs with 32 cores and even larger numbers of Vega NCUs/shaders for future HPC systems, including AMD versions with some FPGA compute on the HBM2 stacks for in-HBM2-memory compute on exascale systems. AMD already has a patent application for placing FPGA compute on the HBM2 stacks for in-memory compute assist, and AMD also has an exascale grant proposal that shows a 32 Zen core exascale APU with loads of HBM2 memory, on-HBM2 FPGA compute, and a large Vega die, in addition to plenty of off-interposer channels to regular DIMM-based DRAM.
That AMD Infinity Fabric will be used on CPUs as well as GPUs, and the HPC/workstation-grade APUs on an interposer are in development.
That High Bandwidth Cache Controller (HBCC) IP will allow AMD to use HBM2 as if it were another cache layer! So for laptop-based Ryzen/Vega APU SKUs, a laptop OEM offering only a single channel to a larger off-interposer pool of DDR4 DRAM will not have a detrimental effect on the APU’s integrated graphics, provided the APU comes with even a single stack of HBM2. I’d expect all of AMD’s laptop interposer-based APUs to really perform well with just a single stack of HBM2 (4 GB or 8 GB) that can leverage a much larger pool of regular, lower-bandwidth DDR4 DIMM-based DRAM (single or dual channel) and keep the integrated GPU running from HBM2.
Also, there is nothing stopping AMD/JEDEC from amending the JEDEC HBM/HBM2 standards to allow a single stack of HBM2 to have a 2048-bit-wide interface instead of the 1024-bit interface in the current HBM2 standard. So AMD or an HBM2 partner could get a single stack of HBM2 with much higher effective bandwidth without having to increase the HBM2 memory clocks.
The current JEDEC HBM2 standard only describes what is needed to interface with a single stack of HBM2 with its 1024-bit-wide interface, but there is plenty of room in the JEDEC standard for amendments, and there is no hard limit preventing more than 4 HBM2 stacks on a processor/interposer package, provided the interposer can be made large enough. I’d say that maybe it would be better to go with a wider 2048-bit-interface HBM2-A standard per stack in some future HBM# standard and just go higher with the stacks, so the per-stack clock speeds can be kept as low as possible to save on power, generate less heat, and save interposer space.
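To put rough numbers on that: HBM2 peak bandwidth scales linearly with interface width, so doubling the per-stack interface doubles bandwidth at the same clock. A minimal sketch of the arithmetic, assuming a 2.0 Gb/s per-pin data rate (an illustrative figure, not quoted from the JEDEC spec):

```python
def peak_bandwidth_gbs(interface_bits, data_rate_gbps):
    """Peak bandwidth in GB/s = bus width (bits) * per-pin rate (Gb/s) / 8."""
    return interface_bits * data_rate_gbps / 8

standard = peak_bandwidth_gbs(1024, 2.0)  # single stack, current JEDEC width
widened = peak_bandwidth_gbs(2048, 2.0)   # hypothetical doubled-width stack
print(standard, widened)  # 256.0 512.0
```

Same clocks, same power per pin-toggle, twice the effective bandwidth per stack.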
HBM2 isn’t ideal for use with an APU.
What you’re talking about is a flat, shared memory architecture the same as a PS4 (just HBM2 instead of GDDR5).
The PROBLEM is that your GPU performance will be limited due to die size, so the COST and expansion-vs-performance tradeoff doesn’t make sense.
AMD discussed this already.
*I think where your logic breaks down is that you still need a large cache (normal VRAM + system memory) to handle software that doesn’t use this new architecture so you’ll still need at least 16GB for a higher-end gaming PC.
That’s also the PROBLEM with this new architecture. When will we see a game that takes ADVANTAGE of this?
A modern game needs to support DX11, DX12, PC desktop architecture, and console shared-memory architecture, and developers won’t spend extra money to optimize for this unless they think there is a financial benefit.
It MIGHT make more sense to just find the MIDDLE GROUND and better optimize swapping between VRAM and System memory for desktop PC.
The above should read “.. in a desktop APU” as I’m not talking about servers.
You really think AMD spent 7 years developing HBM memory only to not use it on their APUs?
Really, HBM/HBM2 would be great for an APU, even 4GB of HBM2, and you DO NOT need 16GB of HBM2 on most APUs; you just need enough HBM2 to keep the integrated graphics from having to run from regular DIMM-based DRAM. The video explains that with the HBCC (High Bandwidth Cache Controller), most of the texture data can reside in regular DIMM-based DRAM or be paged to NVM storage, with the HBCC able to make sure, in the background, that only the textures needed directly by the integrated graphics reside in HBM2. So a small amount of HBM2, if treated like a high-bandwidth cache, can make a large pool of slower system DRAM effectively faster, the very same way that the L1/L2/L3 caches make slower system DRAM effectively faster for a CPU.
Do you think that any modern CPU/GPU runs from any memory other than mostly cache memory, with any other memory accesses handled by the cache/memory subsystems in the background? CPUs (cores) and GPUs (shaders) run mostly from low-latency instruction and data caches, not directly from the PC’s DRAM (GDDR or other memory). Most CPUs execute directly from the L1 instruction and data caches, with the L2 cache there for staging or evicting instructions and data to/from the L1 caches. If a CPU does not have the data available in the L1 instruction or data caches, it will work on other threads while its cache/memory subsystem performs a lookup to see if the instructions/data are in the processor’s L2 or L3 cache, before the memory controller even attempts to fetch data from main memory.
GPUs have way more shaders (cores), but they all operate in a similar manner, with their ranks and files of L1 instruction and data caches that spill over onto/pull from large banks of L2 shader cache. AMD’s HBCC will probably treat HBM2 like, maybe, an L3 cache, depending on the levels of GPU cache above, and a small amount of HBM2 treated as L3 cache can leverage a much larger pool of slower DIMM-based DRAM, such that the GPU will mostly work from the HBM2 and the cache levels above it, with any accesses to regular DDR4-based DRAM done in the background to keep the most-needed textures staged in HBM2 with its massive bandwidth and 1024-bit-wide connection. The JEDEC HBM/HBM2 standard splits those 1024 bits per stack into smaller independent channels to service the many memory-access queues on a GPU or CPU (APU-based).
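The lookup order described above (L1, then L2, then L3, then main memory) can be sketched as a toy model; the level names, contents, and latency costs here are invented for illustration, not real hardware figures:

```python
def lookup(address, levels):
    """Walk cache levels in order; return (level_name, accumulated_latency)."""
    latency = 0
    for name, contents, cost in levels:
        latency += cost           # pay the lookup cost at this level
        if address in contents:
            return name, latency  # hit: data found here
    return "DRAM", latency + 100  # missed everywhere: main-memory penalty

levels = [
    ("L1", {0x10, 0x20}, 1),   # smallest, fastest
    ("L2", {0x30}, 4),
    ("L3", {0x40}, 12),        # largest, slowest on-chip level
]
print(lookup(0x30, levels))  # ('L2', 5): missed L1, hit L2
```

The point of the hierarchy is visible in the accumulated latency: a miss at every level costs far more than an L1 hit, which is exactly why the core switches to other threads while the subsystem does the walk.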
The new Vega HBCC IP for controlling a GPU’s own virtual memory pool will allow any GPU or APU to have a small 4GB or 8GB pool of HBM2 memory for textures that are about to be immediately needed by the levels of cache above on the GPU, while allowing the regular DIMM-based DRAM/system memory (for an APU) to hold textures that may not be needed right away. And with the Vega IP for managing a large pool (up to 512TB) of virtual memory, the textures in the DIMM-based system memory address space may not even be in physical RAM; they may very well be paged to SSD/NVM or disk if that texture memory has not been recently accessed by the GPU.
With Vega a game could load all of its textures into memory, even above the total amount of physical HBM2 and system RAM, with the HBCC and OS responsible for managing the virtual memory page swap to SSD/hard drive. So the game’s developer will only have to worry about managing the game, while the HBCC, OS, and other systems manage what texture/code is staged where it needs to be to keep the GPU’s hardware fully utilized.
APUs have had virtual memory capabilities all along, owing to the nature of an APU. But with Vega’s HBCC IP even a discrete GPU can manage its own SSD/hard drive virtual memory swap space on its own PCIe-card-based SSD/NVM, or even use the PC’s system memory address space for RAM/virtual memory management if the discrete GPU does not have its own dedicated NVM storage pool.
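A minimal sketch of the residency policy being described, with the HBCC’s role played by a simple LRU cache: a small “HBM2” pool keeps the most recently touched pages resident and spills the least recently used ones back to the larger system-memory pool. All names and capacities here are assumptions for illustration, not AMD’s actual HBCC design:

```python
from collections import OrderedDict

class HBCCSketch:
    """LRU residency model: hot pages live in a small 'HBM2' pool; the
    rest spill to a larger 'system memory' pool."""

    def __init__(self, hbm_capacity):
        self.hbm = OrderedDict()   # page -> data, kept in LRU order
        self.system_memory = {}    # overflow pool (slower DRAM)
        self.capacity = hbm_capacity

    def touch(self, page):
        """A GPU access: make the page resident in HBM2, evicting the
        least-recently-used page to system memory if the pool is full."""
        if page in self.hbm:
            self.hbm.move_to_end(page)   # refresh its LRU position
            return "hit"
        self.hbm[page] = self.system_memory.pop(page, None)
        if len(self.hbm) > self.capacity:
            victim, data = self.hbm.popitem(last=False)  # least recent
            self.system_memory[victim] = data
        return "miss"
```

For example, with a 2-page pool, touching pages “grass”, “rock”, then “sky” evicts “grass” back to system memory in the background; the game code never sees the move.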
“*I think where your logic breaks down is that you still need a large cache (normal VRAM + system memory) to handle software that doesn’t use this new architecture so you’ll still need at least 16GB for a higher-end gaming PC.”
NO! CPUs with their virtual memory abilities have no problems running legacy code, so why would a Vega GPU with its HBCC, or HBM2 treated as cache, be any different? That management is not seen by the code; it happens outside the control of any game or application, for CPUs and GPUs alike! It’s all hidden behind a hardware abstraction layer that only the OS/API or the GPU’s system firmware would see or manage. A block of code or data has no idea where it may reside in a CPU’s or GPU’s many levels of cache, in system RAM, or even in a virtual memory page file on an SSD/hard drive; that’s all managed at the ring 0 OS level and in the processor’s hardware (cache/memory subsystems). Programmers do not manage that; the processor and the OS or firmware (for discrete GPUs) manage memory/virtual memory.
Jeffrey Cheng also mentioned OpenCAPI in the video, so AMD’s future GPUs will interface with any Power9-based systems (from IBM and third-party Power9 licensees) that use OpenCAPI!
IBM created CAPI (Coherent Accelerator Processor Interface), and now there is OpenCAPI, of which AMD is, along with IBM and others, a founding member. So there will be OpenCAPI capabilities for AMD’s GPUs to interface with systems that support OpenCAPI, PCIe, AMD Infinity Fabric, etc. Google (an OpenPower/Power9 licensee) is going to be using Power9s, as are others, so this Vega IP and OpenCAPI IP will probably get AMD more sales in the non-x86 HPC/server markets.
Can the operating system make decisions as to where to put data?
Or is the HBC controller fully autonomous?
(As a comparison, say, to a fusion/hybrid hard drive with SSD – the OS can work out what it wants kept on the faster SSD part and when to move data to and from that section.)
The fusion/hybrid hard drive with SSD/NAND will have its controller place the data in cache (internal RAM and/or SLC or MLC NAND), but the OS is only going to request/command the device’s controller to read/write sectors according to the OS’s file system rules. The drive’s controller will receive the track/sector read/write request from the OS and will usually write that data to its RAM cache or SLC/MLC cache to keep the queued read/write requests serviced efficiently.
In the background the drive’s controller will manage its own cache levels for the fusion/hybrid management of any drive-related cache (RAM- and NAND-based) down to physical track/sector reads and writes. So the OS will simply have its normal track/sector read/write requests cached to the SSD portion (usually the SSD portion will have some regular RAM to buffer the reads/writes more efficiently before the data is cached to the NAND), but eventually, in the background, the drive’s controller will work the data from the SSD cache to the physical tracks/sectors on the hard drive portion (depending on the size of the NAND portion, the controller will keep the most often requested data cached on the SSD portion and only transfer stale data to the disk portion).
An embedded CPU with an OS is usually used as the controller for most drives of any type, so depending on the code, the embedded OS on the drive can be programmed to do many background tasks as well as its priority task of servicing the host OS’s read/write requests. All that hybrid drive cache management is done under the control of the embedded OS/firmware, so the host OS will only see its read/write requests being serviced. There can be some functionality in the device’s driver/firmware to let the OS be aware of other things, like where the data is actually stored, and so some ability to request that data be kept in the cache area, but usually the drive’s internal embedded CPU/OS/firmware does a better job of managing where the hot data is staged on the drive.
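The hybrid-drive behaviour described above can be sketched as a write-back cache: host writes land in the NAND cache immediately, and the controller flushes older sectors down to the platter in the background. This is an illustrative model only; real controller firmware is far more involved:

```python
class HybridDriveController:
    """Write-back cache model: NAND absorbs host writes; the platter is
    the slow backing store the controller flushes to in the background."""

    def __init__(self, nand_slots):
        self.nand = {}      # sector -> data (fast cache, insertion-ordered)
        self.platter = {}   # sector -> data (slow backing store)
        self.nand_slots = nand_slots

    def write(self, sector, data):
        """Host-visible write: always lands in NAND first."""
        self.nand[sector] = data
        if len(self.nand) > self.nand_slots:
            self.flush_one()

    def read(self, sector):
        """Serve hot data from NAND, fall back to the platter."""
        return self.nand.get(sector, self.platter.get(sector))

    def flush_one(self):
        """Background task: move the oldest cached sector to the platter."""
        sector = next(iter(self.nand))
        self.platter[sector] = self.nand.pop(sector)
```

Note the host never calls `flush_one` itself; it only sees `write` and `read` being serviced, which is exactly the point being made about the embedded controller hiding the cache management.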
That HBCC and the embedded firmware on the GPU’s PCIe card will be able to act like the OS (it actually is an embedded OS) for a discrete GPU. So the GPU can issue its own read/write requests to an integrated PCIe-card NVM drive. The GPU’s HBCC/memory controller on a discrete GPU can probably also make read/write requests to the OS and even have the OS manage virtual memory requests for the GPU. GPUs are processors, so I’d imagine that they could have the same virtual memory management hardware that a CPU has, with the GPU able to manage its own drives and virtual memory page tables and/or use the main system OS’s page table management.
Pascal has 49-bit addressing too, btw.
Good for old Blaise, I’ll bet he could also figure out the probability of Nvidia’s customers being overcharged for GPU hardware! He did lay the foundation for the modern theory of probabilities, didn’t he!
I have plenty to complain about with respect to capitalism. But Nvidia is free to price how they like. As long as people keep paying, it’s fine.
I’m not a cutting edge gamer. The RX 480 4GB hit the right price/performance target for me. As cool as Vega or an Nvidia GTX 1080 might be, I don’t plan to upgrade until ~ 2021.
Why not wait for Vega and the mad sales pricing/savings that the RX 480 will receive, and go dual RX 480s? Once the gaming industry gets its Vulkan/DX12 explicit GPU multi-adapter chops in order, CF/SLI will be history, with the gaming engines/gaming engine SDKs allowing game makers to use any and all GPUs plugged into a gaming PC.
I’m waiting for Vega’s release to cause the RX 480’s pricing to go down even further so I can get a Ryzen/AM4 system with dual RX 480s and save a bunch on graphics, motherboard and CPU.
What about HDMI 2.1?