… But Is the Timing Right?
What we’re waiting for could have been done for years… but wasn’t. Why?
Windows 10 is about to launch and, with it, DirectX 12. Apart from the massive increase in draw calls, Explicit Multiadapter, both Linked and Unlinked, has generated a few pockets of excitement here and there. I am a bit concerned, though. People seem to treat this as a novel concept that gives game developers tools they've never had before. It really isn't. Depending on what you want to do with secondary GPUs, game developers could have used them for years. Years!
Before we talk about the cross-platform examples, we should talk about Mantle. It is the closest analog to DirectX 12 and Vulkan that we have. It served as the base specification for Vulkan, which the Khronos Group modified by swapping HLSL for SPIR-V and so forth. Some claim that it was also the foundation of DirectX 12, which would not surprise me given what I've seen online and in the SDK. Allow me to show you how the API works.
Mantle is an interface that mixes Graphics, Compute, and DMA (memory access) into queues of commands. This is easily done in parallel, as each thread can create commands on its own, which is great for multi-core processors. Each queue (a list of commands bound for a GPU) can be handled independently, too. An interesting side effect is that, since each device uses standard data structures, such as IEEE 754 floating-point numbers, no one cares where these queues go as long as the work is done quickly enough.
Since each queue is independent, an application can choose to manage many of them. None of these lists really needs to know what is happening to any other. As such, they can be pointed at multiple, even wildly different, graphics devices. Different models of GPU with different capabilities can work together, as long as they support the core of Mantle.
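The shape of that model can be sketched in a few lines. The snippet below is a toy simulation, not real Mantle (or Vulkan/DirectX 12) code: each thread records its own command list in isolation, submission is the only point of contact with a queue, and each queue can be drained by a different "device" without knowing about the others. All names here are illustrative.

```python
import threading
from queue import Queue

def record_command_list(thread_id, num_commands):
    """Record a command list on one thread; no shared state is touched."""
    return [f"cmd-{thread_id}-{i}" for i in range(num_commands)]

def submit(queue, command_list):
    """Submission is the only synchronization point with a queue."""
    queue.put(command_list)

def drain(queue, results):
    """A 'device' executes whatever lands in its queue, independently."""
    while not queue.empty():
        results.extend(queue.get())

queues = [Queue(), Queue()]  # e.g. one queue per GPU
threads = []
for tid in range(4):
    def work(tid=tid):
        cmds = record_command_list(tid, 8)
        submit(queues[tid % len(queues)], cmds)  # round-robin across queues
    t = threading.Thread(target=work)
    threads.append(t)
    t.start()
for t in threads:
    t.join()

executed = []
for q in queues:
    drain(q, executed)
print(len(executed))  # 4 threads x 8 commands = 32
```

The point of the exercise: because recording never touches shared state, it scales with CPU cores, and because each queue is self-contained, nothing stops the two queues from pointing at two different GPUs.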
DirectX 12 and Vulkan took this metaphor so their respective developers could use this functionality across vendors. Mantle did not invent the concept, however. What Mantle did was expose this architecture to graphics, which can make use of all the fixed-function hardware that is unique to GPUs. This is how GPU compute architectures were already designed before AMD applied the idea to graphics. Game developers could have spun up an OpenCL workload to process physics, audio, pathfinding, visibility, or even lighting and post-processing effects… on a secondary GPU, even one from a completely different vendor.
Vista's multi-GPU bug might get in the way, but it was possible in 7 and, I believe, XP too.
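The decision logic described above is simple enough to sketch. This is a hedged, hypothetical stand-in for an OpenCL-style enumeration (think `clGetDeviceIDs`): list every compute-capable device and offload a side workload to whichever GPU is not driving the display. The device records and the `pick_offload_device` helper are invented for the example, not a real API.

```python
def pick_offload_device(devices):
    """Prefer a compute-capable GPU that isn't rendering the frame."""
    candidates = [d for d in devices
                  if d["supports_compute"] and not d["drives_display"]]
    # Fall back to the primary GPU if nothing else is available.
    return candidates[0] if candidates else devices[0]

# A hypothetical mixed-vendor system: a discrete card on the monitor,
# plus an idle GPU from another vendor that only needs a compute driver.
system = [
    {"name": "Vendor A dGPU", "supports_compute": True, "drives_display": True},
    {"name": "Vendor B iGPU", "supports_compute": True, "drives_display": False},
]
print(pick_offload_device(system)["name"])  # prints "Vendor B iGPU"
```

Nothing in this logic depends on the two devices matching, which is exactly why mixed-vendor compute offload has been technically possible for years.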
Game developers didn't do this, however. It was a hassle to develop, and I'd assume QA would have been a nightmare too, if anyone had bothered. I believe id Software used a secondary GPU for CUDA texture processing in RAGE, but that is the only example I know of. The practice was more popular outside of gaming software, such as with those people who attach a half-dozen GPUs, which may or may not be the same model or vendor, to mine as many Bitcoins as possible.
These new APIs are arriving at a better time, though.
Back then, it was unlikely that a gaming device would have a second, unmatched graphics card available to access. NVIDIA gave it a whirl with PhysX offloading, where users could get a boost when processing large physics loads by leaving an old GeForce graphics card installed. It did not catch on much, although I was one of the few who tried it.
On-processor graphics is more common, though. For the last couple of years, it has been difficult to purchase a new consumer CPU without getting a GPU in the same package. Windows did not expose this as a compute device by default, though. The hardware would not appear in Device Manager unless you enabled it in your BIOS and a monitor was detected on it. Those who did enable it, however, would have no problem accessing its OpenCL driver if it had one. It could be used as a secondary compute device while the primary GPU did graphics. As far as I can tell, Windows 10 enables on-processor graphics all the time, even without a display.
Beyond the small available market, a second problem arose: the consoles.
Neither the Xbox 360 nor the PlayStation 3 had a graphics processor capable of OpenCL. The first manufacturer to support compute shaders in a console at all was Nintendo, with the Wii U. There were third-party efforts to make OpenCL run on the PlayStation 3's Cell processor, but I believe those only applied to the few Linux developers that Sony wasn't able to completely chase off of the PS3. For titles ported from those platforms, taking advantage of secondary graphics hardware as a compute device would be a significant burden. Even pure PC developers, of software as well as games, avoid OpenCL. They like compute shaders, but they don't like accessing them through OpenCL.
Hey, it'sa U… for once. Image Credit: LoFi Gaming
That is where Vulkan and DirectX 12 could shine. They grab much of the performance and flexibility of OpenCL and wrap it in a graphics API that developers already want to use. Their existence might lead to more variety in how AI is calculated or lighting is performed. Any modern GPU in your system is enumerated and can be attached to a stream of commands, regardless of whatever else is in your system and active.
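A toy sketch of the "explicit" in explicit multiadapter: the application, not the driver, decides which enumerated adapter receives which stream of work. The adapter names and task split below are made up for illustration; real code would enumerate adapters through something like DXGI's `EnumAdapters` or Vulkan's `vkEnumeratePhysicalDevices`.

```python
# Whatever enumeration finds in this hypothetical system:
adapters = ["discrete GPU", "integrated GPU"]

# The application explicitly routes dissimilar work to dissimilar GPUs.
tasks = [
    ("geometry pass",  "discrete GPU"),    # heavy graphics on the big GPU
    ("shadow maps",    "discrete GPU"),
    ("AI pathfinding", "integrated GPU"),  # compute offloaded to the iGPU
    ("light culling",  "integrated GPU"),
]

# Build one independent command stream per adapter.
streams = {a: [] for a in adapters}
for task, adapter in tasks:
    streams[adapter].append(task)

for adapter, stream in streams.items():
    print(adapter, "->", stream)
```

Under the old implicit model, the driver made this assignment (if it happened at all); under the new APIs, the split above is a deliberate engine-level decision.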
But they didn't do it first.
I don’t know if it matters, but Windows 8.1 lets you see the CPU graphics all the time too, without making a change to the BIOS.
I had to change the BIOS for my W8.1 setup or else the Intel iGPU was not accessible. It’s obvious because, if I run the Intel CPU diagnostic, none of the video tests run.
Asus Z77 Sabertooth, i7-3770K, GTX680
By default it was set to “Auto” and selected the iGPU or PCIe depending on what was plugged in. The other GPU was automatically disabled in the BIOS, though I think I could have set up multi-monitor and gotten both working. I never tried.
I was expecting to read about Lucid’s Hydra after reading the title.
Very interesting article.
Heh. I was referring to APIs that offer explicit control to the game developer.
did you ever plan to do a review of the “Lucid Virtu MVP” with FCAT testing?
a couple of years ago, before the FCAT methods, this Lucid Virtu tech was included as a selling point on some motherboards … if I remember correctly, it was supposed to use the IGP to ‘help’ the GPU output more frames, or frames better in sync with the monitor
at that time, this “Virtual Vsync” could not really be tested because FPS reporting was showing the virtual frames instead of the real ones … so no objective conclusions have been made about this tech (or at least, I haven’t seen any)
with all these new testing methods, I wonder what results might show up when putting Lucid Virtu tech on an FCAT bench
the tech is probably no longer relevant, if it ever was, but it is still topical (with some imagination):
– help with overhead (similar to DX12/Vulkan?)
– some FPS/sync magic (similar to G-Sync/FreeSync)
I would love to see this topic revisited with current testing methods … if only to make me stop wondering about it 🙂
This should have been available as soon as there were discrete GPUs in the marketplace; there are no excuses from the OS makers for not having this ability all along! It is the most essential task of an OS to be able to utilize any and all processing hardware all of the time, with none of this either/or, one-but-not-the-other. This goes for integrated graphics as well; multi-adapter should have been standard in all OSs more than a decade ago. They have been selling GPUs, integrated and discrete, for how many years now, and they are just now getting around to being able to utilize the GPU hardware, and all of those vector units, for any and all tasks. They were so busy working on selling the hardware features, and then not including the features in any software APIs or OSs, just to sell new hardware with some never-utilized potential. This has to be one of the biggest overlooked issues in computing for over a decade, and the OS/graphics API makers are just now adding a feature that could have been added a decade ago. Well, now that they are selling it, it’s time to take notice! Let’s just monkey around with the OS’s UI for a few releases, but now that Mantle (now packaged as Vulkan) comes around, with SPIR-V (just HSAIL by another name) coming, we’ll have to include it in the proprietary graphics APIs.
AMD can be thanked again for another innovative action, like the x86 64-bit ISA, for allowing us to have this feature much sooner than it would have been/should have been adopted, and it is now fostering the release/adoption of the entire HSA-aware graphics API software stack that will allow people to get more functionality out of the hardware that they already have. If AMD had not brought the x86 64-bit ISA into existence, how many more years would that feature have been delayed? The name Mantle may not be around, but its influence and feature set will now forever be included in all the new OSs. So HSAIL’s ability can now be called SPIR-V, and what’s in a name anyways? It’s the functionality that counts, and Vulkan will allow GPUs everywhere to accelerate more than just graphics workloads.
M$ and Intel are not known for adding features unless there are others adding the features and forcing the issue by virtue of the marketplace. M$ and Intel are happy milking the status quo rather than adding any meaningful innovation, unless the competition forces them to take action. This is why we need companies like AMD, with Mantle, and Valve, with SteamOS; the ARM mobile marketplace has more innovation going on than the PC/laptop market and already utilizes the Khronos Group’s HSA-aware APIs.
First off, 10 years ago the best technology we had was all fixed-function pipeline hardware. This means every GPU did its thing, but differently, and mixing and matching under those circumstances made little to no sense. The best they could have done is what they did: allow two to three cards to be chained together.
Until now, the implementation to chain them together has been handled by the driver creator; the game devs did not have access to that. With OpenCL, CUDA and DirectCompute, they were given access to tools that made certain things more efficient, but it still was not at a level where it could just be added into the game and it just worked.
And that was only made possible thanks to the latest shader models, which essentially gave rise to the compute capability of these cards. Now, we could argue that in this last generation of DX10 and 11, things could have been at the level of DX12. And they obviously could have, as shown by the fact that so many common cards in use today can in fact be used by the DX12 API.
Also, SPIR-V is in no way just HSAIL by another name. In fact, HSAIL is more akin to C++ AMP. These both take C++ code and turn it into compute code to be used on CPUs, GPUs and other compute devices. SPIR-V is more akin to LLVM IR, and can be used as a target of HSAIL or C++ AMP, as well as HLSL, GLSL and OpenCL.
While AMD does in fact have a significant role in the HSA movement, they are not alone. A significant portion of that trend is thanks to the increase in mobile adoption, and the need to decrease costs of those devices. I give AMD its due, and they have invested and worked their butts off to get us this innovative tech, but don’t discount all the work the other contributors are doing. AMD isn’t.
They were doing compute on vector processors 30+ years ago, before vector processors did any graphics-only workloads! The GPU manufacturers’ driver APIs could have offloaded compute to the vector units through software, and the OS makers should have forced the GPU manufacturers to have this ability or their drivers would not be certified and whitelisted to work with the OSs, including the requirement for the GPU(s) to always be available and able to perform graphics and GPGPU workloads, with none of this switched graphics-or-GPGPU. Really, an OS is supposed to manage and utilize all the processing resources on a processing system.
Well, any code from any programming language can be compiled into SPIR-V IL and run on the Vulkan runtime, via the same SPIR-V IL that was originally created to run the OpenCL GPGPU language, and HSAIL and SPIR-V have too much in common to be just a coincidence. You do know that many of the HSA Foundation’s same industry leaders are also represented on the many committees and boards of the Khronos Group, and the Khronos Group’s APIs are the public-facing open versions of the members’ various internal contributions (Mantle contributions from AMD, others from Nvidia, etc.). So I would expect that Nvidia will be more likely to support Vulkan and SPIR-V, getting the same functionality as HSAIL (which the SPIR-V IL most certainly does have), rather than joining the HSA Foundation. The Khronos Group is where the open-standards software is made available to the entire industry to use with no single industry player’s name on it.
The HSA Foundation, like the Khronos Group, is one of the main proponents of HSA-aware OSs/software, especially for the mobile market, and now the PC/laptop market. The mobile market was the leader in using the GPU to do more than just graphics, to enable mobile devices to do more with the limited computing power that their form factors allow. Heterogeneous computing has been around as a concept, and utilized, for 40+ years, and GPUs have a whole lot more vector units than any CPU, and even the shader units just do math. So yes, asynchronous shader units and other asynchronous units help, but not having them did not prevent this ability from being added a long time ago.
Hell, even a digital signal processor/processing unit on board a device has the potential to be utilized for other types of compute, with the proper software and hardware/firmware available to enable its use. SPIR-V IL and HSAIL essentially do the same workloads with the same functionality, via everything being compiled into their intermediate languages and run on CPUs/GPUs or any other device with the IL abstraction layer available to take advantage of the device’s native instruction set, and both HSAIL and SPIR-V IL can do that for any processing device. There is a lot more computing power on modern devices than just CPUs and GPUs, and some of those devices have a lot of computing power, as in the case of the specialized image processors added to smartphones. Just imagine being able to eventually utilize that when the device is not being used to take photos, or having that specialized signal processing made available for other uses; it’s all mostly ones and zeros, and math anyways, with a little bit of branch logic built in.
Why has M$ not included any GPU monitoring software in its Task Manager? You would think by now that would be available, but M$ appears to be more concerned with forcing a smartphone-style “app” ecosystem on its PC/laptop users, to make up for its shortfall in getting market share in the smartphone market. M$ is late as usual to the smartphone market, and all the desktop full-application users have to suffer. What a terrible waste of its time, and its users’ patience! GPU-Z does not work properly on my laptop, so I’m stuck for GPU monitoring software.
Again, pcper using NVIDIA as default examples in graphs and articles, showing strong inner green leanings.
The first one is from AMD's programming guide, and the second one is from Microsoft, who used NVIDIA and Intel GPUs. I do buy NVIDIA graphics cards for my main PC, though, yes.
That said, Fury X was interesting to me. I'm still not sure what is holding it back in the benchmarks. Bandwidth and raw compute are the top two performance metrics for GPUs, and AMD has a 50-100% lead over the 980 Ti. Unless the number of ROPs really is suffocating it, I'd like to see how its performance fares in a year, with a few driver updates and maybe newer games hammering on different parts of the silicon. If it had more than 4GB of RAM, I would be interested in it as a GPU compute card.
Definitely will be considering both Pascal and AMD's next generation architecture with HBM 2.0, though.
While you are at it: Blender 3D now supports Cycles rendering on AMD’s GCN-based GPUs. Could you please do some test runs, and benchmarks if there are any, of Cycles rendering on AMD’s GCN GPUs? A lot of folks will now have the option of AMD’s more affordable GPUs for Cycles rendering workloads, and I am very interested in seeing how Cycles rendering performs on AMD’s GPU hardware.
Also, do you think that HBM2 will be coming to AMD’s Fury X processors as an update before Greenland/Arctic Islands arrives? It looks to me like HBM2 is a drop-in replacement for HBM1 on the Fury X/Fury SKUs, as the only things that would have to change are the logic chip at the bottom of each HBM die stack and higher clock speeds for the memory. The number of traces to each die stack remains the same, and I can see AMD, with its limited budget, designing HBM2 to be a drop-in replacement for HBM1, via that bottom logic chip abstracting the differences between HBM1 and HBM2 from Fury’s/future SKUs’ memory controllers; the higher clocks for HBM2 would be just a change in firmware, and the Fury would be good to go for an HBM2 update.
While all developers have been free to use OpenCL and CUDA, doing so is non-trivial. First, neither gives direct access to the graphics-specific pipelines that have been honed over the years for the specific needs of game engines. They only expose the fancy math units.
Also, by using another language, you increase the chance that the engine will actually slow down, have bugs or in general be more complex to fix. Remember, game developers already deal with C/C++, HLSL, GLSL, and usually some form of a scripting language such as Unityscript, C#, Lua, Unreal Script and others. Adding another language to that mix is more complex than you would think.
Now, not having access to those specialty parts on those extra GPUs is more of a performance hit than you would think. It’s fine to put PhysX on the GPU, but have you actually considered that most CPUs today handle those calculations just fine? They don’t even use one full core for most games. So putting that on the GPU will make those calculations faster, but actual usage of the second GPU will be minimal, even with integrated graphics.
And until just recently, using OpenCL or CUDA to transfer items in memory from one GPU to the other was orders of magnitude slower than using the DX11 or OpenGL equivalent, which meant hacking together something that might work fast in one situation but, in general, didn’t hold, let alone with all the possible combinations of GPUs available.
Then you have the lackluster support of OpenCL from Nvidia, meaning you still can’t use those features that make memory swapping fast on their cards, even though they can handle them and do in CUDA. You have to limit yourself to the most recent version supported by both parties, which was OpenCL 1.1 up until the last few months. Or you can create two entirely unique solutions using the latest versions of OpenCL and CUDA, adding that much more of a headache.
Remember, adding a new language to your engine does not cause an additive number of bugs and headaches; you have caused an exponential increase in the complexity of the software.
Now, DirectCompute, the DX11-specific language, can solve some of those issues on Windows, but doesn’t help for consoles, OS X, mobile, or Linux. All of these reasons contributed to why you haven’t seen shipping games with this tech.
Now, have these guys experimented with all of this, and created side projects to see where the future is going? Yes, of course they have. They are gamers as well, and want this to work as much as we do. Which is why when they got involved with Mantle and now Vulkan, these things are going to work and with much fewer of these issues than before.
Yeah. Before DX12 and Vulkan, a number of people were experimenting with software rendering through GPU compute. You obviously lose most of the fixed function hardware, but you get control in return. It's just math at that point.
NVIDIA is the first one that pops to my mind, having created a software rasterizer in CUDA. Their implementation was mostly in the few-millisecond range on a GeForce GTX 480. Epic Games was also talking about converting to GPU-accelerated software rendering, but a year later cited an order-of-magnitude (10x) higher development cost.
But yeah, Vulkan and DX12 will probably change that a lot, now that graphics APIs seem to be taking compute seriously.
I think it was SIGGRAPH 2014 where AMD demoed some ray tracing acceleration on its FirePro GPUs, which is great to have done on the GPU, as those Xeon server/workstation SKUs can cost in the thousands, and many Xeons are required for heavy ray tracing workloads. I can’t wait for dedicated ray tracing hardware to begin appearing on discrete GPUs, and on AMD’s integrated graphics, Nvidia’s too. I don’t think the PowerVR Wizard, with its dedicated ray tracing hardware, has been utilized in any products yet.
WebCL is dead, but with SPIR-V I could see a new iteration coming to fruition, especially with the new WebAssembly standards project by the asm.js and SIMD.js folks.
I wouldn’t be surprised if a new version of WebGL supported it as well.
Yeah, I'm not too sure about SPIR-V.
For the longest time, Mozilla was killing any standard that didn't ship source (because of their education initiatives). Then they created asm.js and WebAssembly, which are not only difficult to read but also pretty much impossible to write without cross-compiling. Maybe they consider C/C++ as "open enough"? Maybe they aren't creating developer lock-in for the Web anymore?
Khronos Group would love to have a Web platform that accepted SPIR-V, though.
Yes, compiled into SPIR-V IL bytecode and run on the Vulkan runtime, via LLVM; it’s not just for OpenCL anymore, and they did mention Python and a limited version of C++ targeting the SPIR-V IL, among others. I guess that Vulkan could integrate all of the feature set that the HSA Foundation has listed for its HSAIL. I thought that most of the browser makers had representation on the Khronos Group’s committees; I’d expect the major browser makers are very much involved with Vulkan and other Khronos APIs. SPIR-V may have an easier time being accepted on Nvidia’s hardware, at least as far as Nvidia allows support of open-standards APIs. I can see Valve on board with Vulkan and its various software facilities/APIs, as far as Linux/SteamOS gaming is concerned, and that includes those that make the main Linux-based browsers. Eventually, all of the codebases targeting OpenCL and the other Khronos APIs are going to begin targeting Vulkan instead, for more than just graphics workloads, if just to get better multiprocessor support to go along with GPU support.
I was speaking about being unsure of SPIR-V as part of a Web standard, like WebGL. The Web's traditionally hostile to compiled bytecode formats.
“Updated Plans For Adding SPIR-V Support To LLVM”
Written by Michael Larabel in AMD on 17 June 2015 at 05:13 PM EDT, Phoronix.
asm.js was never intended for anything but a compiler target. You weren’t supposed to try to write it. Then again, HTML was never intended to be hand-written; the creators meant for us to use WYSIWYG tools.
The IETF has formed a working group to create a “bytecode” for the web. It is currently based on asm.js and Emscripten.
Since Brendan works for Mozilla, I think they are OK with this.
This is a binary format, like SPIR-V, that can be transformed into a text format, which may make the difference for SPIR-V versus compiled binaries, which is what Mozilla always fought against.
They are trying to compete with mobile, which has grown faster than the web. This will enable much better performance and allow app developers to target the web as just another platform with their C/C++ code, and eventually other languages, since the first implementation is through LLVM via Emscripten.
You should find this interesting, thanks to Unreal Engine 4 using Emscripten as its path to web-based games.
Why So sensitive team red?
That’s the GameNecks that populate both sides of the GPU rivalry, trying to wrap their simple minds around the complex topic of technology and turning it into a sporting match. A lot of pathological types are into gaming, but the technological aspect just confuses them and angers those with fragile self-images! They are just acting out on both sides, but that can be expected from those with limited reasoning abilities!
I really look forward to the day when games will take advantage of the DirectX 12 API and thus improved multi-GPU support. But it doesn’t really make sense to whine about the fact that it has not been done before. Consider that there has been SLI and CrossFire for many years. The industry (software and hardware developers) always goes for the masses, or for markets where they can make huge profits.
Although every PC magazine does thorough multi-GPU testing, it does not mirror reality. In real life, fewer than 1 in 1,000 PC gamers have more than one discrete GPU in their system.
AMD might have created Mantle only because PC gaming is a growing market and they wanted to sell more GPUs.
Remember the days when a typical VGA card had just one connector to attach exactly one monitor? And when mainboards had exactly one AGP slot? LOL. Sure not gonna miss those days. So let’s hail everybody who made this development possible.
I don’t think anyone cares who did it first, just that it was done.