The Dual-Fiji Card Finally Arrives
Final details of the Radeon Pro Duo have arrived, in the form of a leaked slide deck.
This weekend, leaks on both WCCFTech and VideoCardz.com revealed just about everything about the pending release of AMD's dual-GPU giant, the Radeon Pro Duo. While no one at PC Perspective has been briefed on the product officially, all of the interesting data surrounding it is clearly outlined in the slides on those websites, minus the independent benchmark testing that we are hoping to get to next week. Based on the reports from both sites, the Radeon Pro Duo will be released on April 26th.
AMD actually revealed the product and branding for the Radeon Pro Duo back in March, during its live-streamed Capsaicin event surrounding GDC. At that point we were given the following information:
- Dual Fiji XT GPUs
- 8GB of total HBM memory
- 4x DisplayPort (this has since been modified)
- 16 TFLOPS of compute
- $1499 price tag
The card follows the same industrial design as the reference Radeon R9 Fury X, integrating a dual-pump cooler and an external fan/radiator to keep both GPUs running cool.
Based on the slides leaked out today, AMD has revised the Radeon Pro Duo design to include a set of three DisplayPort connections and one HDMI port. This was a necessary change, as the Oculus Rift requires an HDMI port to work; only the HTC Vive has built-in support for a DisplayPort connection, and even in that case you would need a full-size to mini-DisplayPort cable.
The 8GB of HBM (high bandwidth memory) on the card is split between the two Fiji XT GPUs, just like other multi-GPU options on the market. The 350 watt rated power draw is exceptionally high, exceeded only by AMD's previous dual-GPU beast, the Radeon R9 295X2, which used 500+ watts, and the NVIDIA GeForce GTX Titan Z, which draws 375 watts!
Here is the specification breakdown of the Radeon Pro Duo. The card has 8192 total stream processors and 128 Compute Units, split evenly between the two GPUs. You are getting two full Fiji XT GPUs in this card, an impressive feat made possible in part by the use of High Bandwidth Memory and its smaller physical footprint.
| | Radeon Pro Duo | R9 Nano | R9 Fury | R9 Fury X | GTX 980 Ti | TITAN X | GTX 980 | R9 290X |
|---|---|---|---|---|---|---|---|---|
| GPU | Fiji XT x 2 | Fiji XT | Fiji Pro | Fiji XT | GM200 | GM200 | GM204 | Hawaii XT |
| GPU Cores | 8192 | 4096 | 3584 | 4096 | 2816 | 3072 | 2048 | 2816 |
| Rated Clock | up to 1000 MHz | up to 1000 MHz | 1000 MHz | 1050 MHz | 1000 MHz | 1000 MHz | 1126 MHz | 1000 MHz |
| Texture Units | 512 | 256 | 224 | 256 | 176 | 192 | 128 | 176 |
| ROP Units | 128 | 64 | 64 | 64 | 96 | 96 | 64 | 64 |
| Memory | 8GB (4GB x 2) | 4GB | 4GB | 4GB | 6GB | 12GB | 4GB | 4GB |
| Memory Clock | 500 MHz | 500 MHz | 500 MHz | 500 MHz | 7000 MHz | 7000 MHz | 7000 MHz | 5000 MHz |
| Memory Interface | 4096-bit (HBM) x 2 | 4096-bit (HBM) | 4096-bit (HBM) | 4096-bit (HBM) | 384-bit | 384-bit | 256-bit | 512-bit |
| Memory Bandwidth | 1024 GB/s | 512 GB/s | 512 GB/s | 512 GB/s | 336 GB/s | 336 GB/s | 224 GB/s | 320 GB/s |
| TDP | 350 watts | 175 watts | 275 watts | 275 watts | 250 watts | 250 watts | 165 watts | 290 watts |
| Peak Compute | 16.38 TFLOPS | 8.19 TFLOPS | 7.20 TFLOPS | 8.60 TFLOPS | 5.63 TFLOPS | 6.14 TFLOPS | 4.61 TFLOPS | 5.63 TFLOPS |
| Transistor Count | 8.9B x 2 | 8.9B | 8.9B | 8.9B | 8.0B | 8.0B | 5.2B | 6.2B |
| Process Tech | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm |
| MSRP (current) | $1499 | $499 | $549 | $649 | $649 | $999 | $499 | $329 |
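As a quick sanity check on the table, the peak compute and memory bandwidth figures fall straight out of the other specs. Here is a minimal sketch of the arithmetic, using the clock and bus numbers from the table and assuming single-precision FMA throughput, as is standard for these ratings:

```python
# Peak FP32 compute: stream processors x 2 FLOPs per clock (fused multiply-add) x clock.
cores = 8192                 # both Fiji XT GPUs combined
clock_ghz = 1.0              # the rated "up to" 1000 MHz
peak_tflops = cores * 2 * clock_ghz / 1000.0
print(peak_tflops)           # 16.384 -> the rated 16.38 TFLOPS

# HBM bandwidth per GPU: bus width in bytes x memory clock x 2 transfers per clock (DDR).
bus_bytes = 4096 // 8        # 4096-bit interface per Fiji
mem_clock_mhz = 500
per_gpu_gb_s = bus_bytes * mem_clock_mhz * 2 / 1000.0
print(per_gpu_gb_s)          # 512.0 GB/s per GPU, 1024 GB/s for the pair
```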
The Radeon Pro Duo has a rated clock speed of up to 1000 MHz. That’s the same clock speed as the R9 Fury and the rated “up to” frequency on the R9 Nano. It’s worth noting that we did see a handful of instances where the R9 Nano’s power limiting capability resulted in some extremely variable clock speeds in practice. AMD recently added a feature to its Crimson driver to disable power metering on the Nano, at the expense of more power draw, and I would assume the same option would work for the Pro Duo.
The rest of the specs are self-explanatory – they are simply double those of a single Fiji GPU. The card will require three 8-pin power connectors, so you'll want a beefy PSU to power it. In theory, the card COULD pull as much as 525 watts: 150 watts from each of the three 8-pin connectors plus 75 watts over the PCI Express bus.
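To make that headroom explicit, here is the same back-of-the-envelope budget in code, using the standard PCI Express limits of 150 W per 8-pin connector and 75 W from the slot:

```python
# Theoretical power ceiling from the board's power sources (PCIe spec limits).
EIGHT_PIN_WATTS = 150   # per 8-pin PCIe power connector
SLOT_WATTS = 75         # delivered over the PCI Express x16 slot
ceiling = 3 * EIGHT_PIN_WATTS + SLOT_WATTS
print(ceiling)          # 525 W ceiling, against the 350 W rated board power
```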
AMD is definitely directing the Radeon Pro Duo towards professionals and creators, for several reasons. In terms of raw compute power, there isn't a GPU on the market that can match what the Pro Duo can do. For developers looking for more GPU horsepower, the $1500 price will be more than bearable, and it gives them a pathway to start diving into multi-GPU scaling integration for VR and DX12. AMD even calls out its FireRender technology, meant to help software developers integrate a rendering path into third-party applications.
But billing your card as the “world’s fastest graphics card” also means putting yourself squarely in the sights of PC gamers. At the Capsaicin event, AMD said the card was built for "creators that game and gamers that create." AMD claims the Radeon Pro Duo offers 1.5x the performance of the GeForce GTX Titan X from NVIDIA and 1.3x the performance of its own Radeon R9 295X2.
AMD even shows gaming benchmarks in its slide deck, showing performance leads in Rise of the Tomb Raider, Grand Theft Auto V, Battlefield 4, Assassin’s Creed Syndicate, Ashes of the Singularity and Far Cry Primal. All of this testing was done at 4K, and we assume pretty close to maximum quality settings. Even though AMD says it would like to avoid pitching the Radeon Pro Duo to gamers, it’s clear that the company sees it as a marketing bullet point worth promoting.
Obviously, the problem with the Radeon Pro Duo for gaming is that it depends on multi-GPU scaling to reach its potential. The Titan X is a single-GPU card, so NVIDIA has much less trouble getting peak performance out of it. AMD depends on CrossFire scaling to get peak performance (and the rated 16 TFLOPS) in any single game. For both NVIDIA and AMD, that can be a difficult process, and it is a headache we have always discussed when looking at multi-GPU setups, whether they live on a single card or across multiple cards.
Our final leaked slide indicates that AMD will have both Radeon Crimson and FirePro branded drivers for the Radeon Pro Duo. I would assume that you would select which driver you wanted to install based on your intended use – gaming or development.
That’s pretty much all the information we have on the Radeon Pro Duo without getting a sample and doing our own testing. Clearly the card is going to be an interesting product, but its price, performance profile, and timing all make it a questionable release. The Radeon Pro Duo could be an amazing performer for high-end PC gaming, as long as buyers are willing to accept the complications of multi-GPU configurations. The price is steep: at $1499, it costs the same as three Radeon R9 Nanos. I estimate the performance of the Radeon Pro Duo will basically equate to a pair of R9 Nano cards running in CrossFire, which would run $500 less than this new product. As for timing, AMD is making no secret of the pending release of new hardware based on the Polaris architecture, which improves efficiency, adds support for HDMI 2.0 and DP 1.3, and quite a bit more. Buying a high-end graphics card based on the outgoing architecture will be a tough pill to swallow for many.
Even so, it is likely that the Radeon Pro Duo will be the fastest AMD graphics card for some time to come, even after Polaris hits the streets.
Hopefully we’ll be able to get some hands-on time with a card in the near future to judge for ourselves where the Radeon Pro Duo fits into the ecosystem. We are working on doing just that…
UPDATE (4/26/16): Now that we have the official press slide deck from AMD, I wanted to share some additional information from it with our readers. First, if you want a better look at the PCB and the construction of the cooler, here are a couple of cool shots.
The two distinct liquid coolers are still being built by Cooler Master, despite the early issues we had with Fury X pump noise. The design looks impressive, with interconnected inline pumps keeping the 350+ watt card at reasonable temperatures (I assume, having had no hands-on time with the card). The bare PCB shows two massive Fiji XT GPUs with HBM and a PLX bridge between them to manage PCI Express traffic.
Though AMD provided sample benchmark data pitting the Radeon Pro Duo against a single Titan X in a handful of games, this comparison in Ashes of the Singularity actually pits the new AMD flagship against the GTX Titan X in SLI! Here, at 4K resolution and maximum image quality settings, the Radeon Pro Duo is able to cross 50 FPS and beat the two GTX Titan X cards by roughly 20%.
Finally, the "official" target for the Radeon Pro Duo is the professional market, and AMD does provide some scaling numbers for the card in applications like 3DS Max. Even though the comparison to a high-end Intel Extreme Edition processor is a bit of a laughable data point, AMD does see 1.73x scaling for the Radeon Pro Duo with both GPUs enabled (about 87% scaling efficiency) compared to a single-GPU option like the Fury X or the R9 Nano.
I wonder what DP revision it’s running.
It's still Fiji, so definitely DP 1.2a.
Oh well. Still excited for the card itself, though. AMD delayed the heck out of this thing…
Why no comparison to GP100? The P100s are out now.
Isn’t P100 around $13,000? That isn’t quite the same market.
Bad troll is bad. P100 isn’t even a graphics card, let alone “out now”.
It is the next big GPU (G IS FOR GRAPHICS) chip from Nvidia and therefore the next real flagship GPU for use in the Quadro and GeForce product lines as well.
It is totally relevant because it's also the first real new architecture from Nvidia and the first TSMC 16nm GPU. GM200 and 204 were basically GK with no DP, a few new features, 50% better FLOPS and no change in memory bandwidth.
GP100 is a massive improvement in SP, DP and 1/2P FLOPS, with 2x the memory bandwidth. GP104 should show about the same improvement in terms of SP FLOPS and memory bandwidth.
You'd be incredibly stupid to have bought a 28nm GPU, or to consider buying one now, if you had the opportunity to buy a 16nm GPU offering 2-3x the performance of what it replaces.
This is a tech site. Acting like 28nm GPUs are still new is a bit silly.
GP100 is the CHIP
P100 is the PRODUCT
P100 is the first GPU (board) using the GP100 GPU (chip). Where did I say otherwise?
It's still relevant, especially if they're comparing things to the Radeon Pro.
16.38 TFLOPS of compute obfuscation, as they fail to state whether it's single precision or double precision, or whether the compute is integer or floating point. WTF with these marketing morons the companies hire!
Compute performance is a rather broad, unquantifiable term to be using, and I'll bet the online professional server websites will not fall for this crap like the gaming sites do. It's not just AMD's marketing, it's all marketing, and the utter slack-jaws that fall for any marketing anywhere!
"Designed for creators" and yet where are the Blender 3D benchmarks or other graphics software benchmarks! I want to see more Vulkan benchmarks from AMD also, and not just DX12 information! The Chinese will be using Windows 10 because of its call-home slurpware, so maybe there will be plenty of progress from that side of the world for more Linux/Vulkan work, as well as from this side of the world (mostly the EU) for more Linux/Vulkan gaming outside of M$ sticky fingers!
edit: Chinese will be using
to: Chinese will not be using
That's definitely single precision measurement.
Also, FLOPS stands for FLOATING point operations per second.
Even WccfTech lists the SP FP and the DP FP numbers for this SKU; the benchmarks will tell some of the compute results. There is more to compute than just FP/INT, and the peak metric will be hit on very few occasions with the available games/gaming engines and graphics APIs.
More Vulkan/Linux and less M$/DX12!
AMD's async needs to be compared to Nvidia's async under the newer Polaris and Pascal SKUs, as Nvidia has improved some of its thread scheduling granularity on its Pascal micro-arch based SKUs. So no more waiting until the end of the draw call to schedule compute/graphics for Nvidia(?) when threads are managed/dispatched on Pascal based SKUs.
Phil Rogers, the AMD HSA expert, is doing some good over at Nvidia!
While you do bring up some valid concerns (rare to hit actual peak performance, etc.), rating cards like this is industry practice and it is how everyone does it. Intel, nVidia, etc., they all list the peak theoretical. And as far as this being a dual-GPU solution, that actually matters MUCH less, in fact almost none, to pure compute, as you can just run your kernel on both GPUs at the same time. Plus, for anyone remotely familiar with this industry, and these products, you would know that it is single precision, and obviously floating point, as FLOPS, as mentioned above, is specifically floating point.
Also, thread scheduling granularity isn't the same as async compute. Async compute is more like hyper-threading, where you can have multiple threads in flight at the same time, each using different resources on the chip, whereas fine-grained thread scheduling is like a single-core CPU just switching back and forth between tasks really fast. It's better than before, but still not the same.
Chart shows 4096 for rated speed on the R9 Nano
Doh, thanks, fixed.
Copy and paste fail there. 🙂
“AMD Updates Carrizo Firmware To Support More UVD Sessions”
https://www.phoronix.com/scan.php?page=news_item&px=AMD-Updates-CZ-UVD-Blob
Wouldn't it have made sense for AMD to place both Fury chips on one interposer? That way they could have utilised the memory as a shared pool, giving it effectively 8GB. The bandwidth of HBM is so high that bandwidth losses would be minimal.
That would be a very large and possibly very expensive interposer. The current Fury is already close to the reticle size, as far as I know. I haven't seen any exact figures, but going larger than the reticle will probably not be cheap. Also, how would you connect the memory together? You can't just connect both chips to the same memory. They would need to add an inter-processor link to the die along with extra logic to handle sharing the memory. It needs to be a very fast link to actually share the memory and make it look like a unified 8 GB space, which could mean two 4096-bit links per die would be required, one for local memory and one for an inter-processor link.
Well, as another poster mentioned, the interposer would be WAY too big to be built using current technology. Let's just toss that aside for a second, though. Let's assume it could be built. This would require a MASSIVE change in the way ALL of the software works. Just because there are two GPUs that have high-B/W access to RAM does not mean they can just share it. The software needs to be written to accomplish that, and the current stuff is NOT EVEN CLOSE to handling memory that way. It would require an entire UPROOT of the graphics APIs to do this. And I know you are going to try and say, "But DX12/Vulkan/etc." NOPE. Look, the truth is DX12 WILL allow game developers to manage the memory on multiple GPUs very specifically, but the fact is that a GPU still needs to have all of the textures in the scene in its local memory. If you are doing AFR or EVEN SPLIT FRAME you will pretty much need all of the same textures … so there WILL be a LOT of duplication of memory. That's just how it goes. I mean, maybe we can see a 10-30% improvement, but don't expect anything more than that.
It doesn't necessarily require any changes to the software. If you have a fast enough inter-processor link, then handling remote memory can be done in hardware. This is just like multi-socket CPU systems. AMD uses HyperTransport and Intel uses QPI links. The speed of the processor interconnect needs to be close to the speed of local memory, though. In the past, this was not really possible for GPUs. A link capable of hundreds of GB/s going through the PCB would be very expensive. It would consume massive amounts of power and take a huge amount of die area. Nvidia is working on higher-speed links with their NVLink technology, but these are not going to be fast enough to share memory. It will be significantly slower than local memory bandwidth.
Silicon interposers make such a high-bandwidth link (close to the bandwidth of local memory) much more economical as far as power consumption and die area. The memory controller would still need quite a bit of extra logic to handle addressing remote memory, though, in addition to the inter-processor link. Designing this into the GPU for a low-volume product is not worth it. There is a reason that AMD and Intel do multiple versions of their CPUs. Only the high-end Xeon and Opteron processors have inter-processor links to support multiple sockets. Such a CPU design wastes lots of resources when it is used in a single-socket system. The memory controller will see higher latency due to communication overhead (cache coherency), accessing remote memory, and just having more stages to go through to access local memory.
Bottom line: there are reasons AMD would not want to design such features into every Fiji die. Designing it would not be cheap, and it is probably not economical to make a separate die just to support this low-volume product. Even without design and verification cost, making a new mask set can cost, as far as I know, millions of dollars. We may see multi-GPU interposer-based designs in the next generation or two, mainly due to multiple smaller dies being cheaper than a single large die because of yields. These designs may still not share memory, though. It may be more economical to just use larger-capacity memory than to try to share it. Moving data around costs power and takes die area. With the capacities available by stacking dies, it just may not be worth it to support high-speed inter-processor links. This means that these systems will appear to software as multiple discrete GPUs, and this will require software support. So you are right, but not quite for the right reasons.
The 295X2 was often the performance leader, in games which supported CrossFire, even when compared to the Titan X and the Fury, which came out later. The Titan X and Fury are giant GPUs, around 600 mm². I suspect we will be getting relatively small GPUs in the consumer market with the move to 14 and 16 nm. That could mean that the Radeon Pro Duo will be the highest-performing single-card solution for a while. Nvidia may be targeting a larger die size for their upcoming product, but don't expect a large 16 nm die to be cheap. The 295X2 was actually the best bang for your buck in the super-high-end price range for a while. It would have been a great product if it had not been for the frame time variance issues and the scaling issues in some games. I would hope that such issues will be much less common with DX12 and more modern game engines. Although, better support for multi-GPU configurations seems to be one of the reasons to sell this card, so we may not be there quite yet.
hdmi 2.1 support?
Didn't the Nano drop in price by a nice margin?
Beyond that: so want this……
Nice, PcPer, deleting my comments to a guy who has been trolling every article about AMD, or just trolling about AMD in general. I did not say anything negative, and neither did the guy below me. I'm amazed at how you let some of the nastiest comments go on between certain people but delete my comment for calling him out. So go ahead and delete this one too for calling you out on your hypocrisy.
Not buying it. What was your comment?
Your comment was racist and homophobic. Get over it.
Ryan, it was not my comment. I was the guy commenting on the guy that had the shitty comment. You know the one you are referring to. It wasn't my comment, nor the guy's below me, and his was deleted too for calling the guy out on that particular comment. To say my comment was racist and to get over it is bullshit, because I wasn't the one who made the comment. That is why I called you guys on it.
Like 100000000%, fuck yea, don't let them steal our pimp cane, FUCK YEA!
Problem is, Fury X cards have a ton of board buzzing/coil whine that you can't really fix unless you resolder the board or put super glue on all the board's main connections. So while the card is quiet because it uses water cooling, a lot of them have a very annoying constant whine/buzzing even at idle. I bet this Pro Duo is gonna have the same issue. For those prices you would think they would solve this issue…
The quieter you make the cooling solution, the more noticeable other noises will be.
I agree these issues are a drawback on the Fury / Nano products. Hopefully the Pro Duo doesn't have them.
It's 3x the price of the Nano and 2x the performance…. Won't this card also be affected by CrossFire issues? I mean, if a game doesn't support CrossFire, won't two GPUs on a single PCB suffer the same as two GPUs on two PCBs? If yes, it seems overpriced. If no, then I guess this is an expensive but good buy.
All the previous press says they are marketing this card at professionals, not gamers (hence the Pro branding), which explains the price (which is actually kind of low for a professional card). The CrossFire thing is a problem, but I think by getting this into content creators' hands they hope for better multi-card support going forward (specifically with VR, Vulkan and DX12).
Also, I wouldn't be surprised if the $1500 price tag drops a bit after we start getting the 14 nm products. The 295X2 started out quite high and dropped down later, even though it was still the performance leader for applications that supported dual GPU. Professional users who need the performance and can afford it will buy it now. It may be a better option for gamers later, once the price drops, but even with price drops, it is hardly a budget offering.
If the enthusiast gamers are smart, they’ll stay away from this card. Multi-GPU issues along with 4GB of VRAM per GPU, just not worth it in the long run.
Beautiful card! I’ll take one with an EK waterblock please! 🙂
It amazes me why GPU suppliers aim their big guns at developers rather than us mortal gamers. Sure, you want the best graphics for your game, but all that time is spent developing awesome graphics for the very few gamers that can actually see them.
By the time us mortals can afford the GPU, we have moved on to the next game, and the circle continues.
John
I agree with you. They know the devs will pay that price for the product, or at least they hope so. We have to wait for it to drop a great deal for us to afford it, or be willing to pay a small premium. Some enthusiasts with the cash will fork over that kind of money just to have the fastest GPU. I think it will drop in price rather quickly, kind of like the 295X2 did after it was released at $1500.
When you are developing a game, though, you are targeting hardware a year or two in the future. You want to develop on the highest-performance hardware because there will still be a large installed base of 970s, 980s, 390s, etc. for several years. Also, the mid-range of the next generation will perform similarly to the current high-end. Also, AMD wants to provide a multi-GPU target for game engine development. Game engines will be targeting architectures several more years in the future compared to games. Multi-GPU obviously makes a lot of sense for VR, since you can render separately for each eye, but the entire software framework needs to support this. We may be forced to use multiple smaller GPUs in the next few years due to process tech limitations.
So, dumb question I'm sure: 16 TFLOPS at 350 watts is good, right? Like, very good?…
I'll admit, the more I think of it, rendering would be nice if it had proper OpenCL support. I use Blender, and though two AMD engineers did help Blender split the rendering (Cycles) to support an OpenCL implementation, it is still far, far behind.
My R9 290X matches, and even at times falls behind, the GTX 680. 🙁
Any more information on that FireRender technology???
295X2 was 500W with dual 8-pins. This is 375W with triple 8-pins. What are they doing?
Glad to see they finally added HDMI 2.0 support.
I can't believe you guys didn't cherry-pick some DX12 benchmarks, games that virtually no one plays, that make the Radeon Pro Duo look totally awesome compared to the GTX 980 Ti, rather than your benchmark suite containing popular games that people actually play. Seriously, Ryan, what's wrong with you guys? lol