Two Vegas…ha ha ha
We got two Vegas…so why not CrossFire them?
When the preorders for the Radeon Vega Frontier Edition went up last week, I decided to place orders in a few different locations to make sure we got a card in hand as early as possible. Well, as it turned out, the cards showed up very quickly…from two different locations.
So, what is a person to do if TWO of the newest, most coveted GPUs show up on their doorstep? After you do the first, full review of the single-GPU iteration, you plug them both into your system and do some multi-GPU CrossFire testing!
There of course needs to be some discussion up front about this testing and our write-up. If you read my first review of the Vega Frontier Edition, you will clearly note my stance on the ideas that “this is not a gaming card” and that “the drivers aren’t ready.” Essentially, I said these potential excuses for performance were a distraction and unwarranted, based on the current state of Vega development and the proximity of the consumer iteration, Radeon RX Vega.
But for multi-GPU, it’s a different story. Both competitors in the GPU space will tell you that developing drivers for CrossFire and SLI is incredibly difficult. Much more than simply splitting the work across different processors, multi-GPU requires extra attention to specific games, game engines, and effects rendering that isn’t required in single-GPU environments. Add to that the fact that the market for CrossFire and SLI has been shrinking from an already small base, and you can see why multi-GPU is going to get less attention from AMD here.
What’s more, when CrossFire and SLI support does get focus from the driver teams, it is often late in the process – nearly last on the list of technologies to address before launch.
With that in mind, we should all understand that the results we are going to show you might be indicative of the CrossFire scaling we’ll see when Radeon RX Vega launches, but they very well might not be. I would look at the data we are presenting today as the “current state” of CrossFire for Vega.
Setup: Just as easy as expected
Installing and enabling CrossFire with our Radeon Vega Frontier Edition hardware was as simple as you would expect. The current driver from AMD’s website was used, and in both Game Mode and Professional Mode the CrossFire option exists under Global Settings.
We only had one stability hiccup in our testing, with Rise of the Tomb Raider – and the issue seemed related to our Frame Rating overlay application. While the game ran fine in CrossFire mode without the overlay, we require the overlay to measure performance accurately with our capture methodology. Because the capture methods of our performance analysis are even more important when evaluating multi-GPU performance (where anomalies are more common), I decided to leave out the RoTR results rather than report potentially inaccurate scores.
Our test setup remains unchanged, in both hardware and software, from our initial Radeon Vega Frontier Edition review. If you need a refresher on how we test gaming performance, which is still quite different from the norm, you can find that page of our previous review right here.
Let’s dive into the results!
Great review, thanks! Um, something is wrong on that Fallout 4 page. The UHD runs don’t support the conclusions or the average FPS table.
Ah, numbers were transposed. Thanks! Fixed now.
What about DX12/Vulkan explicit multi-adapter testing? Are there any benchmarks that can test DX12’s/Vulkan’s explicit multi-adapter support, with GPU load balancing managed fully under the control of the DX12/Vulkan graphics APIs?
My hope is that API-managed multi-GPU – driven by the games’ and graphics software’s own calls to DX12 and Vulkan – will see better multi-GPU results as everybody develops future games/graphics/other applications that target Vulkan’s and DX12’s explicit multi-adapter features, where the developers/entire industry have full control over GPU load balancing.
Also, please keep us updated when AMD issues any new driver updates for Radeon Vega FE by doing a new round of benchmarks.
Thanks for the reviews, and please try to work in some more Blender rendering benchmarks that test total render times and not necessarily any FPS metrics.
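For anyone curious what “explicit multi-GPU” looks like from the application side, here is a minimal, hypothetical C++ sketch of enumerating linked GPU groups through Vulkan’s device-group query (core in Vulkan 1.1; around the time of this article it was still exposed via the KHX device-group extension). It only lists the groups – with explicit multi-GPU, the application, not the driver, then decides how to split work across them.

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    // Minimal instance targeting Vulkan 1.1 (no layers, no window-system extensions).
    VkApplicationInfo app{VK_STRUCTURE_TYPE_APPLICATION_INFO};
    app.apiVersion = VK_API_VERSION_1_1;
    VkInstanceCreateInfo info{VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO};
    info.pApplicationInfo = &app;
    VkInstance instance;
    if (vkCreateInstance(&info, nullptr, &instance) != VK_SUCCESS) return 1;

    // Ask the loader which physical devices the driver has linked into groups
    // (e.g. two Vegas). The app owns the work split across a group.
    uint32_t count = 0;
    vkEnumeratePhysicalDeviceGroups(instance, &count, nullptr);
    std::vector<VkPhysicalDeviceGroupProperties> groups(count);
    for (auto& g : groups) g.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_GROUP_PROPERTIES;
    vkEnumeratePhysicalDeviceGroups(instance, &count, groups.data());

    for (uint32_t i = 0; i < count; ++i)
        std::printf("Device group %u: %u GPU(s)\n", i, groups[i].physicalDeviceCount);

    vkDestroyInstance(instance, nullptr);
    return 0;
}
```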
Is there any way AMD could utilise Infinity Fabric for near 100% scaling with two Vega chips on one card?
Yes and no. It should be better than regular CrossFire, at least; it depends on them designing the GPU with that idea in mind. Also, apparently latency is a bigger issue for GPUs – according to a random person on the internet. IF would have higher latency, so the two GPUs won’t really behave as one.
Seeing that IF is used in Ryzen for CCX connectivity and seems to work well, why shouldn’t it work for GPUs? But what do I know – I’ve never designed a GPU.
The Infinity Fabric (IF) will be used across all of AMD’s products. It is already on the Zeppelin die (used by AMD in its Ryzen/Threadripper/EPYC SKUs), where the two CCX units on the die communicate over that coherent fabric. The Infinity Fabric also extends beyond the two CCX units on a Zeppelin die to the CCX units on other Zeppelin dies on the same MCM, as well as across the socket (on 2P EPYC systems) to the Zeppelin dies and their CCX units on the other EPYC CPU. Threadripper will support two or more Zeppelin dies on an MCM, the same as EPYC, but Threadripper will probably only support 4 memory channels per MCM while EPYC will support 8 memory channels (2 per Zeppelin die) per chip/MCM.
This Infinity Fabric (in a similar manner to Nvidia’s NVLink) can be used to interface an EPYC chip to a Vega GPU (which also uses the Infinity Fabric coherence protocol). So coherent communication can happen between EPYC CPUs and Vega GPUs via the IF, and ditto for any Vega GPU to Vega GPU coherent communication via the IF.
Don’t forget that AMD is also a founding member of OpenCAPI, so that IBM-led consortium’s members (AMD and others) will have their products able to interface with IBM’s POWER9s and any third-party licensees’ POWER9 CPUs (Google and others that license POWER9). OpenCAPI is derived from IBM’s CAPI (Coherent Accelerator Processor Interface) IP, which is now open and called OpenCAPI, and AMD will offer OpenCAPI support on its GPU accelerator/AI SKUs.
Yep, the interesting upshot of which is that the coherency issues of CrossFire are exactly what fabric/Zeppelin/Ryzen/HBCC/Raven Ridge APU/… are all about.
I am certain AMD has multi-GPU APUs planned ASAP (I have seen official slides somewhere), and they have all the ingredients to make it both cheap and awesome.
Don’t you think a maker of CPUs AND GPUs, if designing a coherency system for their CPUs (as in Ryzen), would do all in their power to design it to work with GPUs also?
A partial answer is that we know a single Zen CCX will be paired with a Vega GPU via the fabric on the Raven Ridge APU, so that’s step 1.
I think, to all intents and purposes, the app will just see a single GPU; the fabric does the rest.
I still don’t think the true import of what AMD is up to has sunk in for the industry.
Check this, from July 2016:
https://semiaccurate.com/2016/07/25/amd-puts-massive-ssds-gpus-calls-ssg/
Coherency is more about cached data/code being moved around by cache controllers across CCX units, processors (CPUs, GPUs, DSPs, etc.), sockets, or even PCIe cards. The Infinity Fabric, like OpenCAPI and other coherent protocols, runs over the various processor fabrics via the respective processors’ cache controllers that speak Infinity Fabric/OpenCAPI/other coherent protocols.
AMD’s Infinity Fabric encompasses both a control fabric and a separate data fabric, connecting the cache and memory controllers of the various processors (CPUs/GPUs/other) that have the Infinity Fabric IP included in their hardware. Via those cache controllers, the Infinity Fabric can manage cache/data coherency across any type of processor with the Infinity Fabric IP included.
So, say, an EPYC CPU can have some form of cache coherency traffic from the EPYC CPU’s cache directly to a Vega GPU’s cache, managed over the Infinity Fabric IP that is on both Zen/EPYC CPUs and Vega GPUs, or on any other processor IP that AMD decides on, such as FPGAs, DSPs, etc. So for all intents and purposes AMD’s Infinity Fabric can create an APU-type arrangement between any EPYC CPU and any directly attached discrete Vega-based GPU, with the Zen cores on the EPYC SKUs able to communicate cache controller to cache controller and pass data/code in a more direct way, without any secondary trips to and from slower memory.
The SSD connected on the PCIe card (on older GCN and Vega GPUs) that you refer to is handled more by each respective processor’s memory controller and whatever virtual memory page table/VM swap IP is native to each GPU’s and CPU’s memory controllers. That SSD is the bottom tier of VM swap space on, for example, Vega, with main system memory being the next level up the memory/paging hierarchy. The level above that, on Vega, is the HBM2 (HBC) that is effectively treated like a last-level cache by Vega’s HBCC and the cache subsystems above it. The HBCC is a direct client of the L2 cache on Vega, so there are efficiencies in keeping things focused in the L2 rather than swapped out to any of the lower and more latency-inducing cache/memory levels.
The Infinity Fabric is more about allowing EPYC CPU coherency traffic to bypass the lower levels of memory and transfer cached data directly from any EPYC CPU cache to Vega’s HBC, or directly to Vega’s L2, or to whatever Vega cache level holds the work (FP, INT, other values) that the EPYC CPU has dispatched to the Vega GPU and that both CPU and GPU are working on. It could even be that there is no cache-to-cache data movement at all, just some coherency signaling from the Zen/EPYC CPU to the Vega GPU that invalidates data held in one processor’s cache because that data is now out of date and needs to be flushed so the properly updated data can be fetched or transferred over. This also applies to any Vega GPU to Vega GPU cache/coherency traffic, or even Vega to DSP, if the DSP speaks the Infinity Fabric protocol.
“AMD Infinity Fabric underpins everything they will make”
https://semiaccurate.com/2017/01/19/amd-infinity-fabric-underpins-everything-will-make/
Can you show GPU utilization during the CFX runs?
CFX can be very hard to set up properly, especially if you don’t do test runs with it every 1-2 weeks…
Every driver update behaves differently… also, game updates can change certain graphics options’ behaviours…
I suspect we’re not seeing 99%/99% utilization in these titles, partly due to the engines but also due to configuration issues.
– LeeDoo
While your comment makes some sense, there is literally ONE driver that works for Vega FE today, so what changes every 1-2 weeks is irrelevant.
That’s silly. Obviously 2 x Vega FE would walk up to a 1080 Ti and slap it in the face, then proceed to slap every single member of its family in the face.
The difference in power would be too enormous.
This is all weird upon weird, and I don’t see anyone doing any meaningful investigations or comparisons. That opportunity will be gone once AMD does launch RX Vega. PC hardware media be disappointing me.
At the moment the Vega drivers are “not ready”: the GPU is not fully utilized, FP16 is not used (yet), tiling, etc. You can try and check with the new CodeXL profiling and/or DX12 PIX. Games use something like ~65% of the Vega cores (FP32 data path). AMD should be around 1080 Ti level when they hit 95%+, and much faster if FP16 can be used for some tasks. Let’s just hope that their tools/driver teams are working hard, and that we devs can get our hands on Vega FE, so that we end up with a nice product and games optimized for it (some are already on the way).
FP16 is mostly not used in games. It won’t affect much even if it’s working, unless they can get games to start including it.
There are places where game devs can use it. AMD can also do these optimizations in the driver, or rather in the shader compiler. Both NV and AMD already do this for most high-profile games/apps on their other cards – it’s normal. For Vega, not yet; and as for devs, they will also do it, but it will take time and learning. And not all devs do it – some just stop and ship the app when it can hit 90/60/30 fps on the selected target cards/systems. That’s the real world. It would be awesome if reviewers were able to see perf counters and measure not only FPS and frame time in ms but also utilization of the various parts of the GPU. There is so much left on the table… that kind of review might push both GPU vendors and devs to write better games/apps.
The problem is that current games are more complex than they were before (like ten years ago). Some graphical effects simply need to be done in FP32, according to developers. The last time AMD tried using FP16 they couldn’t really do it without affecting the image quality of the entire scene, and they only enabled it in the Far Cry benchmark mode to get a better score and disabled it when you actually played the game.
Mixing FP32 and FP16 together will need more attention to optimization, or else there is no saving at all from going FP16. The only question is, with AAA game releases always being rushed by publishers, will developers have the time for it?
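To make the precision trade-off concrete, here is a rough, hypothetical C++ sketch (a crude truncation-based conversion, not a proper IEEE 754 half implementation – no rounding, denormal, or NaN handling) showing how much resolution survives a round trip through FP16. Small shading-range values come back nearly intact, while large values lose their fractional part, which is roughly why blanket FP16 use can show up as visible artifacts.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Crude float -> FP16 -> float round trip: keep the sign, the exponent (if it
// fits half's range) and only the top 10 mantissa bits. Out-of-range values
// are simply flushed to zero here to keep the sketch short.
float roundTripHalf(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    uint32_t sign = bits >> 31;
    int32_t  exp  = int32_t((bits >> 23) & 0xFF) - 127;  // unbiased exponent
    uint32_t man  = (bits >> 13) & 0x3FF;                // top 10 of 23 mantissa bits
    if (exp < -14 || exp > 15) return sign ? -0.0f : 0.0f;
    uint32_t out = (sign << 31) | (uint32_t(exp + 127) << 23) | (man << 13);
    float r;
    std::memcpy(&r, &out, sizeof r);
    return r;
}

int main() {
    // A small shading-range value keeps roughly three decimal digits…
    std::printf("%f -> %f\n", 0.1234567f, roundTripHalf(0.1234567f));
    // …while a large world-space value loses its fractional part entirely.
    std::printf("%f -> %f\n", 4097.3f, roundTripHalf(4097.3f));
    return 0;
}
```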
“…and you can see why multi-GPU is going to get less attention from AMD here.”
This could NOT be further from the truth. AMD designed Mantle to scale GPUs natively. It is called Explicit Multi-Adapter. In fact, AMD supports up to 4 GPU cards in Mantle, Microsoft followed with their Mantle-clone DX12, and of course Vulkan does as well.
CrossFire is ONLY necessary with the obsolete DX11.2 API, as multiple cards are natively supported in DX12, Mantle, and Vulkan with Explicit Multi-Adapter.
In DX12 the scaling of Radeon AIBs is virtually 1:1. Two cards will just about double the performance of one card in DX12-supported games. Two RX 480s running DX12 equal or beat a GTX 1080 for far less $$$$.
By Christmas 2017 90% of ALL new titles will support DX12.
Why is the OBSOLETE DX11 even a consideration? Are enthusiasts going to spend $1800 or so on a 2x multi-GPU system just so they can run legacy DX11 games?
Why not see how well it runs DX9 and DX10 games while you are at it.
Talk about irrelevant.
Unless of course you wanted to show Vega in a poor light. Nvidia does not bench well using DX12.
Nvidia does not support asynchronous compute and asynchronous shader pipelines except through software or driver emulation. Async compute is AMD hardware IP.
So I challenge you to benchmark Vega using DX12, Mantle, and Vulkan, and DO NOT DISABLE ASYNC COMPUTE. Also, you might want to use 3DMark’s DX12 benchmarks as well as Star Swarm.
DX11 is broken. Get with the program.
I honestly can’t tell if this guy is trolling or just had way too much coffee today.
The guy is legitimately upset that PCPer dared to bench CrossFire instead of just quoting AMD’s own promo material like he has.
I mean who needs real-world numbers?
“I mean who needs real-world numbers?”
So can you show us some real-world numbers from DX12 benchmarks?
So what is the big deal?
Why ignore DX12 benchmarks with the latest AMD GPUs?
Lazy? Don’t care? You have $1800 or so of the latest Radeon kit and you ignore the latest API.
Star Swarm is FREE. No guts, no glory.
Another web media site PAID by INTEL and nVidia.
No one gives a fuck about retarded tech demos. They care about the games that they spent money on.
Are you braindead?
Who in their right mind spends $1800 on two OpenCL workstation cards to play games? They are for work (yes, some of us still do it) – rendering 10-bit video, etc. I will use them as such in my Mac Pro. If I want to play games I will use my Windows PC, as the Windows OS is best for gamers and secretaries atm (until more Vulkan API games hit the market).
Now, AMD have looked to the future. They realised long ago that the thermal threshold of a CPU core was around 5 GHz, and that DX11 using only a single core was a huge limiting factor for gaming, holding performance back. So they gave us multi-core CPUs, developed the Mantle low-level API that spawned DX12 and Vulkan, and built GPUs that could perform asynchronous compute tasks to take advantage of those CPUs and APIs. The only thing holding back performance is developers’ poor application and optimisation of the aforementioned APIs.
Some of you people seem to want to hold back innovation and play on single-core DX11 for the next 20 years. We need to reward the game developers that move us forward with Vulkan; it’s a cross-platform API that should be the API for all future games.
DX12 = Windows 10 = meh
https://www.pcper.com/reviews/Graphics-Cards/AMD-Radeon-Vega-Frontier-Edition-CrossFire-Testing/Hitman-
Here’s a DX12 game. Did you read the article?
No one is asking PCPer to reinvent the wheel, but why on earth would anyone review a CrossFire setup with games known not to use it well? Test AC: Unity, etc. So many games actually benefit from it. I appreciate the effort, and sorry for my tone, but this was useless. Why not sell one of the Vega FEs and use the cash to update your library with some actually relevant games?
The games he used are relevant. So you want him to cherry-pick games that may make two Vegas look good? Now that is something. Good job, Mr. Shrout.
Nice test. But you didn’t test the rasterizer. The main feature, the tile-based rasterizer, is not on!
As Trianglebin showed 😉
We did that during our Vega Frontier Edition live benchmarking stream.
Thank you Ryan, I knew that you did the Trianglebin test. I had hoped you would dive a little bit deeper into the rasterizer behavior.
I have seen in another post that you asked some experts, who said that the improvement from the tile-based rasterizer is about 10%.
I’m a little bit surprised. Nvidia had an IPC improvement of 35% between Kepler and Maxwell with the tile-based rasterizer.
If you think about it, you save performance twice with TBR. You don’t burden the shaders with unimportant workload, and because of this you also get capacity back from the shaders: the shaders which did unimportant work before are now free to do important work.
Also, do you remember your article about Deus Ex and the 220 million triangles, of which only a small fraction end up actually visible?
That’s the advantage of a tile-based rasterizer.
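For readers wondering what the “binning” in all of this actually means, here is a toy, hypothetical C++ sketch of the idea only – sorting screen-space triangles into fixed-size tiles by bounding box so each tile can then be rasterized out of a small on-chip buffer. It is not how Vega’s draw-stream binning rasterizer is actually implemented, and it does no culling at all.

```cpp
#include <algorithm>
#include <vector>

struct Tri { float x[3], y[3]; };  // screen-space triangle

// Toy binning pass: record which tile-sized screen cells each triangle touches,
// using its bounding box. A real binning rasterizer batches this in on-chip
// memory and can also reject triangles that cannot contribute visible coverage.
std::vector<std::vector<int>> binTriangles(const std::vector<Tri>& tris,
                                           int width, int height, int tile = 32) {
    int tilesX = (width + tile - 1) / tile;
    int tilesY = (height + tile - 1) / tile;
    std::vector<std::vector<int>> bins(tilesX * tilesY);
    for (int i = 0; i < static_cast<int>(tris.size()); ++i) {
        const Tri& t = tris[i];
        float minX = std::min({t.x[0], t.x[1], t.x[2]});
        float maxX = std::max({t.x[0], t.x[1], t.x[2]});
        float minY = std::min({t.y[0], t.y[1], t.y[2]});
        float maxY = std::max({t.y[0], t.y[1], t.y[2]});
        int tx0 = std::max(0, static_cast<int>(minX) / tile);
        int tx1 = std::min(tilesX - 1, static_cast<int>(maxX) / tile);
        int ty0 = std::max(0, static_cast<int>(minY) / tile);
        int ty1 = std::min(tilesY - 1, static_cast<int>(maxY) / tile);
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                bins[ty * tilesX + tx].push_back(i);  // tile must consider triangle i
    }
    return bins;  // each tile is then rasterized/shaded against only its own list
}
```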
https://www.pcper.com/reviews/Graphics-Cards/AMD-Vega-GPU-Architecture-Preview-Redesigned-Memory-Architecture/Primitive-Sh
But thank you for your investigation and for listening to your community. I honor this!
“I’m a little bit surprised. Nvidia hat an ipc improvement of 35% between Kepler and Maxwell with the tiles based Rasterizer.”
The majority of that performance improvement came from the rearrangement of the SM. Nvidia already explained this when they first came out with the 750 Ti. As for TBR in Maxwell, we didn’t even know about it until David Kanter did his test last year.
TBR probably can improve performance, but not in the way that some people imagine. If TBR were superior, then why didn’t ATI and Nvidia use it before? Imagination Technologies, for example, has been using some form of TBR (when others weren’t) since they were still competing in the desktop market more than a decade ago, so why aren’t they the best GPU maker on the desktop right now?
Nvidia only discovered the importance of TBR when they tried competing in the mobile market with Tegra. TBR is more common on mobile GPUs because of power and bandwidth constraints, but it does not dramatically increase GPU performance (in terms of FPS) like some people believe.
Nice. I have not seen any OpenGL games reviewed on Vega FE yet; given the good R15 scores, it would be an interesting test to highlight whether the DX/Vulkan drivers are just not up to scratch yet. Doom has an OpenGL mode, does it not? Normally the Vulkan mode significantly outperforms the OpenGL mode (on both Nvidia and AMD cards), so it would be an interesting test to see if more work has gone into the OpenGL driver stack for the (pro-ish) card.
Go to the Phoronix website, as that’s where the majority of OpenGL Linux games testing is done. Michael Larabel very often tests OpenGL performance on Linux against OpenGL performance on Windows for games and other graphics software.
Michael has been remote testing on Linux on a Phoronix reader’s Vega FE using the Phoronix Test Suite. There is some OpenCL testing, and maybe Michael will get more remote access for OpenGL/Vulkan testing.
“A Few OpenCL Benchmarks With Radeon Vega Frontier Edition On Linux”
http://www.phoronix.com/scan.php?page=news_item&px=Radeon-Vega-FE-Linux-OpenCL
From my experience with Time Spy and Ashes of the Singularity, DX12 requires CrossFire to be disabled in the driver, as it uses explicit multi-GPU instead.
Exactly. CrossFire is not needed with DX12; EMA uses all available GPU resources.
The whole point of EMA is that within a few years GPU dies will see lower yields as they grow ever larger in transistor and core counts. Do you really expect to see 10,000-shader-core GPUs?
The only solution is multi-chip GPU cards and an API that can take advantage of scalable design.
Both Vulkan and DX12 offer API-managed explicit multi-adapter for GPUs, and game developers/game engine developers can create their own libraries that are optimized for multi-GPU via the game/game engine.
CF/SLI are not so good at multi-GPU load balancing inside AMD’s or Nvidia’s respective drivers. So get the multi-GPU load balancing out of the drivers and into the APIs, and let the entire gaming/graphics software industry optimize for multi-GPU usage. Keep the drivers as simple, lightweight, and close to the GPU’s metal as possible, and let game developers do multi-GPU via the game/game engine SDKs that can call on Vulkan’s or DX12’s EMA. That way the entire gaming and graphics software industry can pool its resources and get the multi-GPU scalability issues solved.
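As a rough, hypothetical sketch of that idea (not production code, and not any particular engine’s implementation): under DX12’s unlinked explicit multi-adapter model the application simply creates one device per physical GPU and then owns all scheduling and copying between them – the driver does no CrossFire-style AFR behind its back.

```cpp
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>
#include <vector>

using Microsoft::WRL::ComPtr;

// Enumerate the system's adapters and create a D3D12 device on each hardware GPU.
// With unlinked explicit multi-adapter, splitting work across these devices
// (and copying results between them) is entirely the application's job.
std::vector<ComPtr<ID3D12Device>> createAllDevices() {
    std::vector<ComPtr<ID3D12Device>> devices;
    ComPtr<IDXGIFactory4> factory;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory)))) return devices;

    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);
        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE) continue;  // skip WARP

        ComPtr<ID3D12Device> device;
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0,
                                        IID_PPV_ARGS(&device))))
            devices.push_back(device);  // e.g. two Vega FEs -> two independent devices
    }
    return devices;
}
```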
But the problem is that the majority of game developers simply don’t want to deal with multi-GPU.
So a 560 W dual 16 GB Vega setup is slower – significantly slower (by 20+%) – than a single 180 W 8 GB 1080 at 4K…
Did you guys confirm those drivers are Vega CrossFire optimized?
If not… what was the point of this?
Seems like a lot of effort for something pointless.
Maybe try to investigate voltage scaling?
RAM frequency effects on the Vega architecture?
How does Vega behave vs. Fiji?
Do any of the new features impact performance (geometry, compute)?
etc.. etc…
Most enthusiasts do not run GPUs at stock. So are there any ways, any ways at all, Vega FE can deliver better gaming performance?
A higher fan profile? Power limit +10%?
Lower voltage? Lower frequency? (Do you guys recall that the reference RX 480 was actually faster underclocked?)
etc.. etc..
This was a really cool review. Kudos to PCPer for doing this. I appreciate it. I would have liked to see 4-way 480 or 4-way 580 when they came out, because they were cheap and it was possible, and that would have been cool too. This kind of testing isn’t to suggest that people should actually go out and buy it; it’s just cool to see if you’re into computer hardware. This is why I follow PCPer.
I feel like you may not have updated Hitman to the latest version. On my copy, the settings menu has an extra option (enable multi-GPU – DX12), and mGPU only works if you have that option enabled. The only reasons you wouldn’t see that option are: 1. Hitman not fully updated, or 2. mGPU disabled in the driver somehow.
Pointless test on a non-gaming card. When the gaming card is released, then do a gaming test. I guess anything to get views.
In case you haven’t heard this response the other eleventy bazillion times someone has said that – this IS the same silicon as RX, running a gaming driver, and the according-to-Nvidia “prosumer, not gaming” Xp that this prosumer-not-gaming card is positioned against is nevertheless bloody good at gaming. FE’s clocks look like being lower than RX’s, and the drivers are clearly unoptimised; as this review shows, Vega CrossFire right now is completely AWOL.
@Ryan Shrout, Did you turn on the Shader Cache when testing all the games? Thanks.
Damn… this is truly too bad, to be honest. I was hoping for something a bit more “special” than all of this – mainly in the performance of these cards, but especially when you read an “excuse” before the actual review: some nonsense about how mGPU scaling is abysmal and fading by the wayside… Seriously, it couldn’t be further from the truth.
APIs are being/have been created with mGPU in mind as of late and are supposed to get better… New motherboards are still offering up to six x8–x16 capable PCIe slots.
I believe it to be laziness of late and/or the coercing of developers.
There’s nothing truly better than being a PC gamer and being able to put more money into something to get a beneficial gain. Every FPS can be an advantage….
With Vega currently performing in line with an up-clocked Fury, I’d expect CrossFire scaling for RX Vega (once for-real-this-time drivers are available) to be in line with Fury’s scaling.
I’d love to see frame times where the GPU is not thermally or power limited (for either side). I found hitting a limit destroys frame times on my 1080 SLI. I’m curious how many spikes are caused by hitting limits vs. actually inherent to SLI or crossfire.
I might be stupid here, but why does GTA V show only 6 GB of video memory if you’ve got 2 x 16 GB cards?