AMD Polaris Architecture Coming Mid-2016
AMD is giving us a very brief glimpse of what to expect from its Polaris GPU architecture.
In early December, I was able to spend some time with members of the newly formed Radeon Technologies Group (RTG), a revitalized and compartmentalized section of AMD that is taking over all graphics work. During those meetings, I learned quite a bit about RTG’s plans going forward, including changes to AMD FreeSync, the implementation of HDR display technology, and the GPUOpen open-source game development platform. Perhaps most intriguing of all: we received some information about the next-generation GPU architecture, targeted for 2016.
Codenamed Polaris, this new architecture will be the 4th generation of GCN (Graphics Core Next), and it will be the first AMD GPU that is built on FinFET process technology. These two changes combined promise to offer the biggest improvement in performance per watt, generation to generation, in AMD’s history.
Though the amount of information provided about the Polaris architecture is light, RTG does promise some changes in the 4th iteration of its GCN design. Those include primitive discard acceleration, an improved hardware scheduler, better pre-fetch, increased shader efficiency, and stronger memory compression. We have already discussed in a previous story that the new GPUs will include HDMI 2.0a and DisplayPort 1.3 display interfaces, which offer some impressive new features and bandwidth. From a multimedia perspective, Polaris will be the first AMD GPU to include support for H.265 (HEVC) 4K decode and encode acceleration.
This slide shows quite a few changes coming to Polaris, most of which were not discussed in enough detail for us to report specifics. Geometry processing and the memory controller stand out as potentially interesting to me: AMD’s Fiji design continues to lag behind NVIDIA’s Maxwell in tessellation performance, and we would love to see that shift. I am also very curious to see how the memory controller is configured across the entire Polaris lineup of GPUs, since we saw the introduction of HBM (high bandwidth memory) with the Fury line of cards.
It looks like the mobile variations of Polaris, at least, will be using standard GDDR5 memory interfaces. AMD didn’t comment more specifically than that, only elaborating that they “were invested in HBM” and that it would “continue to be on the company’s roadmap.” To me, that sounds like we’ll see a mix of products with HBM and GDDR5 technology, even in the desktop market. Likely, only more expensive, flagship graphics cards will be getting HBM.
Performance characteristics were missing from any discussions on Polaris with one exception – a demonstration of power efficiency of an unnamed Polaris GPU compared to a GeForce GTX 950 GPU. Maxwell was a big step forward for NVIDIA in terms of power efficiency and AMD is hoping that Polaris, along with the FinFET process technology, will offer an even bigger jump.
For the single data point that AMD provided, they compared the GTX 950 to a similarly priced Polaris graphics card. At 1920×1080 with medium image quality settings, running at 60 FPS, full system power for Star Wars Battlefront was around 140 watts on the NVIDIA system, while the Polaris system drew just 86 watts. That’s a difference of 54 watts, a considerable amount for a GPU in this class.
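For readers who want to sanity-check those figures, here is a minimal back-of-the-envelope sketch; the wattages are the numbers quoted above, and the assumption that the two test systems were otherwise identical is mine, not an AMD-provided calculation.

```python
# Back-of-the-envelope on AMD's Star Wars Battlefront demo numbers.
# Assumption: both test systems were identical apart from the graphics card,
# so the wall-power delta is attributable to the GPU subsystem.
nvidia_system_w = 140   # GTX 950 system, as quoted by AMD
polaris_system_w = 86   # unnamed Polaris system, as quoted by AMD

delta_w = nvidia_system_w - polaris_system_w
print(f"System power delta: {delta_w} W")   # 54 W

# These are wall/system measurements, so 54 W is the gap between the two
# GPU subsystems at this workload; absolute GPU power can't be derived
# without knowing the rest of the platform's draw.
```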
It is likely that this is the first Polaris GPU being brought up (after only 2 months I’m told) and could represent the best improvement in efficiency that we will see. I’ll be curious to see how flagship GPUs from AMD compare under the same conditions.
How is Polaris able to achieve these types of improvements? It comes from a combination of architectural changes and process technology changes. Even RTG staff were willing to admit that the move to 14nm FinFET process tech is the dominant factor in the improvement we are seeing here, something on the order of a 70/30 split. That doesn’t minimize the effort AMD’s engineers are putting into improving GCN; it just means we can finally expect improvements across the board as we move past the 28nm node.
AMD did spend quite a bit of time at the RTG summit discussing the benefits of moving to FinFET technology from the current 28nm planar technology that dominates GPU production. It’s been roughly four years since the first 28nm GPUs arrived, BY FAR the longest stretch between node transitions. There were just too many technical and availability issues with the 22/20nm processes for either NVIDIA or AMD to adopt them. But with FinFET 3D transistors at the 16nm and 14nm nodes, we should see sizeable improvements in power efficiency, thanks to the transistors’ higher “on” current and lower “off” (leakage) current.
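As a rough reminder of why those transistor characteristics matter for a GPU’s power budget, the standard first-order CMOS power model (a textbook approximation, not an AMD figure) is:

```latex
P_{\text{total}} \;\approx\;
\underbrace{\alpha\, C\, V_{dd}^{2}\, f}_{\text{dynamic (switching)}}
\;+\;
\underbrace{V_{dd}\, I_{\text{leak}}}_{\text{static (leakage)}}
```

Higher “on” (drive) current lets a chip hit a target clock at a lower supply voltage, which pays off quadratically in the dynamic term, while lower “off” current directly shrinks the leakage term.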
FinFET isn’t new; it’s just new to everyone other than Intel. We have been writing about 3D FinFET transistors since 2012, with theories about their implementation and advantages for GPUs. I did learn that the first Polaris chip we saw demoed in December was built on GlobalFoundries’ 14nm FinFET process; whether AMD will also build Polaris chips on TSMC’s 16nm process has yet to be confirmed.
FinFET technologies not only offer very obvious product-level performance advantages but also lower variation. This should allow AMD (and NVIDIA) to better predict bins for this generation of GPUs. This means RTG should have no issues planning out their product line well in advance.
This chart, above, illustrates the advantages of FinFET in curves mapping power consumption and performance (clock speed). AMD’s Joe Macri stated, during our talks, that they expect this FinFET technology will bring a 50-60% power reduction at the same performance level OR a 25-30% performance increase at the same power. In theory then, if AMD decided to release a GPU with the same power consumption as the current Fury X, we might see a 25-30% performance advantage. I think, in practice, they would lower voltages a bit instead, to improve overall efficiency more dramatically, but we’ll find out in mid-2016.
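To put those percentages in concrete terms, here is a quick illustrative sketch that simply applies Macri’s quoted ranges to a roughly 275 W, Fury X-class power budget; the baseline figure and the exercise itself are my own assumptions, not anything AMD stated about future products.

```python
# Applying the quoted FinFET scaling ranges to a hypothetical ~275 W
# (Fury X-class) power budget (purely illustrative, not an AMD roadmap).
baseline_power_w = 275.0
baseline_perf = 1.0   # normalized performance

# "50-60% power reduction at the same performance level"
iso_perf_power_w = [baseline_power_w * (1 - r) for r in (0.50, 0.60)]

# "25-30% performance increase at the same power"
iso_power_perf = [baseline_perf * (1 + g) for g in (0.25, 0.30)]

print(f"Same performance: roughly {iso_perf_power_w[1]:.0f}-{iso_perf_power_w[0]:.0f} W")
print(f"Same power: roughly {iso_power_perf[0]:.2f}x-{iso_power_perf[1]:.2f}x the performance")
```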
One thing we will definitely see from Polaris is competition for NVIDIA in the mobile space, with Radeon GPUs finally making inroads to modern gaming notebooks. And we might be lucky enough to see further changes in the form factors available with high performance GPUs, much like we did with the Radeon R9 Nano.
But honestly, other than what we have here, very little is known about what Polaris will be and what it will offer PC gamers when it is finally released later this year. Will AMD release mobile and mainstream GPUs first and follow up with flagship high-end parts? Or will we see the more expensive cards first in our review lineup? We don’t know, and AMD and the Radeon Technologies Group aren’t talking yet. It is also going to be crucial to learn how many architectural changes find their way into Polaris and how effective they are in changing performance efficiency for Radeon. NVIDIA’s GeForce line will move to FinFET later in 2016 as well, so any inherent advantages of the process will apply to both parties.
Check out the video from AMD if you'd like as well.
I’m guessing (without any information to base this on) that both teams (red and green) will launch smaller-die GPUs first. However, these smaller dies may offer similar or better performance than current mid-range cards.
If they do replace the smaller die, you know it had better be the Tonga GPU; then all of AMD’s lineup will support FreeSync.
the GCN 1.0 radeon 7850/370 to be replaced by a GCN 4.0 Tonga 460 would be most welcome
It’s rather standard practice now to release a new architecture with a GPU that’s faster than the current top-end product. Like the GTX 970/980, then later the 980 Ti.
There are practical reasons such as manufacturing process maturation, and getting enough usable chips to release a top-end product.
Having said that, it boils down to PROFITS so it depends on what the failure rate is relative to die size and other factors.
AMD have form for starting on a new process with smaller (low or mid range) chips.
In this case it makes sense – their lower end stuff is most in need of replacement, this is a new node so smaller means reduced die area lost to defects, and mobile is a key area they need to be aggressive in.
Considering how big of a jump in process tech and design this should be, a mid-range sized chip will compete with much larger 28 nm planar chips. This should be a bigger jump than previous process node transitions, since 20 nm was skipped and this transition also brings the switch to FinFET. I don’t know if we will get Fury performance in a mid-range die size, but it isn’t impossible.
The Fury is 596 square mm. The Radeon R9 390 is 438. The 380 is 359 and the 370 is 212. Sizes from Wikipedia. Perhaps we will get a chip in the 200 to 300 square mm range. Jumping directly to a ~600 square mm die on an immature process would be a bad idea, unless yields are unusually good. They could go with an even smaller die part for the initial release though. The 370 is still based on GCN 1.0, so it needs to be replaced. A much smaller die could achieve the same performance, so the first release could actually be a small die part, maybe in the 100 square mm range. This could cover a wide range of products by using the greater ability to scale performance at the cost of power with the new process.
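For what it’s worth, here is a rough sketch of how those 28 nm die areas might translate, assuming roughly 2x density at 14 nm FinFET; that scaling factor is a ballpark guess (analog and I/O blocks shrink far less than logic), not a published foundry or AMD number.

```python
# Rough die-area scaling sketch for the 28 nm figures quoted above.
# ASSUMPTION: ~2x density at 14 nm FinFET, which is optimistic because
# analog/PHY/IO blocks shrink far less than logic does.
die_area_28nm_mm2 = {
    "Fury": 596,
    "R9 390": 438,
    "R9 380": 359,
    "R9 370": 212,
}
assumed_area_factor = 0.5   # hypothetical, not a foundry number

for name, area in die_area_28nm_mm2.items():
    print(f"{name}: {area} mm^2 -> roughly {area * assumed_area_factor:.0f} mm^2 at 14 nm")
```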
A small die chip will also be great for mobile. Intel seems to be having some issues with 14 nm Skylake processors, with the high-clocked parts in short supply. This leaves a lot of low-clocked parts for mobile though. With a small die part, they could replace the 370, and they could use salvaged, cut-down parts to replace some of the lower end (350 and below, which are also GCN 1.0) and OEM parts. Fully functional parts that just can’t clock very high would be good for mobile. With a new process they could have a lot of salvage. This would get all of the 1.0 parts out of their line-up. The 360 and the 390 are still 1.1 though. The 360 could be replaced by a small die part. I would expect they would want to discontinue the 390; it is a large die part to be selling as cheaply as they are. It is on an old process that would be in lower demand, so perhaps it isn’t that expensive to keep making them for a while yet.
I am leaning towards a ~100 square mm part for the initial release, or at least less than 200 square mm.
Well, AMD is improving its fully-in-hardware GPU thread dispatch/management circuitry, and that primitive discard accelerator sounds very interesting: probably an even better ability to efficiently discard unneeded draw calls and other work for hidden geometry, among other steps to make sure that only what is absolutely necessary for displaying the visible parts of the game actually gets worked on. An all-new geometry processor sounds nice, but will there be more tessellation resources, or will the improved hardware be able to do more with fewer tessellation units?
For those SKUs that still use GDDR5, I hope they are mostly for the desktop market and not for the mobile SKUs using Polaris, because AMD needs to get more of its GPUs/APUs into laptops, even if that means only using HBM1 on its mobile SKUs; even HBM1 would be a step up from GDDR5. I can see AMD only having HBM2 on its flagship desktop SKUs for a while, but the laptop market needs some more competition, and even an APU with just HBM1 plus an HBM1-based discrete mobile SKU would get AMD some design wins. HBM1 is still much more power efficient than GDDR5 and would allow for a much simpler/smaller mainboard once HBM/HBM2 fully becomes the standard.
The laptop/mobile market needs HBM memory’s space/power savings much more than the Desktop market for mid range discrete GPUs needs HBM. AMD needs an attractive HBM based APU for maybe getting some business from Apple, although if Apple chose to go with AMD, Apple could fund such a development process with its financial resources much like the console makers fund AMD’s development of their exclusive use SKUs.
I really wish that AMD would at least try to make a flagship discrete mobile SKU with HBM2 for gaming laptops, before the Zen-based APU with Polaris graphics on an interposer makes its appearance in the consumer market and maybe makes discrete mobile SKUs unnecessary for laptops built around AMD’s interposer-based APUs. The interposer-based APU is not far off from becoming a reality in the consumer market.
Nvidia has yet to get its GPU processor thread dispatch/management circuitry implemented fully in hardware, and AMD is already improving their fully in hardware asynchronous compute circuitry for its Polaris micro-architecture based GPUs.
Apple already uses AMD GPUs in all of their laptop and desktop models that don’t rely on Intel integrated graphics. They don’t need more business from Apple when they already have all of it.
Plus, why do they need HBM when their current iteration of (small) Polaris only uses about 35 watts of power in a throttling scenario, while having similar performance to a desktop GTX 950?
I would expect HBM to only be for high end desktop parts unless it gets to a point where going HBM is cheaper than GDDR derived memory. Companies like Apple may be willing to pay for HBM for the smaller size and/or power consumption advantages for mobile. Mobile chips cost more, and having everything on one interposer will reduce power consumption significantly. A mid-range GPU doesn’t really need HBM levels of bandwidth though, so without the power consumption and space constraints needed for mobile, I would expect the low-end and some mid-range desktop parts to stay based on GDDR5 variants. I would like to see an HBM based APU. Wccftech had an article about an APU with a single HBM stack which could supply 128 GB/s. This wouldn’t be a gaming laptop though. Considering powerful (and power consuming) GPUs are going into laptops, I don’t see why we would not get a mobile GPU with HBM. The new process along with HBM should make a very power efficient GPU for mobile. The initial release though may be a small die part with GDDR5.
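As a side note, the 128 GB/s per-stack figure mentioned there falls straight out of first-generation HBM’s interface math; a minimal sketch:

```python
# Where 128 GB/s per first-generation HBM stack comes from:
# 8 channels x 128 bits = 1024-bit interface, 500 MHz DDR = 1 Gb/s per pin.
bus_width_bits = 1024
pin_rate_gbps = 1.0   # 500 MHz, double data rate

bandwidth_gb_per_s = bus_width_bits * pin_rate_gbps / 8
print(f"{bandwidth_gb_per_s:.0f} GB/s per stack")   # 128 GB/s
```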
It seems that AMD will use HBM1 for APUs, but the discrete GPUs for laptops and the low-end models for desktops might be based on GDDR5.
Some HBM APUs would be nice… with the graphics performance of a 960 or so and no need for CPU RAM.
Is FreeSync supported by default with DisplayPort 1.3? Or will it be in 10 different versions again?
Adaptive Sync is still voluntary with 1.3.
Remember, FreeSync is AMD’s way of supporting Adaptive Sync.
This is what it says on Wikipedia:
“As of 2015, VESA has adopted FreeSync as an optional component of the DisplayPort 1.2a specification.[4] FreeSync has a dynamic refresh rate range of 9–240 Hz.[3] As of August 2015, Intel also plan to support VESA’s adaptive-sync with the next generation of GPU.[5]”
It is still an optional component, so a device can still support DisplayPort 1.3 without implementing it. It is unclear what elements of FreeSync were added to the standard beyond what was already included as adaptive sync. Since some elements of FreeSync were apparently included in the standard as adaptive sync, there is little to no semantic difference between saying FreeSync or adaptive sync right now. I don’t think FreeSync has any features beyond what was standardized as adaptive sync, so FreeSync is just AMD’s marketing name for adaptive sync (AMD has a trademark on “FreeSync”). I guess you could consider frame multiplication as part of FreeSync even though it is not part of the standard. It is very common for people to start using a trademarked name to refer to everything of that type. This may not be as clear a distinction as in other cases, though, due to FreeSync elements being added to the adaptive sync standard.
Saying that it is AMD’s way of implementing it seems incorrect since FreeSync or adaptive sync have referred to interface standards, not the implementations. G-sync is not a standard so it refers more to the implementation. Makers of G-Sync displays do not have to adhere to a standard for G-sync, they just buy the G-sync module. A display which supports adaptive sync as part of the VESA standard is a FreeSync display as currently defined. Since FreeSync is AMD’s trademark, they can use it to refer to anything they want though, so this is all semantic.
Given AMD’s PR history of promising insane GPUs that turn out not as good as claimed, I wouldn’t take this at face value till it’s proven by independent reviewers.
Really? PR should be taken with a grain of salt? Whatever would we do without your deep insight?
What baffles me is that the GTX 950 has approximately 2x the perf-per-watt of the current R9 300 series.
If we say 50% power savings for example (see article), then that brings us to parity with NVidia, yet they show an 86W system supposedly matching the GTX 950 system that’s using 140W.
Even using HBM, which they likely did (but will they at this price point in commercial products?), the power discrepancy seems too high.
I don’t doubt the numbers but it feels like an extremely cherry picked result. For example, maybe they used a GPU with much more transistors but dropped the voltage to the “sweet spot” to optimize perf-per-watt. Not in a way they’d sell, but rather just to beat NVidia.
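To illustrate that suspicion with numbers: the sketch below assumes a non-GPU platform draw of about 55 W in both systems, which is purely a guess (not a published figure), and remembers that the 140 W and 86 W numbers are total system power.

```python
# Sanity check on the "sweet spot" suspicion, using hypothetical inputs.
# ASSUMPTION: the non-GPU platform draws ~55 W in both systems and both
# GPUs deliver the same 60 FPS; neither figure has been published.
platform_w = 55.0
gtx950_gpu_w = 140.0 - platform_w    # roughly 85 W
polaris_gpu_w = 86.0 - platform_w    # roughly 31 W

ratio = gtx950_gpu_w / polaris_gpu_w
print(f"Implied iso-performance perf/W advantage over the GTX 950: about {ratio:.1f}x")

# If the GTX 950 already had about 2x the perf/W of the R9 300 series, this
# would put the demoed Polaris chip around 5x the 300 series, which hints at
# a very favorable demo configuration (low clocks/voltage) or other factors.
```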
Finally, they’re obviously comparing their upcoming product to something NVidia already has on the market at 28nm. For people that don’t know this is very misleading. They say “the competition” but it’s not an apples-to-apples comparison.
When NVidia launches its products will they still be pointing to this comparison?
I’m just so skeptical about AMD as they’ve been very deceptive about previous GPU performance, Freesync (saying they had superior “9Hz to 244Hz” for example which was a paper spec) etc.
Anyway, I wish them well but they’re pissing me off.
I don’t think anyone but tech enthusiasts would even be aware of these power comparisons. I would hope that most of them would understand that they are comparing their unreleased product to an existing product. How else would you get the power savings across, though? You would expect big power savings with a jump from 28 nm planar to 16 nm FinFET, but power consumption is obviously a limiting factor, so a lot of engineering goes into reducing it. The power savings will mostly come from the process tech, but some will come from design, as the article notes. It annoyed me when Nvidia was praised so much for the power consumption of Maxwell when a lot of that was probably from cutting 64-bit performance to 1/32 of 32-bit. AMD was still at 1/8 at the time, I think.
Free sync specifications are for the interface only. There is a big difference in the speed of a USB 3.0 interface and the speed actually achievable by something like a USB 3.0 connected hard drive, or any device connected to that interface. This isn’t a “paper spec”; we just don’t have any devices that can do 244 Hz yet. We might be able to get there with some OLED or other tech, just like a new SSD could probably saturate a USB 3.0 interface. If you don’t understand the difference between the specs of an interface and the specs of the device connected to it, then your anger at AMD is misplaced.
Misdirected rather than misplaced is what I meant to say.
And the spin doctor arrives with the usual FUD BOMBS!
Yep, it just wouldn’t be an AMD article without Arbiter telling people that it will fail.
There has been amazing amounts of FUD in the comments for AMD articles. Hopefully this is a sign that the competition is feeling threatened.
What history? The one time that happened with Fury?
Personally I’d add Bulldozer and to a lesser extent Mantle.
The GTX 950 is a 90W TDP card. I’m curious how it pulls 140 watts in this test. Sounds like they used an OC model or something.
Yes, I know TDP isn’t always equal to max power, but that’s too much of a diff.
It’s total system power, not GPU power.
140W for the Nvidia PC, 86w for the AMD PC. All other pieces equivalent.
Duh I wasn’t awake when I read the article. I was for sure convinced it said ‘card power’ not system power :). Thanks!
Meaning the AMD Polaris GPU is using 36W and NVIDIA’s Maxwell 2 is using 90W. It means the 14nm Polaris GPU offers 2X more performance/watt, and 2.5X against AMD’s own 370. WOW
RIP nvidie
OK Joking aside, I fink RTG/Polaris will defo kill nvidia Pascal on hardware features
Again with the slides and we know how well that has gone for them in the past….Until I see the actual hardware and lol software drivers being tested and reviewed by independent reviewers I don’t give a shit.
“I don’t give a shit.”
Yes you do.
You give enough of a shit to read the article. And even if you didn’t bother to read the article and just came to trash talk, you gave enough of a shit to see an AMD article and come to the comments to trash talk.
Trash talk. About something you literally know nothing about, because nobody else knows anything about it, you still cared so much that you felt it necessary to take to the comments and express your displeasure. As if your very manlihood was threatened by the notion that AMD may well have something awesome coming, you rose to the defense of your gawd and struck out at your enemies with feigned indifference.
So you obviously give a shit.
Lol, yeah he gave so much of a shit to come back to reply to you too right?
AMD fanboys FTW! xD
Like any rational thinking person would do with anything AMD says, wait for actual product release and tests from reviewers period.
You dont see nvidia parading around with Pascal do you, no because they have nothing to fear and are probably laughing at these slides again 😀
I would have been even more surprised if he DID come back to reply. And I would have used that against him.
Your attempt to paint me as an AMD fanboy, however, was just sad. Feel free to read my comment again, and you’ll find that I never once made any claims about how good (or bad) AMD’s new architecture will be. While I do have a tendency to prefer AMD graphics products (based on my own personal experiences after ten years of Nvidia exclusivity, followed by four years (and counting) of trying out AMD), I’m also rational and logical enough to know that, at this point in the development stage, it’s impossible to actually know how good (or bad) it will be, and it’s pointless to try to make any claims in either direction.
Mocking someone for being an Nvidia fanboy does not automatically make one an AMD fanboy. For all you know, I may well just be anti-fanboy.
And by the way, Nvidia doesn’t have to “parade around” with Pascal. They have an enormous army of fanboys who have been doing that for them – for free – for months now. Like you, for example, mister “they have nothing to fear and are probably laughing at these slides again”. Or, perhaps, mister “coming to a comment thread on a two-day-old article buried back on the second page just to fanboy at people”.
You’re a fanboy nonetheless moron, hypocrite.
When you can’t make a rational argument, never fail to resort to personal insults.
Sorry kid, you lose.
The original “I don’t give a shit” commenter gave enough shit to read & comment, so he’s obviously lying and/or is delusional.
As for your comment, it belongs at Wccftech where fanboys like you are much more at home than here. Now shush, boy.
^ding
like dinging yourself lmfao
AMD look like they are going to have a strong 2016. I’ll be buying all their new kit this year 😀
Lol at all you ridiculous AMD fanboys, hyping up slides again!
Get ridiculed & mocked more and more why don’t you all, bunch of armchair losers….
Too bad PCPER got infected with these scumbags ever since the frametiming came to light, what a shame…
Go back to WCCFtech, where your blind fanboy bullshit is not only accepted but encouraged. It’s a veritable cornucopia of Nvidia-fanboy-echo-chamber over there, it’s perfect for you.
You’re welcome to come back to PCPer when you’re a mature adult, but WCCFtech just loves to cater to screaming crying puddles of preteen Nvidia fanatics like you.
Here, the adults are trying to talk.