Some Fresh Hope for 2016
Someone has leaked more AMD slides from the May 6 analyst day
EDIT 2015-05-07: A day after the AMD analyst meeting we now know that the roadmaps delivered here are not legitimate. While some of the information is likely correct on the roadmaps, they were not leaked by AMD. There is no FM3 socket, rather AMD is going with AM4. AMD will be providing more information throughout this quarter about their roadmaps, but for now take all of this information as "not legit".
________________
SH SOTN has some eagle eyes and spotted the latest leaked roadmap for AMD. These roadmaps cover both mobile and desktop, from 2015 through 2016. There are obviously quite a few interesting tidbits of information here.
On the mobility roadmap we see the upcoming release of Carrizo, which we have been talking about since before CES. This will be the very first HSA 1.0 compliant part to hit the market, and AMD has done some really interesting things with the design in terms of performance, power efficiency, and die size optimizations. Carrizo will span the market from 15 watts to 35 watts TDP. This is a mobile-only part, but indications point to it being pretty competent overall. This is a true SOC that will support all traditional I/O functions of older standalone southbridges. Most believe that this part will be manufactured by GLOBALFOUNDRIES on their 28 nm HKMG process that is more tuned to AMD's APU needs.
Carrizo-L will be based on the Puma+ architecture and will go from 10 watts to 15 watts TDP. This will use the same FP4 BGA connection as the big Carrizo APU. This should make these parts more palatable for OEMs as they do not have to differentiate the motherboard infrastructure. Making things easier for OEMs will give more reasons for these folks to offer products based on Carrizo and Carrizo-L APUs. The other big reason will be the GCN graphics compute units. Puma+ is a very solid processor architecture for low power products, but these parts are still limited to the older 28 nm HKMG process from TSMC.
One interesting addition here is that AMD will be introducing their "Amur" APU for the low power and ultra-low power markets. These will be composed of four Cortex-A57 CPUs combined with AMD's GCN graphics units. This will be the first time we see this combination, and the first time AMD graphics have been paired with ARM cores since the old ATI handheld graphics business was sold to Qualcomm, where it carries the "Adreno" branding (an anagram of "Radeon"). What is most interesting here is that this APU will be a 20 nm part most likely fabricated by TSMC. This is not to say that Samsung or GLOBALFOUNDRIES could not be producing it, but those companies are expending their energy on the 14 nm FinFET process that will be their bread and butter for years to come. This will be a welcome addition to the mobile market (tablets and handhelds) and could be a nice profit center for AMD if they are able to release this in a timely manner.
2016 is when things get very interesting. The Zen x86 design will dominate the upper 2/3 of the roadmap. I had talked about Zen when we had some new diagram leaks yesterday, but now we get to see the first potential products based off of this architecture. In mobile it will span from 5 watts to 35 watts TDP. The performance and mainstream offerings will be the "Bristol Ridge" APU which will feature 4 Zen cores (or one Zen module) combined with the next gen GCN architecture. This will be a 14nm part, and the assumption is that it will be GLOBALFOUNDRIES using 14nm FinFET LPP (Low Power Plus) that will be more tuned for larger APUs. This will also be a full SOC.
The next APU, codenamed "Basilisk", will span the 5 watt to 15 watt range. It will be composed of 2 Zen cores (1/2 of a Zen module) and likely feature 2 to 4 MB of L3 cache, depending on power requirements. This looks to be the first Skybridge set of APUs that will share the same infrastructure as the ARM based Amur SOC. FT4 BGA is the basis for both the 2015 Amur and 2016 Basilisk SOCs.
Finally we have the first iteration of AMD's first ground up implementation of ARM's ARMv8-A ISA. The "Styx" APU features the new K12 CPU cores that AMD has designed from scratch. It too will feature the next generation GCN units as well as share the same FT4 BGA connection. Many are anxiously watching this space to see if AMD can build a better mousetrap when it comes to licensing the ARM ISA (as have Qualcomm, NVIDIA, and others).
The Desktop
2015 shows no difference in the performance desktop space, as it is still serviced by the now venerable Piledriver based FX parts on AM3+. The only change we expect to see here is that there will be a handful of new motherboard offerings from the usual suspects that will include the new USB 3.1 functionality derived from a 3rd party controller.
Mainstream and Performance will utilize the upcoming Godavari APUs. These are power and speed optimized APUs that are still based on the current Kaveri design. These look to be a simple refresh/rebadge with a slight performance tweak. Not exciting, but needs to happen for OEMs.
Low power will continue to be addressed by Beema based APUs. These are regular Puma based cores (not Puma+). AMD likely does not have the numbers to justify a new product in this rather small market.
2016 is when things get interesting again. We see the release of the FM3 socket (final proof that AM3+ is dead) that will house the latest Zen based APUs. At the top end we see "Summit Ridge" which will be composed of 8 Zen cores (or 2 Zen modules). This will have 4 MB of L2 cache and 16 MB of L3 cache if our other leaks are correct. These will be manufactured on 14nm FinFET LPE (the more appropriate process product for larger, more performance oriented parts). These will not be SOCs. We can expect these to be the basis of new Opterons as well, but there is obviously no confirmation of that on these particular slides. This will be the first new product in some years from AMD that has the chance to compete with higher end desktop SKUs from Intel.
From there we have the lower power Bristol Ridge and Basilisk APUs that we already covered in the mobile discussion. These look to be significant upgrades from the current Kaveri (and upcoming Godavari) APUs. New graphics cores, new CPU cores, and new SOC implementations where necessary.
AMD will really be shaking up the game in 2016. At the very least they will have proven that they can still change up their game and release higher end (and hopefully competitive) products. AMD has enough revenue and cash on hand to survive through 2016 and 2017 at the rate they are going now. We can only hope that this widescale change will allow AMD to make some significant inroads with OEMs on all levels. Otherwise Intel is free to do what they want, at what price they want, across multiple markets.
Nice.
Amur on 20 nm was a surprise. Where do they fit in? Tablets, maybe?
Thanks for this excellent article Josh.
“Puma+ is a very solid processor architecture for low power products, but these parts are still limited to the older 28 nm HKMG process from TSMC.”
Beema already comes from GF.
“These will be manufactured on 14nm FinFET LPE (the more appropriate process product for larger, more performance oriented parts).”
Why? Isn’t it just “Low Power Early”? LPP should be better for everything.
You are probably correct. I only thought of LPE if they were attempting to get a good time to market. LPP is the more efficient and performance oriented process node for these products.
Only two Zen or K12 cores in some models makes me wonder whether the Zen and K12 cores are really high performance enough to fight competing products with four to eight, and soon ten, cores.
Also remember that the rumors suggest that Zen cores are using some kind of implementation of Simultaneous Multi-Threading (SMT) so these CPUs would be able to address 4 threads at a time.
AHA! Thanks for the info.
FM3!? What!? F THAT! How long are you going to stretch this out, AMD?!
Hey Josh.
Why do you say zen module? I thought the module design was gone.
Zen cores come in groups of 4 with 8 MB of shared L3. These groups of 4 then use an interconnect to communicate with the next group of 4. Easier to classify them as modules.
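As a rough sanity check, that grouping can be turned into a few lines of arithmetic. This is only a sketch built on the rumored figures: the group size and L3 per group come from the comment above, 2-way SMT is an assumption taken from the leaks, and `zen_layout` is a made-up helper name, not anything from AMD.

```python
# Rumored figures only; none of this is confirmed by AMD.
CORES_PER_GROUP = 4    # "groups of 4", per the comment above
L3_MB_PER_GROUP = 8    # shared L3 per group, per the comment above
THREADS_PER_CORE = 2   # assumption: 2-way SMT, as the leaks suggest


def zen_layout(cores: int) -> dict:
    """Return group count, total L3 (MB), and thread count for a core count."""
    if cores <= 0 or cores % CORES_PER_GROUP != 0:
        raise ValueError("this sketch only models whole groups of four cores")
    groups = cores // CORES_PER_GROUP
    return {
        "groups": groups,
        "l3_mb": groups * L3_MB_PER_GROUP,
        "threads": cores * THREADS_PER_CORE,
    }


for cores in (4, 8):
    print(cores, zen_layout(cores))
```

For 8 cores this works out to 2 groups, 16 MB of L3, and 16 threads, which lines up with the Summit Ridge figures quoted in the article.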
I see, is it similar to Intel's design?
Not that I know of. It is similar in that it uses SMT so a 4 core part can supposedly handle 8 threads… but that's about it.
SMT is not Intel's design; it is a much older implementation of multithreading that Intel just used. CMT, on the other hand, was AMD's in-house multithreading implementation, and a bit ahead of its time, as it sacrificed IPC in favour of number of cores.
SMT is the generic computing science term/acronym for what Intel's marketing calls Hyper-Threading. The same goes for AMD's marketing term/acronym hUMA and the generic term/acronym UMA (Uniform Memory Access). Companies like to create these artificial marketing terms to differentiate their products from the competition. When researching the technical descriptions of these various technologies, it's always good to use the standard/generic computing science name for the technology, so that you get the more academic search results and avoid all the marketing search results that come from using an individual company's trade terms (Hyper-Threading, hUMA, others).
Do not let any companies’ marketing departments fool you into thinking that their company is the one that invented sliced bread. Much of the computing technology of today was invented decades ago, or first implemented on mainframes and afterwards implemented on microprocessors. This includes fab process node technology, and where that is concerned it is the company that spends the most money on the technology, and not the company that invented the technology, that leads. There are about 6 or so companies that supply the world with the fab making equipment/technology, so the big bucks can buy/license the most advancements/latest equipment and implement the newest fab technologies the fastest.
Also notice in the leaked slide how “Summit Ridge” is specifically called out as NOT being an APU. It says CPU. So the high-performance chips will not have on-die graphics.
I found that interesting as well. I would almost hope for a GPU portion to be on such a chip… otherwise, why push HSA if you don't have something in the performance mainstream that supports it?
Maybe not for the consumer gaming market. Most high end gamers only want the CPU, and they'll get their graphics from discrete GPU cards. But that still does not mean that AMD could not produce a high performance APU for the HPC/workstation market with 16 full fat Zen cores in four 4-core Zen units and an HBM equipped Greenland GPU for numerical workloads, or even a FirePro APU for the professional graphics market. AMD did have a FirePro branded APU, but that was for the developing world market that could not afford the costly server/workstation CPUs and full discrete FirePro SKUs.
So do not confuse this mainstream/consumer market roadmap with what may be happening on AMD's professional/HPC/server market roadmaps, which may have extra SKUs for specialized professional/HPC/server workloads.
AMD may have some custom server SKUs that are commissioned by a client for server/HPC use. Just as with AMD's custom console APUs, AMD has a custom systems division that will provide made to order SKUs based on AMD's IP, if that's what the client wants and pays for. Too many folks think that AMD is just about the consumer/gaming market.
OMG, Joel Hruska(1) over at ExtremeTech is dissing AMD's K12 as unwanted because Windows RT is gone. Who the hell wants to run a K12 based tablet on Windows RT, or 10? K12 is going to be for the Android OEM market, and K12 is going to compete with Apple's A series products. The Android based tablet OEMs will be all over K12, and if K12 has SMT capabilities built into its core microarchitecture, even Apple will have trouble competing on IPC and performance per core metrics. WHY does Windows come into play for the tablet market, when it's the iOS and Android tablet devices that are leading in the mobile market?
In the high end tablet market there will be dual core Zen APUs to handle the legacy Windows market, and Joel Hruska knows this, so why the attempted positive spin for M$'s Windows at the expense of reality? We all know that RT is dead. Basilisk (Zen dual core) will be for Windows tablets; K12 will be Apple's major competition in the custom ARMv8-A ISA based tablet market. Android based tablet OEMs will be all over K12.
What about the Android based tablet makers, Mr. Hruska? They for sure need an ARM based APU that can run Android and compete in performance metrics with Apple's A series SOCs. Is Android a dead-end market? Not really. I would like to see some full Linux based K12 tablets personally, but that assumption about K12 is wrong, especially for the OEMs that do not have Apple's engineering resources. For them, AMD's K12 will give many third party tablet OEMs a powerful APU/SOC to compete with Apple's A series SOCs, which only Apple has access to. There WILL be great demand for a custom ARM K12 APU that can compete with Apple's custom Cyclone (and updated Cyclone) cores; the OEMs will be able to compete with Apple's cores by using AMD's K12 in their tablets, Android or full Linux based.
(1)
“AMD’s desktop, mobile roadmaps for 2016 may have just leaked”
By Joel Hruska on April 29, 2015 at 2:27 pm
Hruska: “Second, there are definite oddities on the mobile side. With Windows RT dead, there’s simply no demand for ARM devices running Windows. AMD’s K12 core may have a market in the server business or in mobile products, but if they intend to ship it in an ARM device they’re going to run face-first into a dead-end market.”
Who knows, there might not even be a tablet market by the time K12 ships.
Honestly, though, the number of tablets sold on superior performance is laughably small, and while I'm genuinely excited for a full-fat ARM core outside of Apple's efforts, Android tablets are very much built around 'good-enough' (i.e. cheap) SOCs.
I am curious how much implementation is shared between x86 and ARM products. Modern processors decode instructions into an internal representation and the rest of the hardware deals with this internal representation. Is it possible for them to use the same hardware units for both ISAs, except for the decoders and maybe the memory management units?
Jim Keller mentioned, in one of the YouTube videos, that there was a lot of cross pollination and sharing of ideas between the K12 and Zen design teams. So I am hoping that leads to a K12 core microarchitecture with SMT capabilities. A K12 APU with SMT and an extra wide, 6+ IPC superscalar design, similar to its Zen x86 counterpart in design philosophy, would be the custom ARMv8-A ISA based part to beat Apple's A series custom ARMv8-A extra wide superscalar parts. If AMD can get the execution pipeline count equal to, or a little better than, Apple's Cyclone cores, and add SMT to its K12, then the K12 would be able to beat Apple in core execution resource utilization. Apple's non-SMT Cyclone cores cannot keep their execution pipelines at an efficient level of utilization if the single processor thread stalls. But if AMD adds SMT to K12, along with plenty of execution pipelines, then when one of the core's processor threads stalls, the other thread will still be able to use the execution pipelines while the first is stalled. This is how SMT is able to keep the processor's execution pipelines as close to 100% utilization as possible under heavy workloads. So yes, design ideas are transferable among different CPU/SOC designs even though the instruction sets differ. Most modern microprocessors are very similar in their overall system design; there are exceptions, but in general x86, ARM, MIPS, and others are fairly similar and are based on the Modified Harvard Architecture.
Looking at the chip specs alone, I am quite worried about the single core and single thread performance. I mean, the guys want to pack 8 cores and 16 threads into single 95W-or-so CPU. That is a lot.
And with that, I would expect lower single-thread performance than Intel, unless AMD goes ballistic and does Core-M-style overclocking on the desktop.
Sure, single-thread CPU performance is often overrated (Atom is good enough for most tasks), but ultimately many people buy on benchmark results.
On the flip side, I can see it working well:
– in notebooks, where there will be a better match between number of cores/threads and performance (a 15W, 2-core/4-thread Basilisk may be spot on ultrabook nirvana); and
– in servers, with multithreaded workloads
With 8 cores it will probably have a pretty big gap between base and boost clock speeds. Looking at the current E5 Xeons for comparison, the 8-core 2640, a 90w 22nm part, runs at 2.6GHz base and 3.4GHz boost.
If it is as massive of a redesign as these rumors indicate, then I would expect significantly better IPC. They need to go with a higher IPC design to get power consumption under control. Implementing SMT requires a wider core also. Why wouldn’t you expect significantly higher IPC from a completely new design?
A large number of cores is not surprising given it is a 14 nm design. For intel’s 14 nm process with broadwell, a cpu core with L2 and 2 MB L3 slice is only around 12 square mm. We don’t know how large AMD’s design will come out, but at 14 nm, 4 and 8 core parts should still be small, at least for the cpu part. It may be ~60 square mm or so for 4 cpu cores, memory controller, and system interfaces minus any gpu though. Broadwell is 82 square mm with 2 cores but around half the die is gpu and quite a lot of die area is taken up by non-core components.
Anyway, I am mostly interested just from an engineering perspective. I see the applications that are currently used to test CPUs, and I don’t really care about any of them. They are mostly irrelevant; you can web browse and such with just about any modern system. Video encode/decode is much better done using gpu plus dedicated hardware. I don’t care too much how long it takes for compressed archives to extract. When we get DX12, it will reduce the cpu overhead, so until developers come up with a way to use a lot more cpu power, the bottleneck will just be shifted even more strongly onto the gpu.
For people on a budget, current AMD parts are good. Going Intel, you have to spend around $200 for a quad core part. AMD will give you 4 integer cores (2 module) for a lot cheaper. You can get an 8-core (4 module) FX-8320E from Newegg for $136 right now; almost the same price as a Core i3-4330 dual core. The FX processors may not perform quite as well as the Intel competition with single threaded DX11, but with multithreaded DX12, it will probably not be the bottleneck. In fact, the dual core Intel part should be at a big disadvantage with a well threaded DX12 game.
Why wouldn’t you expect significantly higher IPC from a completely new design?
So you won’t be disappointed again when it ends up being low?
I am not holding my breath when it comes to AMD. I think they passed their glory days long ago. However, are they going to finally just have one socket option now, in the form of FM3 then? If so, that would be really nice. And it is about time they started the death of the AM3+ socket. What about single thread performance on the “Summit Ridge” CPU? They have yet to compete with Intel there. I sincerely hope they can give Intel a run for the money, but I think that Intel has a huge lead that AMD simply doesn’t have the resources to overcome unfortunately.
Well, Phenom II had better single thread performance per clock over Bulldozer and derivatives by a pretty hefty margin. It is not out of the question that AMD can improve upon that key element with a new, from scratch design that can encompass the best elements of previous designs, as well as new elements that they undoubtedly have developed. I think it will be a much better year for them, but their hurdles with a new design will be getting people and OEMs to adopt it.
I think we need to see what 2016 brings; however, I feel we should not hope for an enthusiast part to challenge Intel. AMD positioned itself away from that a few years ago.
The parts will probably be much better than the current ones in the APU space, but that will not translate into an enthusiast desktop part, rather an entry to mainstream consumer one.
I don't know. That 8 core, 16 thread performance CPU doesn't sound bad off the bat. If they have improved IPC to older Core i5/i7 levels, it could be a new ballgame for AMD.
wawza, 14nm HBM 4096bit APUs will be sumkinda beast. hoping for hexacore HBM APU =D
It's good if they can cash in on their Zen core expectations, certainly now that “Globalfoundries 14nm process has volume production levels” http://www.fudzilla.com/news/processors/37647-globalfoundries-14nm-process-has-volume-production-levels . It looks like 2016/2017 are going to be some exciting years for them.
With the 20 nm and 14 nm nodes, I am strictly in "believe it when I can buy it" mode. We have been waiting a long time for new process nodes and it still seems like yields at 14 nm may not be ready yet.
The mainstream desktop chips are SoCs and they share the same socket with the high performance GPU-less FX chips. Does this spell the end of the southbridge/chipset? If so, how does it affect motherboard manufacturers?
They may be designed as SOCs, but that is not to say that AMD will not utilize an external southbridge for desktop motherboards (more SATA ports, USB 3.1, etc.). Designing something from the outset as an SOC is smart if you are going to implement it across multiple markets, but it makes sense to be able to also utilize it with an external southbridge to get more ports, more I/O, and more functionality where the power envelopes make sense.
I will be honest and say that I'm happy AM3+ is going EOL and the new FM3 will replace it, as long as the chipset is first of all one chip, and has all the new toys in it (USB3.1/PCIe3/m.2) and such.
Still, from what was mentioned, having SMT like support would give double the threading per core. So 4 cores (1 module) would be 8 threads, and the nice 8 core would give us 16 threads.
On the Opteron side, 32 cores giving a dreamy 64 threads. Hmm.. now time to wait for execution of the idea. Bulldozer on paper also looked promising and, well.. we know where that ended.
Still I guess we should see something around summer time? Can’t wait. And I guess for now my 8350 will just have to do.
Lovely to see AMD fans coming around to the idea of SMT. Just don’t get your hopes up, guys, it’s still not a magical panacea. (not generally, anyways)