Digging into a specific market
This is the last Bulldozer-based design. How will this work going forward?
A little while ago, I decided to think about processor design as a game. You are given a budget of complexity, which is determined by your process node, power, heat, die size, and so forth, and the objective is to lay out features in the way that suits your goal and workload best. While not the topic of today's post, GPUs are a great example of what I mean. They make the assumption that in a batch of work, nearby tasks are very similar, such as the math behind two neighboring pixels on the screen. This assumption allows GPU manufacturers to save complexity by chaining dozens of cores together into not-quite-independent work groups. The circuit fits the work better, and thus it lets more get done in the same complexity budget.
Carrizo is aiming at a 63 million unit per year market segment.
This article is about Carrizo, though. It is AMD's sixth-generation APU, counting from Llano's release in June 2011. For this launch, Carrizo is targeting the 15W and 35W power envelopes for $400-$700 USD notebook devices. AMD needed to increase efficiency on the same 28nm process that we have seen in their product stack since Kabini and Temash were released in May of 2013. They tasked their engineers with optimizing the APU's design for these constraints, which led to dense architectures and clever features on the same budget of complexity, rather than smaller transistors or a bigger die.
15W was their primary target, and they claim to have exceeded their own expectations.
Backing up for a second. Beep. Beep. Beep. Beep.
When I met with AMD last month, I brought up the Bulldozer architecture with many individuals. I suspected that it was quite a clever design that didn't reach its potential because of external factors. As I said at the start of this editorial, processor design is a game and, if you can save complexity by knowing your workload, you can do more with less.
Bulldozer looked like it wanted to take a shortcut by cutting elements that its designers believed would be redundant going forward. First and foremost, two cores share a single floating point (decimal) unit. While you need some floating point capacity, upcoming workloads could use the GPU, which is right there on the same die, for a massive increase in performance. As such, the complexity that is dedicated to every second FPU can be cut and used for something else. You can see this trend throughout various elements of the architecture.
And of course, there were a few instances where they were a bit too aggressive. Josh Walrath, who normally covers AMD products and CPU architectures for us, mentioned that early Bulldozer parts were not able to keep the dual integer units fed with Fetch and Decode in all situations. AMD did not mention anything in particular, but the phrase “Sometimes you get a product back and say 'Yeah, it actually would have been nice to have two of these'” came up in response to my questions. It happens, but it gets smoothed out over time, and we're talking about its fourth generation with Carrizo's “Excavator” cores. We're talking history at this point, but I feel it leads to an important mindset.
The main design choices seem to have pointed toward a universe where developers embrace GPU compute and optimize toward it. This aligns with their core architecture, their spearheading of the HSA initiative, their interest as a company in GPU development, and so forth.
It made sense, too. Tim Sweeney, head of Epic Games, expected (in 2008) that the generation after Unreal Engine 3 would be written in “a real programming language” that could be executed on the GPU, rather than DirectX and OpenGL. A year later, in 2009, he noted that developing a GPGPU application (at the time) required about ten-fold more effort, and that this was not worth the added control. Software developers were eyeing that direction, and AMD was already working on tools and languages. Now, of course, Mantle led to DirectX 12 and Vulkan, so the trajectory has changed again, and graphics APIs will probably be at the center of it for the foreseeable future.
Will you dig it?
The point I want to highlight is that hardware architects can do a lot with optimizing for their workloads. This is at the center of Carrizo. AMD has picked several use cases, lumped them together into a single product, and told their engineers to focus their design on it. Carrizo is aimed at the $400 to $700 (USD) laptop segment, which is where the bulk of sales occur. AMD tried for the biggest gains at the 15W segment with Excavator, but they also kept 35W in mind.
This leads to specific uses. Gaming is first and foremost for AMD, which I will get to in a minute, but I will lead with video decoding. H.265, also known as HEVC, looks like it will be the major new format for video. Carrizo includes a dedicated HEVC decoder, which also supports H.264, MPEG2, DivX, and other formats of course, but HEVC is the new addition. This should provide a sharp reduction in power consumption as well as smooth playback for the videos it targets (hence why it targets them).
They also shaved power consumption by scaling and post-processing the video as it is delivered to the display. This keeps the GPU powered down and, more interestingly, cuts system memory accesses, as the frames would otherwise need to enter and leave the GPU. AMD claims that this accounts for half a watt in savings during video playback, which is a lot considering that Kaveri used just under 5W total and Carrizo is listed at just under 2W (for 1080p content). Carrizo is advertised as supporting up to 4K video.
This feature is not just for video, either. It can apparently be used by any application, including rendered content such as video games. This just came up as a brief comment mid-keynote, so I cannot elaborate on this.
Gaming is front and center in the Carrizo launch, too, but that is expected. AMD is one of the top two PC graphics companies. This part is designed to target mainstream games, such as DOTA 2, League of Legends, and Counter-Strike: Global Offensive maxed out at 1080p with at least 30 FPS. Remember, of course, that this is at a 15W design point for $400 to $700 laptops. It will also support AMD Dual Graphics (formerly Hybrid Crossfire) for some of AMD's discrete, mobile GPUs to give it a boost into a comfortable user experience.
The GPU's performance is listed at 819 GFLOPS, which puts it in the ballpark of desktop Kaveri (856 GFLOPS). It is based on the same third-generation GCN cores that you see integrated in Tonga. Specifically, it includes eight of them, which add up to 512 shader units (versus the 1792 of the Tonga-based R9 285 discrete GPU). This enables FreeSync for notebooks as well as TrueAudio and, of course, the next generation graphics APIs: Mantle, DirectX 12, and Vulkan. It will also support "OpenCL 2.x". I am uncertain what this means for OpenCL 2.1 specifically, when and if, but developers have been waiting for a platform to begin programming on, so it is worth keeping an eye out for drivers.
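As a back-of-the-envelope check on those numbers, here is a small Python sketch. The 64 shaders per compute unit falls out of the figures above (512 divided by eight). The two operations per shader per clock (counting a fused multiply-add as two) is the usual convention for peak figures and is my assumption, as is treating the 819 GFLOPS figure as GPU-only. AMD did not give a GPU clock in the material above, so the sketch only derives what clock those assumptions would imply.

```python
# Rough peak-throughput arithmetic for a GCN-style GPU.
# Assumption: each shader retires one fused multiply-add (2 FLOPs) per clock.
SHADERS_PER_CU = 64            # 512 shaders / 8 compute units, per the article
FLOPS_PER_SHADER_PER_CLOCK = 2


def shader_count(compute_units: int) -> int:
    """Total shader units for a given number of compute units."""
    return compute_units * SHADERS_PER_CU


def peak_gflops(shaders: int, clock_ghz: float) -> float:
    """Theoretical single-precision peak in GFLOPS."""
    return shaders * FLOPS_PER_SHADER_PER_CLOCK * clock_ghz


def implied_clock_ghz(gflops: float, shaders: int) -> float:
    """Clock speed that a quoted GFLOPS figure would imply."""
    return gflops / (shaders * FLOPS_PER_SHADER_PER_CLOCK)


carrizo_shaders = shader_count(8)
carrizo_clock = implied_clock_ghz(819, carrizo_shaders)

print(carrizo_shaders)            # 512
print(round(carrizo_clock, 2))    # 0.8
```

Under those assumptions, the quoted figure works out to a GPU clock of roughly 800 MHz. Treat that as an inference from the listed numbers, not a specification.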
As we have talked about over the last few months, Carrizo is also the first HSA 1.0-supporting processor. This is useful for applications that switch heavily between serial and parallel workloads. AI visibility and path-finding are two such tasks, but those will likely not be useful for Carrizo's application in video games, because the integrated GPU is probably not just sitting idle while a discrete GPU does the heavy lifting. It will lower computation time for applications that use the GPU for general compute though, because the data will not need to be moved or copied between compute devices and contexts can schedule work more efficiently. This is utilized in recent versions of Java and Python, which could be useful for developers and users of enterprise software.
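To make the "no copying" point concrete, here is a toy Python analogy. It does not use any HSA runtime or touch a GPU. A `bytearray` stands in for system memory, and the function names are hypothetical. The first function stages the data into a separate buffer and copies the result back, the way a discrete-memory model would. The second hands over a reference to the same bytes and works in place.

```python
# Toy model only: no real GPU or HSA runtime is involved.
# All names here are hypothetical and exist just for illustration.

def run_with_copies(host: bytearray) -> int:
    """Separate address spaces: stage in, process, copy back."""
    device = bytearray(host)                 # copy 1: host -> "device" buffer
    for i in range(len(device)):
        device[i] = (device[i] * 2) % 256    # stand-in for the parallel kernel
    host[:] = device                         # copy 2: "device" -> host
    return 2 * len(host)                     # bytes moved purely for staging


def run_shared(host: bytearray) -> int:
    """Shared address space: pass a reference and work in place."""
    view = memoryview(host)                  # no copy, same underlying bytes
    for i in range(len(view)):
        view[i] = (view[i] * 2) % 256
    view.release()
    return 0                                 # nothing had to be staged


a = bytearray(range(8))
b = bytearray(range(8))
copied_bytes = run_with_copies(a)
shared_bytes = run_shared(b)

print(a == b)          # True: both paths compute the same result
print(copied_bytes)    # 16
print(shared_bytes)    # 0
```

Both paths produce the same answer. The difference is the staging traffic, which grows with the size of the data and with how often the work bounces between serial and parallel phases.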
Looking forward to a moment of Zen
When you are stuck on a process node, you need to find other ways to increase performance. It would be silly to start with a clean slate for each product and come up with the most efficient circuit possible for each target workload at the time. There is always something you can change or clean up, and those new or modified elements can be carried over into the future. Some can even be carried over to AMD's endeavors with ARM processors. Designers at AMD would say that their job is about finding the correct problems for the engineers to revisit, and the second attempt will always be better than the first. The correct problem is the one that will yield the biggest increases in performance, or decreases in power consumption, for the effort they can afford to give it.
Carrizo is a design aimed at a specific type of user. It is the last of the Bulldozer architecture, leaving room for the Zen architecture to back away from the shared FPU model. The work done on Carrizo will still carry over though, especially in how it integrates a bunch of parts on chip and does so with a significant reduction in die real estate, considering the complexity. Next time, they can optimize elsewhere and do so with a bigger budget, due to smaller fabrication processes and different power requirements.
For now, it will try to make a dent in the mid-range, power-efficient products, focusing on what people actually use those devices for. As always, we will need to wait and see our own benchmarks to quantify this for ourselves.
a moment of zen will never be the same after this august…sigh.
Finally….a HEVC hardware decode…..question remains…..does it support HEVC 10bit as well?
I worry more about the upcoming 300 series stack.
It is somehow confirmed that the 390X and below will be rebranded GPUs which don’t have any functional HEVC decoding (not even hybrid decoding such as the Intel and Nvidia solutions).
http://www.strongene.com/en/downloads/downloadCenter.jsp
Strongene made an HEVC OpenCL decoder in collaboration with AMD.
It is not the full UVD support that Carrizo has, but it works.
How do you use this? How do you install it? Which program can use it?
a simple install and then it works in windows media player.
Read the instruction
Doesn’t work on 10bit HEVC, and it will never be supported. Strongene already clarified this.
thats nice…
nobody wants to wait half a year for them to come out.
AMD, you’re getting owned in the gaming laptop market
make some cool gaming laptops and 2-1’s
just do it
Well, the HSA 1.0 compliance means that the GPU can now directly address the same RAM and virtual memory address space as the CPU, so talk about not running out of texture space for large non-gaming graphics projects, and texture compression will allow for better usage of any dedicated APU cache for low-latency graphics frame buffering for gaming workloads. In fact, allowing the GPU to context switch and have the same memory space addressing as the CPU cores will move a lot of the memory handling/management functionality over to the OS and graphics APIs from the many graphics applications that have to do their own separate GPU memory handling/buffering management with the pre-HSA 1.0 compliant APUs/SOCs. Before, the GPU only had a limited amount of available main system RAM to augment the GPU’s dedicated amount, if the GPU had any to begin with on integrated GPUs. So now with Carrizo’s HSA 1.0 abilities there is plenty of RAM (physical and virtual) that its GPU can address and directly control via a simple request to the OS/graphics API for more memory to be allocated. No more application crashes for lack of available memory from an artificially limited few GBs at most of dedicated system RAM for graphics memory that systems based on the older APUs/SOCs would offer for graphics applications to work with.
You are missing a slide, specifically the Media and entertainment workflows slide. It mentioned the memory copying power/processing savings from not having to move massive amounts of data between non-unified CPU and GPU address spaces, as only a simple pointer pass is necessary. This is a big-time improvement for working on large multi-million polygon count models/scenes between CPU and GPU for graphics application workloads. There are about 26+ total slides in the presentation, and I hope that PCPer is going to be doing a more complete review and some benchmarks when some review samples can be obtained. How far off before any independent benchmarks can be done?
A Carrizo-based laptop, even without any heavy quad core and processor threading, looks like a good low-cost solution for a mesh editing laptop, and maybe some light rendering workloads, although a quad-core CPU laptop with SMT is better for ray tracing workloads, and a workstation with as many Xeon cores as can be afforded is best for ray tracing rendering. Zen-based APUs are going to be very popular if they can get the full core/processor thread count up there, and the single-threaded IPC to at least Ivy Bridge levels of performance. That 16-core HPC/workstation Zen APU, with pro Greenland graphics and HBM, if priced affordably, may just be the SKU that makes professional graphics more affordable for many without the deep pockets for Intel’s pricey Xeon kit. 16 full-fat Zen cores with 32 processor threads could make short work of some heavy ray tracing workloads, and HSA will definitely let some of that ray tracing workload, which previously could only be done on the CPU, be transferred to the GPU as well.
I think it would be financially safer for AMD not to release any new APUs till they have something based on the 16nm FinFET process. I hope they survive.
Why when they’re number 1 ?
They’re less than 12 months away from 14nm, stop your grinning and drop your linen!
25W 12CC AM1 part please. Since this is a SoC design, it should not be that hard AMD.
Carrizo and Carrizo-L laptops already available in NZ?
https://www.noelleeming.co.nz/shop/computers/laptops/pc-notebook-computers/hp-15-ab025ax-15-6-white-notebook/prod138855.html
This is the problem with AMD notebooks. Give me a full HD (1080p) monitor and not this crap 1366×768, so we’ll associate the APU with good products. Inexcusable.
Do the same for Intel and you’re in deeper sh%t!
AMD? Zen?
“A chart in hand is better than the product in the market.”