Broadwell Microarchitecture

After our discussion about the 14nm future for Intel, we got a chance to talk with Stephan Jourdan, Fellow and Director of System-on-Chip Architecture at Intel, about the microarchitecture changes in Broadwell, specifically those affecting the Y version of the processor. The biggest challenge, and the primary goal, for the architecture team was the “journey to fanless”: building an Intel Core processor that could fit in an 8-10mm thick design with a 10-inch or larger screen while maintaining performance levels at 3-5 watts of total power. That power target can vary with display size, chassis Z-height, chassis material, target skin temperature and ambient (room) temperature, but the targets all fall within a fairly tight window. (Side note: Intel’s targets are 25C ambient and 41C skin temperatures with a metal chassis design.)

Intel could easily build a processor in the 3-5 watt window, but the charge was to do it while maintaining, or increasing, the performance levels of current processor families. Performance-per-watt efficiency is obviously essential, but for the applications and cases that need it, peak burst performance is required as well. A traditional, non-optimized design would reach peak performance in short, “bursty” workloads, but as soon as performance is needed over a sustained period its clock speed would drop. Once you hit heavier workloads, performance decreases even further as the processor die and chassis need to be cooled. With Broadwell-Y, Intel thinks it has nearly perfected all three scenarios.

But you don’t get that by just wishing for it. Intel pins the success of Broadwell-Y on five key components: the 14nm process, packaging innovations, an updated FIVR and 3DL power input, better power management and power reductions. We’ve already discussed the move to 14nm and how it improves the silicon manufacturing process, but Jourdan put more specific numbers on those benefits. For example, the lower minimum operating voltage allows for a 20% decrease in required power at some performance levels, and the better-than-normal 0.65x capacitance scaling from the 22nm node, thanks to transistor and interconnect scaling optimizations, results in a 25% decrease in required power.
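Why voltage matters so much comes down to the textbook CMOS dynamic power model, in which switching power scales with capacitance, the square of voltage and frequency (P ≈ C × V² × f, with the activity factor folded into C). The Python sketch below is a minimal illustration under that idealized assumption: it ignores leakage and is not how Intel arrived at its 20% and 25% figures, which come from its own measurements. The function name `relative_dynamic_power` is hypothetical.

```python
def relative_dynamic_power(c_scale: float, v_scale: float, f_scale: float) -> float:
    """Dynamic power relative to a baseline under the idealized CMOS model
    P ~ C * V^2 * f (activity factor folded into C, leakage ignored).

    Each argument is a ratio to the baseline; 1.0 means unchanged.
    """
    return c_scale * v_scale ** 2 * f_scale


# Hypothetical example: same capacitance and frequency, 10% lower voltage.
# Because voltage enters squared, power drops by about 19%, not 10%.
lower_voltage = relative_dynamic_power(c_scale=1.0, v_scale=0.9, f_scale=1.0)
print(f"{lower_voltage:.2f}")  # 0.81
```

In this model a 10% frequency cut saves only 10%, while a 10% voltage cut saves about 19%, which is why a lower minimum operating voltage is such a valuable lever.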

The packaging of the new Broadwell-Y part allows for a 25% reduction in total motherboard PCB area compared to Haswell-Y, which means smaller system designs. In the image above you can clearly see the difference between the two packages, with the 14nm Broadwell die (and 32nm PCH) on the right and the Haswell counterpart on the left. The package’s footprint on the motherboard has shrunk by a full 50%, but perhaps more important is the 30% reduction in Z-height, which is how far the silicon actually rises above the motherboard surface.

Intel confirmed the die size of Broadwell-Y at 82 mm^2, 58% smaller than Haswell at 22nm for a dual-core processor implementation with the same 2MB of LLC (last level cache) per core.

When Intel introduced Haswell, it included the company’s first implementation of FIVR (fully integrated voltage regulator), which took much of the power control off the motherboard and moved it directly onto the silicon. Broadwell brings the second generation of FIVR, with improved efficiency at lower voltages. The 3DL modules are perhaps the most interesting change here: they are inductors that would normally sit on the processor package substrate but have been moved under the die because of space constraints. The 3DL is used at extremely low voltages, an area where Intel apparently had concerns even with the next-generation FIVR implementation. Because of its improved efficiency at these low voltages, the 3DL acts as a pass-through for power delivery when frequencies and power levels drop. It wouldn’t be a surprise to see Intel drop the FIVR implementation altogether in architectures after Broadwell, as the move to lower power continues to gain prominence.

Interestingly, in order to meet the Z-height requirements for Broadwell-Y, the 3DL was placed under the die on the package. This requires a hole to be cut in the motherboard beneath the processor for the 3DL to rest in.

Intel has improved power management with two new technologies that improve the scalability of Broadwell-Y at both the high end and the low end of performance levels. For high-performance situations, Enhanced Turbo Boost adds a PL3 state that allows further scaling than we have seen before. PL1 is the power level the processor is rated to sustain for long-term system usage, while PL2 is the typical “burst” level of the processor. A CPU could stay at PL1 for minutes at a time at, for example, a 5 watt power draw, but could only maintain PL2 performance for a handful of seconds because of its 10-15 watt power draw. PL3 runs at even higher clock speeds and will draw more than 20 watts, but only for milliseconds at a time. Intel says it has seen instances where batteries can be damaged by power draw this high over longer periods, so it is limiting PL3 for more reasons than just shorter battery life.
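The three power limits can be pictured as a ladder of power levels, each allowed only for a limited stretch of time. The Python sketch below is a simplified model, not Intel’s algorithm: the 5, 15 and 20 watt levels follow the examples above, while the time windows (10 ms for PL3, 5 s for PL2) and the names `POWER_LIMITS` and `allowed_power_level` are hypothetical. Intel’s actual turbo control is generally described as tracking a moving average of power rather than a simple elapsed-time cutoff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PowerLimit:
    name: str
    watts: float
    # Longest continuous time this level may be held; None means indefinitely.
    max_seconds: Optional[float]


# Hypothetical ladder, highest level first. Wattages follow the article's
# examples; the time windows are illustrative assumptions.
POWER_LIMITS = [
    PowerLimit("PL3", 20.0, 0.010),  # milliseconds
    PowerLimit("PL2", 15.0, 5.0),    # a handful of seconds
    PowerLimit("PL1", 5.0, None),    # sustained
]


def allowed_power_level(burst_seconds: float) -> PowerLimit:
    """Return the highest power level still permitted after the processor
    has been boosting continuously for `burst_seconds`."""
    for limit in POWER_LIMITS:
        if limit.max_seconds is None or burst_seconds < limit.max_seconds:
            return limit
    # Only reachable if the ladder lacks an unlimited (sustained) level.
    raise ValueError("no sustained power level configured")


for t in (0.001, 0.5, 60.0):
    level = allowed_power_level(t)
    print(f"{t} s -> {level.name} ({level.watts:.0f} W)")
```

Calling it with 0.001, 0.5 and 60 seconds of continuous boosting yields PL3, PL2 and PL1 respectively, mirroring the milliseconds, seconds and minutes described above.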

To help deliver better battery life, Intel engineers built in a feature called Duty Cycle Control that produces lower effective frequencies than the processor could otherwise run at. It might seem counterintuitive at first, but because of minimum voltages set by the processor architecture and process technology to maintain stability, processors like Broadwell-Y can’t run below a specific voltage and remain 100% stable. Because of that, and because of the shape of the clock speed / voltage curve, a processor could have a minimum clock speed of, for example, 500 MHz.

Duty Cycle Control rapidly switches the processor on and off, running it at 500 MHz and then turning it off (drawing essentially no power and not hurting efficiency). By keeping the CPU core running 80% of the time and turning it off for the remaining 20%, Intel is able to create an “effective” clock speed of 400 MHz: the processor performs at (or close to) the level it would if it were running at 400 MHz. At the same time, the processor spends 20% of that time in a power-saving state, saving valuable battery life and creating an “effective” running voltage. All of this depends on Broadwell-Y’s ability to switch cores on and off quickly without latency or context-switching issues, but that is something architectures have been doing for quite some time.
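The arithmetic behind duty cycling is simple: the effective frequency is the minimum clock multiplied by the fraction of time the core is running, and average power is a time-weighted mix of the active and gated states. This Python sketch assumes idealized, instantaneous transitions; the 500 MHz floor and 80% duty cycle match the example above, while the active and gated power figures and the function names are hypothetical placeholders.

```python
def effective_frequency_mhz(active_mhz: float, duty: float) -> float:
    """Effective clock when the core runs at `active_mhz` for a fraction
    `duty` (0 < duty <= 1) of the time and is gated off for the rest."""
    if not 0.0 < duty <= 1.0:
        raise ValueError("duty must be in (0, 1]")
    return active_mhz * duty


def average_power_watts(active_watts: float, gated_watts: float, duty: float) -> float:
    """Time-weighted average power, ignoring the energy cost of switching."""
    return duty * active_watts + (1.0 - duty) * gated_watts


# Example from the text: 500 MHz minimum clock, core running 80% of the time.
print(effective_frequency_mhz(500, 0.8))  # 400.0
# Hypothetical power figures: 1.0 W while active, 0.05 W while gated.
print(average_power_watts(1.0, 0.05, 0.8))
```

A real implementation also pays an energy and latency cost on every on/off transition, which this model leaves out; that overhead is why the ability to switch quickly matters.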

Duty cycling is not limited to the x86 processor cores; the Intel GT graphics system can use it as well, enabling “effective” power and performance levels with one to four of the GPU slices.

Intel’s path to a fanless Core M processor design is clearly dependent on intelligent engineers and management all going in the right direction, but at the end of the day math is the determining factor. The above improvements to active power, from the 14nm process tech to the lower operating frequency options, coupled with the leakage reduction provided by lower max voltages and cooperation between silicon and architecture teams, result in what looks to be an impressive product launch.

But there is more to Broadwell-Y than just the power efficiency improvements; the core, graphics and chipset have all seen changes as well.

Broadwell looks to improve IPC (instructions per clock) over the previous-generation Haswell by around 5%, which is not a large amount. There have not been major instruction updates from Haswell, so most of what you see above is fairly minor on an individual basis. Updated out-of-order schedulers, faster floating point multipliers and improved address prediction don’t do much individually, but when Intel’s engineers have enough time to pool these updates together, they add up to something measurable. The core features were targeting a 2:1 ratio of performance to power, meaning each new feature or update had to deliver at least 2% more performance for every 1% of additional power. Previous architectures, including Haswell, targeted a 1:1 ratio in that metric and were more concerned with maintaining efficiency than improving it.
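Expressed as a rule, a 2:1 target means a proposed feature is only worth adding if its percentage performance gain is at least twice its percentage power cost, while a 1:1 target only requires the two to break even. The Python sketch below encodes that check; the function name and the feature numbers in the example are hypothetical.

```python
def meets_perf_per_power_target(perf_gain_pct: float, power_cost_pct: float,
                                ratio: float = 2.0) -> bool:
    """Return True if a feature's performance gain (in percent) is at least
    `ratio` times its power cost (in percent).

    ratio=2.0 models a 2:1 target; ratio=1.0 models a 1:1 target.
    """
    return perf_gain_pct >= ratio * power_cost_pct


# Hypothetical feature: +1.5% performance for +1.0% power.
print(meets_perf_per_power_target(1.5, 1.0, ratio=1.0))  # True  (1:1 target)
print(meets_perf_per_power_target(1.5, 1.0, ratio=2.0))  # False (2:1 target)
```

The same hypothetical feature would clear a 1:1 bar but not a 2:1 bar, which shows how the stricter target filters out features that buy performance at the expense of efficiency.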

Graphics performance and efficiency are improved as well, with a much more dramatic jump in performance than the x86 portion of the die, which is more or less what we have been expecting for some time. The graphics architecture of Broadwell-Y will ship with up to 24 EUs (execution units), 20% more than the 20 EUs in Haswell. The design continues to be very scalable, and higher-power mobile and desktop parts will definitely see higher EU counts. This time the EUs are broken up into three “slices” (versus two on Haswell), each with 8 EUs, so we will see 50% more sampler throughput on the graphics system.
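The percentages follow directly from the unit counts. This short Python check assumes, as the text implies, that sampler throughput scales with the number of slices (one sampler per slice); the 10 EUs per Haswell slice is derived from 20 EUs across two slices, and the helper name is hypothetical.

```python
def percent_increase(old: float, new: float) -> float:
    """Percentage increase from `old` to `new`."""
    return (new - old) / old * 100.0


haswell_slices, broadwell_slices = 2, 3
haswell_eus = haswell_slices * 10      # 20 EUs (10 per slice, derived)
broadwell_eus = broadwell_slices * 8   # 24 EUs (8 per slice)

print(f"EUs: +{percent_increase(haswell_eus, broadwell_eus):.0f}%")             # EUs: +20%
print(f"Samplers: +{percent_increase(haswell_slices, broadwell_slices):.0f}%")  # Samplers: +50%
```

Note that the sampler gain outpaces the EU gain because Broadwell spreads fewer EUs per slice across more slices.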

Moving to 14nm process technology should give the GPU more thermal headroom and result in higher maximum clock speeds. Of course, performance and scaling will depend on the system-level TDP and battery consumption requirements set by OEMs, but having access to more GPU power is good. Intel points to a continued focus on gaming with support for DX11.2 (feature level) and OpenGL 4.3. OpenCL is supported, including shared virtual memory for GPU compute applications.

Media improvements on Broadwell include 2x the Video Quality Engine throughput, more performance for QuickSync and lower power during basic video playback. H.265 decoding is supported on this design, but not entirely in hardware.

Broadwell-Y supports 4K displays, but according to what we learned last week, only at a 30 Hz refresh rate, which is somewhat disappointing. Even for non-gaming applications, the difference between using a 30 Hz and a 60 Hz refresh rate is dramatic. (Interestingly, I learned that Haswell-Y and Haswell-U don’t support 4K resolutions in any form.)
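One way to see why the refresh rate matters is the time each frame stays on screen. The Python sketch below is plain arithmetic (the function name is hypothetical): at 30 Hz a new image arrives roughly every 33 ms, twice the interval at 60 Hz, which is why motion such as scrolling or dragging a window can look noticeably less smooth.

```python
def frame_interval_ms(refresh_hz: float) -> float:
    """Time between display refreshes, in milliseconds."""
    return 1000.0 / refresh_hz


for hz in (30, 60):
    print(f"{hz} Hz: {frame_interval_ms(hz):.1f} ms per frame")
# 30 Hz: 33.3 ms per frame
# 60 Hz: 16.7 ms per frame
```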

Finally, we get to the lowly PCH, the chipset remnant that does all the dirty work. Even though it is still built on 32nm process technology, the Broadwell-Y PCH sees some performance gains and power reductions that help meet the platform goals. The audio DSP is upgraded with more SRAM, while PCIe storage gets a bit more emphasis. Idle power on the chipset was reduced by 25% compared to the Haswell generation thanks to better gating and basic architectural work.

Closing Thoughts

The information Intel provided about Broadwell-Y today shows me the company is clearly innovating and iterating on plans it set in place years ago with a focus on power efficiency. Broadwell and the 14nm process technology will likely widen the gap between Intel and AMD in the x86 tablet space and should make an impact on other tablet markets (like Android) as long as pricing remains competitive. The 14nm process gives Intel an advantage that no one else in the industry can claim, and unless Intel begins fabricating processors for the competition (not completely out of the question), it will remain a house advantage.

The image above of a reference board using Broadwell-Y, where the processor and chipset package look small next to the memory chips and other controllers, showcases Intel’s ability to produce truly impressive products. Broadwell will eventually stretch from the server room to the tablet with a wide array of products, but what we have seen today already has me excited for the first 14nm products, which should be available by the end of 2014.
