Nehalem Details
Intel IDF Preview: Tukwilla, Dunnington, Nehalem and Larrabee - Processors 27

Following the “tick-tock” development cycle that has gotten Intel back on track since the death of the NetBurst architecture, the upcoming Nehalem architecture will take the 45nm process technology (not the chip architecture) that was developed with Penryn and build a completely new design on it.

Honestly, not much was shown at the briefing beyond what had already been discussed about Nehalem, but it is always good to see the details laid out once more.  Nehalem CPUs will span from dual-core configurations up to eight cores per processor, with a quad-core design hitting the market first.

The micro-architecture is similar to that of the current Core Architecture, though it has been modified to support four simultaneous instructions and two-way simultaneous multi-threading (SMT), otherwise known as Hyper-Threading.  The list of other new features is impressive, though: Intel’s first integrated memory controller, a new HyperTransport-like QuickPath interconnect (QPI), a new L3 cache and dynamic power management for controlling all the cores and features independently.

Nehalem up close

Intel’s move to an integrated memory controller and QuickPath is definitely an (unintentional) nod towards AMD.  Intel will still say it was waiting for the right time to introduce the technology into the marketplace, but at this point it’s nearly impossible not to see that AMD was really ahead of the game with K8.  Unfortunately, AMD has faltered in executing on that architectural leadership.

Nehalem will also be very modular in its design, allowing Intel to piece together some or all of its features into different chips that address varying markets and price points.  As a couple of examples, the slide shows a quad-core processor with an L3 cache, an integrated DDR3 memory controller and a single QPI connection.  Another option shows eight cores and multiple QPI connections, and would probably address the high-end server market.

As I said above, the architecture is based on Intel’s current Core Architecture found in the Core 2 Duo and related CPUs.  Intel has increased the operations per clock from 3-way to 4-way and has adjusted the way the cores access the cache system.  Intel also discussed a bit about its enhanced branch prediction unit; these are all improvements we expect to see in a generation-to-generation architecture change.

Also as noted, SMT is going to return with Nehalem, with each core able to work on two software threads simultaneously.  The SMT in Nehalem should be more efficient than the Hyper-Threading we saw in NetBurst thanks to the larger caches and lower latency memory system of the new architecture.
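
To put numbers on that, here is a minimal Python sketch of how many logical processors an operating system would see for the core counts mentioned earlier, assuming two-way SMT is enabled on every core.  The function name is mine, purely for illustration.

```python
# From the article: each core can work on two software threads at once.
SMT_THREADS_PER_CORE = 2


def logical_processors(cores: int, threads_per_core: int = SMT_THREADS_PER_CORE) -> int:
    """Logical processors exposed by one chip, assuming SMT is enabled on all cores."""
    return cores * threads_per_core


# Core counts mentioned in the article: dual-core up to 8 cores per processor.
for cores in (2, 4, 8):
    print(f"{cores} cores -> {logical_processors(cores)} logical processors")
```

So the quad-core part that hits the market first would show up as eight logical processors.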

Intel is also bringing a three-level cache hierarchy to Nehalem that includes a new, faster but smaller L2 cache of only 256KB per core.  An 8MB L3 cache will be shared across all the cores (we are assuming that the 8MB figure is for a quad-core processor, as indicated in the image, and that L3 cache sizes could change based on the chip design).  Along with this cache update is a new TLB (translation look-aside buffer) system that adds a second level to the hierarchy for improved performance.
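
As a rough illustration of what those sizes add up to, the Python sketch below totals the L2 and L3 capacity for a given core count.  It uses only the figures quoted above and carries over our assumption that the 8MB L3 belongs to the quad-core part; the even per-core split of L3 is just for comparison, since the cache is actually shared.

```python
# Figures from the article.
L2_PER_CORE_KB = 256   # private L2 per core
L3_SHARED_MB = 8       # shared L3 (assumed here to be the quad-core configuration)


def cache_summary(cores: int) -> dict:
    """Total private L2 and shared L3 capacity, in KB, for one chip."""
    l3_shared_kb = L3_SHARED_MB * 1024
    return {
        "l2_total_kb": cores * L2_PER_CORE_KB,
        "l3_shared_kb": l3_shared_kb,
        # Nominal even split of the shared L3, for comparison only.
        "l3_per_core_kb": l3_shared_kb // cores,
    }


print(cache_summary(4))
```

For a quad-core chip that works out to 1MB of L2 in total against 8MB of shared L3.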

The Nehalem platform is going to be very flexible, thanks in large part to the QuickPath interconnect, and will allow both single and multi-socket systems.  You can see in the diagrams that each CPU will support three channels of DDR3 memory, a first for the desktop market at the very least.  As you would expect, and as we have seen with AMD’s integrated memory controller, latency is going to be greatly reduced on the Nehalem chips, and with the bandwidth provided by three discrete channels of DDR3 the memory performance of the architecture should be impressive.

The new QPI that Intel has created is a point-to-point design much like HyperTransport and supports up to 25.6 GB/s of bandwidth per link.
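
For the curious, here is a short Python sketch of where a 25.6 figure per link can come from when read as gigabytes per second.  The breakdown is an assumption on my part and was not part of the briefing material covered here: it uses the commonly quoted QPI parameters of 6.4 GT/s and 2 bytes of data per transfer in each direction, with both directions counted together.

```python
# Assumptions (not from the article): commonly quoted QPI link parameters.
TRANSFERS_PER_SECOND = 6_400_000_000   # 6.4 GT/s
BYTES_PER_TRANSFER = 2                 # 16 data bits per direction
DIRECTIONS = 2                         # each link carries traffic both ways


def qpi_bandwidth_gb_s(transfers_per_second: int = TRANSFERS_PER_SECOND) -> float:
    """Aggregate (both directions) bandwidth of one link in GB/s."""
    bytes_per_second = transfers_per_second * BYTES_PER_TRANSFER * DIRECTIONS
    return bytes_per_second / 1e9


print(f"Per direction: {qpi_bandwidth_gb_s() / DIRECTIONS:.1f} GB/s")
print(f"Both directions: {qpi_bandwidth_gb_s():.1f} GB/s")
```

Under those assumptions each direction carries 12.8 GB/s, and the headline number is the sum of the two.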

The integrated memory controller gets its own slide here; DDR3 speeds up to 1333 MHz will be supported and you could have up to 3 DIMMs per channel, for a total of 9 modules.  The memory controller will support a variety of memory options such as registered DIMMs and regular unregistered modules and will apparently be “future scalable” to faster speeds. 
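
To see what that means in raw numbers, the Python sketch below works out the maximum DIMM count and the theoretical peak memory bandwidth per CPU.  The channel count, DIMMs per channel and DDR3-1333 speed come from the slide; the 8-byte (64-bit) channel width is my assumption based on standard DDR3 and was not stated in the briefing.  Real-world throughput will be lower than this peak.

```python
# Figures from the article.
CHANNELS = 3
DIMMS_PER_CHANNEL = 3
MEGATRANSFERS_PER_SECOND = 1333   # DDR3-1333

# Assumption (not from the article): standard 64-bit DDR3 channel.
BYTES_PER_TRANSFER = 8


def max_dimms(channels: int = CHANNELS, per_channel: int = DIMMS_PER_CHANNEL) -> int:
    """Maximum number of memory modules attached to one CPU."""
    return channels * per_channel


def peak_bandwidth_mb_s(channels: int = CHANNELS) -> int:
    """Theoretical peak memory bandwidth in MB/s across all channels."""
    return MEGATRANSFERS_PER_SECOND * BYTES_PER_TRANSFER * channels


print(f"Max DIMMs per CPU: {max_dimms()}")
print(f"Peak per channel: {peak_bandwidth_mb_s(1)} MB/s")
print(f"Peak across {CHANNELS} channels: {peak_bandwidth_mb_s() / 1000:.1f} GB/s")
```

That is roughly 10.7 GB/s per channel and about 32 GB/s of theoretical peak bandwidth per socket with all three channels populated.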

After the 45nm generation is complete, Intel will be talking about the upcoming 32nm process and the chips built upon it.  Following the same “tick-tock” cadence, you should expect Westmere to be a die-shrink and modest enhancement of Nehalem, while Sandy Bridge will be a new architecture.
