Nehalem Details

Following the “tick-tock” development cycle that has gotten Intel back on track since the death of the NetBurst architecture, the upcoming Nehalem architecture will take the 45nm process technology (not the chip architecture) that was developed with Penryn and build a completely new design on it.

Honestly, not much more than had already been discussed about Nehalem was shown at the briefing, but it is always good to see the details laid out once more. Nehalem CPUs will span from dual-core configurations up to eight cores per processor, with a quad-core design hitting the market first.

The micro-architecture is similar to that of the current Core Architecture, though it has been modified to handle four instructions per clock and to support two-way simultaneous multi-threading, otherwise known as HyperThreading. The list of other new features is impressive, though, including Intel’s first integrated memory controller, a new HyperTransport-like QuickPath interconnect, a new L3 cache and dynamic power management for controlling all the cores and features independently.

Nehalem up close

Intel’s move to an integrated memory controller and QuickPath is definitely an (unintentional) nod towards AMD; Intel will still say they were waiting for the right time to introduce the technology into the marketplace, but at this point it’s nearly impossible not to see that AMD was really ahead of the game with K8. Unfortunately, AMD’s recent execution has not lived up to that architectural leadership.

Nehalem will also be very modular in its design, allowing Intel to piece together some or all of its features into different chips to address varying markets and price points. As a couple of examples from this slide, Intel shows a quad-core processor with an L3 cache, an integrated DDR3 memory controller and a single QPI connection. Another option shows eight cores and multiple QPI connections and would probably address the high-end server market.

As I said above, the architecture is based around Intel’s current Core Architecture found in the Core 2 Duo and related CPUs. Intel has increased the operations per clock from 3-way to 4-way and has adjusted the way the cache system can be accessed by the cores. Intel also discussed a bit about its enhanced branch prediction unit; these are all improvements we expect to see in a generation-to-generation architecture change.

Also as noted, SMT is going to return to the market with Nehalem, with each core able to work on two software threads simultaneously. The SMT in Nehalem should be more efficient than the HyperThreading we saw in NetBurst thanks to the larger caches and lower latency memory system of the new architecture.
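To put numbers on the two-way SMT described above, here is a minimal sketch of the logical thread counts the operating system would see for the core counts mentioned in this article. The function name and constant are made up for illustration; only the "two threads per core" figure comes from the briefing.

```python
# Two software threads per core, as described for Nehalem's SMT.
SMT_WAYS = 2

def logical_threads(physical_cores: int, smt_ways: int = SMT_WAYS) -> int:
    """Return the number of hardware threads exposed by one processor."""
    return physical_cores * smt_ways

# Core counts mentioned for the Nehalem family: dual, quad and eight cores.
for cores in (2, 4, 8):
    print(f"{cores} cores -> {logical_threads(cores)} logical threads")
```

Keep in mind that a logical thread is not a full core; how much real throughput SMT adds depends heavily on the workload.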

Intel is also bringing a three-level cache hierarchy to Nehalem that includes a new, faster but smaller L2 cache of only 256KB per core. An 8MB L3 cache will be shared across all the cores (we are assuming that 8MB is for a quad-core processor as indicated in the image and that L3 cache sizes could change based on the chip design). Along with this cache update is a new TLB (translation look-aside buffer) system that adds a second level to the hierarchy for improved performance.
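As a quick sanity check on those cache figures, the sketch below adds up the L2 and L3 capacity for the quad-core part. It assumes, as we do above, that the 8MB L3 belongs to the quad-core design; L1 sizes are left out because they were not given in the slides, and the names are purely illustrative.

```python
L2_PER_CORE_KB = 256       # private L2 per core, from the briefing
L3_SHARED_KB = 8 * 1024    # shared L3; assumed to be the quad-core figure

def total_l2_l3_kb(cores: int, l3_kb: int = L3_SHARED_KB) -> int:
    """Sum of all private L2 caches plus the shared L3, in KB."""
    return cores * L2_PER_CORE_KB + l3_kb

quad_core_kb = total_l2_l3_kb(4)
print(f"Quad-core L2+L3: {quad_core_kb} KB ({quad_core_kb / 1024:.0f} MB)")
```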

The Nehalem platform is going to be very flexible thanks in large part to the QuickPath interconnect and will allow single and multi-socket systems. You can see in the diagrams above that each CPU will support three channels of DDR3 memory, a first for the desktop market at the very least. As you would expect, and as we know from AMD’s integrated memory controller, latency is going to be greatly reduced on the Nehalem chips, and with the bandwidth provided by three discrete channels of DDR3 the memory performance of the architecture should be impressive.

The new QPI that Intel has created is a point-to-point design much like HyperTransport and supports up to 25.6 GB/s of bandwidth per link.
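For readers wondering where a per-link figure like that comes from, here is a rough sketch of the arithmetic. The inputs are assumptions not stated in the briefing: a 6.4 GT/s transfer rate, 2 bytes of data per transfer in each direction, and both directions counted together.

```python
def qpi_link_bandwidth_gbs(transfers_gt_s: float = 6.4,
                           bytes_per_transfer: int = 2,
                           directions: int = 2) -> float:
    """Peak theoretical bandwidth of one link in GB/s.

    All default values are assumptions for illustration, not figures
    taken from the briefing slides.
    """
    return transfers_gt_s * bytes_per_transfer * directions

print(f"Per direction: {qpi_link_bandwidth_gbs(directions=1):.1f} GB/s")
print(f"Both directions: {qpi_link_bandwidth_gbs():.1f} GB/s")
```

Under those assumptions each direction carries half of the headline number, which is worth remembering when comparing against other interconnects.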

The integrated memory controller gets its own slide here; DDR3 speeds up to 1333 MHz will be supported and you could have up to 3 DIMMs per channel, for a total of 9 modules. The memory controller will support a variety of memory options such as registered DIMMs and regular unregistered modules and will apparently be “future scalable” to faster speeds.
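To get a feel for what three channels could mean, the sketch below works out the peak theoretical bandwidth and the maximum DIMM count. It assumes the "1333 MHz" figure means 1333 million transfers per second and that each channel is the standard 64 bits wide; real-world throughput will be lower, and the function name is just for illustration.

```python
def peak_bandwidth_gbs(mega_transfers: float, channels: int,
                       bus_width_bits: int = 64) -> float:
    """Peak theoretical memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers * 1e6 * bytes_per_transfer * channels / 1e9

CHANNELS = 3           # three DDR3 channels per CPU, from the briefing
DIMMS_PER_CHANNEL = 3  # up to 3 DIMMs per channel, from the briefing

per_channel = peak_bandwidth_gbs(1333, 1)
total = peak_bandwidth_gbs(1333, CHANNELS)
max_dimms = CHANNELS * DIMMS_PER_CHANNEL

print(f"Per channel: {per_channel:.1f} GB/s")
print(f"Three channels: {total:.1f} GB/s")
print(f"Maximum DIMMs per CPU: {max_dimms}")
```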

After the 45nm generation is complete, Intel will be talking about the upcoming 32nm process and the chips built upon it. Following the same “tick-tock” model, you should expect Westmere to be a die-shrink and modest enhancement of Nehalem while Sandy Bridge will be a new architecture.