Intel QuickPath Interconnect and New Memory Controller
One of the features Intel has been talking about for a while is the move away from the front-side bus architecture to something called the QuickPath Interconnect. Previously known only as CSI (Common System Interface), QuickPath is Intel’s answer to AMD’s HyperTransport technology and performs a very similar function.

You can see that even though Intel has improved the FSB interface to the north bridge (and thus the CPU) over the years, we all knew the FSB remained a limiting factor in memory bandwidth and a source of added memory latency.

Starting with Nehalem and moving forward, Intel’s processors will feature a point-to-point direct connect architecture that moves data from socket to socket as well as from the CPU to the chipset, all while scaling nicely as the number of CPUs and QPI links goes up. Part of the reason QPI was needed on Nehalem is the new integrated memory controller on the processor. As AMD demonstrated many years ago, an IMC allows for higher peak memory bandwidth and lower memory latency, though Intel is taking it another step up by offering a three-channel DDR3 memory controller on each CPU. QPI is also a requirement for efficient chip-to-chip communication where one CPU might need to access data that is stored in memory on the other processor’s memory controller.

The QPI design supports 6.4 GT/s (gigatransfers per second), or 12.8 GB/s of bandwidth in each direction, for 25.6 GB/s of total bandwidth between two points. Future versions of QPI will scale up to faster speeds as well. You can also see in the above four-CPU diagram that QPI will scale well to as many as four CPUs – each processor in that case requires four QPI links in total (one to each of the other three CPUs plus one to the chipset) and is only one hop away from any other CPU’s memory.
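As a quick sanity check on those figures, here is a back-of-the-envelope sketch (assuming the commonly cited QPI layout of 16 data bits, i.e. 2 bytes of payload, moved per transfer in each direction) showing how 6.4 GT/s turns into the 12.8 GB/s and 25.6 GB/s numbers:

# Rough check of the QPI bandwidth figures quoted above.
# Assumes 2 bytes of payload per transfer in each direction of the link.
transfers_per_sec = 6.4e9          # 6.4 GT/s
payload_bytes_per_transfer = 2     # 16 data bits per transfer

one_direction = transfers_per_sec * payload_bytes_per_transfer   # bytes/s, one direction
both_directions = 2 * one_direction                              # full-duplex link

print(f"Per direction: {one_direction / 1e9:.1f} GB/s")    # -> 12.8 GB/s
print(f"Bidirectional: {both_directions / 1e9:.1f} GB/s")   # -> 25.6 GB/s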
The Intel Nehalem integrated memory controller (IMC) is actually quite scalable in its own right – besides offering extremely high bandwidth and low latency, the number of memory channels can be varied, both buffered and unbuffered memory is supported, and memory speeds can be adjusted, all based on the market the processor is targeted at. Low-cost parts with only dual-channel memory should cost considerably less than top-end three-channel systems.
At launch, the DDR3 memory controller on Nehalem will only officially support DDR3-1066 memory speeds. While that is pretty lame, I was told on numerous occasions that the memory controller will run at speeds of DDR3-1600 to DDR3-2000; official support simply stops at the JEDEC-rated speed. The IMC will also force Intel to adopt NUMA (non-uniform memory access) for the first time on its desktop processors, since memory will now hang off each CPU rather than just off the north bridge.
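For context, here is a rough sketch of the theoretical peak bandwidth that launch configuration implies, assuming standard 64-bit (8-byte) DDR3 channels; the DDR3-1600 line is purely hypothetical, reflecting the out-of-spec speeds mentioned above rather than anything officially supported:

# Theoretical peak memory bandwidth for various Nehalem IMC configurations.
# These are peak figures for 64-bit-wide channels, not measured throughput.
def ddr3_peak_gbps(transfer_rate_mt_s, channels, bytes_per_channel=8):
    """Peak bandwidth in GB/s for a given DDR3 speed grade and channel count."""
    return transfer_rate_mt_s * 1e6 * bytes_per_channel * channels / 1e9

print(f"DDR3-1066 x3: {ddr3_peak_gbps(1066, 3):.1f} GB/s")   # ~25.6 GB/s, launch spec
print(f"DDR3-1066 x2: {ddr3_peak_gbps(1066, 2):.1f} GB/s")   # ~17.1 GB/s, dual-channel parts
print(f"DDR3-1600 x3: {ddr3_peak_gbps(1600, 3):.1f} GB/s")   # ~38.4 GB/s, out-of-spec example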

In this graph Intel shows us that Nehalem’s memory latencies are damn good – the “local” result shows the latency when the requested memory resides on the memory controller of the same processor doing the work. The “remote” score shows how performance is affected when the data sits in the other CPU’s memory and must be fetched from the remote node over QPI. According to this, even with that added delay, the Nehalem CPU beats the Harpertown results.
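To illustrate why the local/remote split matters, here is a toy model of effective average latency as the share of remote-node accesses grows. The latency values below are made-up placeholders for illustration, not Intel’s measured numbers from the graph:

# Illustrative-only NUMA model: average latency as a function of how often a
# thread's requests miss its local node and must cross QPI to the other socket.
def effective_latency_ns(local_ns, remote_ns, remote_fraction):
    """Weighted average latency given the fraction of remote-node accesses."""
    return (1 - remote_fraction) * local_ns + remote_fraction * remote_ns

local_ns, remote_ns = 60.0, 100.0   # hypothetical local vs. remote latencies
for frac in (0.0, 0.25, 0.5, 1.0):
    print(f"{int(frac * 100):3d}% remote -> {effective_latency_ns(local_ns, remote_ns, frac):.0f} ns")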