The C19 North Bridge
With the introduction of the NVIDIA nForce4 SLI Intel Edition chipset, NVIDIA has returned to a two-chip solution for core logic. This is due to the logic and transistor requirements of putting a memory controller back in the north bridge (or SPP, as NVIDIA calls it).
The C19 SLI north bridge from NVIDIA
As you can see from the photo above, the revision that we received was the A02 silicon. I have heard rumors that at least one more revision, A03, exists that completes support for dual core processors. More on that later though. The NF4-Intel SPP is built on the 0.13 micron process and sports a mere 61 million transistors. The term “mere” is relative though, as NVIDIA’s GPUs tend to run quite a bit larger than that.
One thing became obvious very quickly in my testing: this north bridge gets hot! With 61 million transistors, and a lot of work being done by a complex memory controller, this is by far the hottest core logic I have tested. Just take a look at this heatsink!
Yeah, that’s the north bridge under that big green heatsink
The heatsink is actually bigger than the one used on the Dothan core Pentium M processor, and it still manages to get hot. Perhaps this much heat is an unavoidable result of the amount of tweaking done to the core, but it should make overclockers raise an eyebrow or two.
The NF4 SLI SPP chip is responsible for only two functions: the PCI Express controller and the memory controller. These also happen to be the two most important features for enthusiasts, so it should come as no surprise that NVIDIA spent a lot of development time here.
As we mentioned on the features list page, the SPP has a total of 20 PCI Express lanes available. 16 of these are given to graphics, either as a single x16 slot or as two x8 links for SLI graphics setups. The remaining four are available as x1 PCIe slots only. This is essentially unchanged from the AMD-based nForce4 SLI chipset.
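The lane split described above can be written out as a quick sanity check. This is only a sketch of the arithmetic in the text; the function name and the dictionary layout are my own, and how many physical x1 slots a given board actually exposes is up to the motherboard maker.

```python
TOTAL_LANES = 20      # PCI Express lanes on the SPP, per the text
GRAPHICS_LANES = 16   # lanes reserved for graphics, per the text


def lane_budget(sli: bool) -> dict:
    """Return the lane split for single-card or SLI mode (hypothetical helper)."""
    # One x16 link for a single card, or two x8 links for SLI.
    graphics_links = [8, 8] if sli else [16]
    assert sum(graphics_links) == GRAPHICS_LANES
    # Whatever is left over is only usable as individual x1 links.
    x1_links = [1] * (TOTAL_LANES - GRAPHICS_LANES)
    return {"graphics": graphics_links, "x1": x1_links}


print(lane_budget(sli=False))
print(lane_budget(sli=True))
```

Either way the graphics side consumes 16 lanes, so switching to SLI does not take anything away from the x1 links.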
But NVIDIA has put a lot of work into the memory controller. They have dubbed it the “DualDDR2” architecture, and it offers the enthusiast very high bandwidth, low latency, and plenty of room to tweak. The chipset is the first to officially support DDR2-667 speeds, for a maximum theoretical bandwidth of 10.6 GB/s. Interestingly, the NF4-Intel memory controller will operate correctly in 128-bit mode whether the two memory channels are populated symmetrically or asymmetrically. So you no longer need to add memory in pairs to get improved memory performance, though the white paper NVIDIA supplied to the media did admit that upgrading in pairs improves memory performance quite a bit more.
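For anyone wondering where the 10.6 GB/s figure comes from, it is simply the transfer rate multiplied by the bus width. The sketch below assumes the nominal 667 million transfers per second for DDR2-667 and decimal gigabytes (1 GB = 10^9 bytes); the function name is my own.

```python
def peak_bandwidth_gb_s(transfers_per_sec: float, bus_width_bits: int) -> float:
    """Peak theoretical bandwidth in GB/s, with 1 GB = 1e9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_sec * bytes_per_transfer / 1e9


DDR2_667_TRANSFERS = 667e6  # assumed nominal rate: 667 million transfers/s

single_channel = peak_bandwidth_gb_s(DDR2_667_TRANSFERS, 64)   # one 64-bit channel
dual_channel = peak_bandwidth_gb_s(DDR2_667_TRANSFERS, 128)    # 128-bit mode

print(f"64-bit:  {single_channel:.2f} GB/s")
print(f"128-bit: {dual_channel:.2f} GB/s")
```

The 128-bit result works out to roughly 10.67 GB/s, which is the number usually quoted as 10.6 GB/s. Keep in mind this is a theoretical ceiling, not a figure you will see in a benchmark.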
NVIDIA nForce4 SLI Intel Edition Memory Controller
One thing missing from the NF4-Intel chipset that other P4 chipsets have adopted is support for the original DDR1 memory. To quote NVIDIA directly on this issue, they “chose not to support both memories because supporting legacy DDR memory required multiple design compromises that would affect system performance.” This also makes logical sense, as the NF4 SLI Intel Edition chipset is aimed only at the enthusiast market that wants the latest technologies and best performing parts, which DDR2 at 667 MHz is becoming.
The NF4-Intel chipset is the first DDR2 chipset to operate at 1T timings as well. This is due to NVIDIA’s inclusion of a dedicated address bus for each memory module allowing the load of the memory addressing to be evenly distributed. A 1T timing is only possible when both the addresses and commands are placed on the memory bus by the memory controller and the DRAM devices can “latch” onto them in a single memory cycle. Otherwise, in a 2T setting, the addresses and commands are sent on one cycle and the DRAMs “latch” on the following cycle, basically adding an entire cycle of latency to memory requests. NVIDIA supplied the below graphic to illustrate this idea:
NVIDIA 1T Address timing vs 2T Address timing
As you can see from this, running at 2T rather than 1T effectively adds a clock to the CAS latency, and in applications whose memory accesses are random in nature, the difference can be easily visible.
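To put a rough number on that extra clock, here is a small sketch that converts cycles to nanoseconds at DDR2-667. The CAS latency of 4 is purely an illustrative assumption (it is not a figure from NVIDIA or from the test system), and the model deliberately ignores every other DRAM timing.

```python
def cycle_time_ns(data_rate_mt_s: float) -> float:
    """DDR memory transfers twice per clock, so clock MHz = data rate / 2."""
    clock_mhz = data_rate_mt_s / 2
    return 1000.0 / clock_mhz


def access_latency_ns(cas_cycles: int, command_rate: int, data_rate_mt_s: float) -> float:
    """CAS latency plus the extra cycles added by a command rate above 1T."""
    extra_cycles = command_rate - 1  # 1T adds nothing, 2T adds one cycle
    return (cas_cycles + extra_cycles) * cycle_time_ns(data_rate_mt_s)


CAS = 4            # hypothetical CAS latency, for illustration only
DATA_RATE = 667.0  # DDR2-667, in million transfers per second

latency_1t = access_latency_ns(CAS, 1, DATA_RATE)
latency_2t = access_latency_ns(CAS, 2, DATA_RATE)
print(f"1T: {latency_1t:.1f} ns, 2T: {latency_2t:.1f} ns")
```

At this speed one memory clock is about 3 ns, so with these assumed timings the 2T setting makes each access about a quarter slower than it would be at 1T.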
This new memory controller also introduces us to a new revision of NVIDIA’s DASP (dynamic adaptive speculative preprocessor), version 3.0. This logic is much like a processor prefetch mechanism in that it tries to read data from memory and have it available before it is actually asked for, in order to improve memory latencies. The big problem for chipset-based prefetch logic is that what it actually has to do is predict the behavior of the processor’s own prediction unit. You can see how that would become very messy, very easily. Throw in HyperThreading and support for multiple cores and threads and you get even more problems for a chipset-based preprocessor. NVIDIA claims they have designed their DASP to be very effective with Intel’s processors, both dual core and single, using multiple preprocessor units and a central arbiter that decides which prefetches are going to be issued and in what order, between the chipset, CPU and GPU.
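NVIDIA has not published how DASP works internally, so the following is not their algorithm. It is a generic textbook stride prefetcher, included only to show the basic idea of fetching data before it is requested and why random access patterns defeat it. All names here are hypothetical.

```python
def run_stride_prefetcher(addresses):
    """Toy stride prefetcher: returns (hits, prefetches_issued).

    When the gap between consecutive requests repeats, it guesses the
    next address will continue the pattern and fetches it ahead of time.
    """
    prefetched = set()
    last_addr = None
    last_stride = None
    hits = 0
    issued = 0
    for addr in addresses:
        if addr in prefetched:
            hits += 1  # the data was already waiting when it was requested
        if last_addr is not None:
            stride = addr - last_addr
            if stride != 0 and stride == last_stride:
                prefetched.add(addr + stride)  # speculate on the next request
                issued += 1
            last_stride = stride
        last_addr = addr
    return hits, issued


sequential = [0, 64, 128, 192, 256]
scattered = [0, 4096, 64, 9000, 128]
print(run_stride_prefetcher(sequential))
print(run_stride_prefetcher(scattered))
```

The sequential stream starts hitting as soon as the pattern is established, while the scattered one never triggers a single prefetch. A real chipset prefetcher faces the harder version of this problem, since the request stream it sees has already been reshaped by the CPU's own prefetching.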
The final technological change in the memory controller is NVIDIA’s QuickSync Technology, which is a “patent-pending” technology, and thus less open to public viewing. The problem it attempts to address is that of synchronization between the front-side bus clock and the memory clock. These synchronization circuits are pretty straightforward when working with two buses running at the same frequency, but when you begin to operate them at different speeds, like the NF4-Intel chipset does with DDR2-667 memory support, you can actually add latency and slow down the memory if things are done incorrectly.
NVIDIA supplied the diagram below, which shows how overclocking can compound the problem:
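To see why the two clock domains do not line up, it helps to look at the ratio between them. The FSB speeds below are my own assumptions for illustration (the common quad-pumped 800 and 1066 settings, i.e. 200 MHz and roughly 266.67 MHz base clocks), and the DDR2-667 memory clock is taken as 1000/3 MHz. This says nothing about how QuickSync itself is built.

```python
from fractions import Fraction


def clock_ratio(fsb_clock_mhz: Fraction, mem_clock_mhz: Fraction) -> Fraction:
    """Ratio of FSB base clock to memory clock; 1 means fully synchronous."""
    return fsb_clock_mhz / mem_clock_mhz


MEM_CLOCK = Fraction(1000, 3)  # DDR2-667 memory clock, about 333.33 MHz

# Assumed FSB base clocks (quad-pumped data rate divided by four).
FSB_CLOCKS = {
    "FSB800": Fraction(200),
    "FSB1066": Fraction(800, 3),
}

for name, fsb in FSB_CLOCKS.items():
    r = clock_ratio(fsb, MEM_CLOCK)
    print(f"{name}: FSB:memory clock ratio = {r.numerator}:{r.denominator}")
```

Neither assumed setting gives a 1:1 ratio, so data crossing between the domains has to wait for the clock edges to align. Overclocking the FSB by an arbitrary amount only makes the ratio less tidy.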
The overclocking problem NVIDIA QuickSync is supposed to fix
How does NVIDIA’s QuickSync technology work? I really don’t know except that NVIDIA states that it “speeds up the internal paths between the FSB clock domain and the memory clock domain as the FSB bus speed and/or the memory bus speed increases.” It all adds up to NVIDIA just trying to get the data to and from the processor as quickly as possible.