Bandwidth and Overclocking
It’s all About the Bandwidth, Bay-Bee!
As the dual graphics trend continues, with both SLI and CrossFire appearing to be around for quite some time to come, a new bottleneck has appeared that most thought wouldn’t show up for many years. When x16 PCI Express video cards were first introduced, the bandwidth they offered was light years ahead of anything AGP had previously given us. But a single card couldn’t even come close to maxing out the full 8GB/s of available bandwidth, so most just wrote off the move to PCIe as future-proofing the industry. With both SLI and CrossFire, though, the need to move data between two video cards has far exceeded the need to move data from memory to the cards. Let’s elaborate.
When running in multi-rendering mode, a full frame of data must be passed between the two video cards every other frame. To put that into perspective, a single 1600×1200 32-bit frame is about 7.3 MB of raw data, and the card has only about 17ms to render and transfer a frame if the system is going to reach 60 FPS. This means that transfer speeds between graphics cards get more and more important as the resolution being displayed (and thus the amount of data) increases. NVIDIA’s SLI has an internal connection between the cards to enable this transfer to occur, but ATI’s X1800 and X1900 lines of CrossFire cards depend on an external connection to transfer a lot of the data.
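To make the numbers above concrete, here is a rough back-of-the-envelope sketch in Python. The frame size and the 60 FPS budget come straight from the figures quoted; the 250 MB/s per lane, per direction figure is the theoretical peak for first-generation PCI Express and is my own assumption, as are the function names. Real-world transfers carry protocol overhead, so treat the results as a best case and not a measurement.

```python
# Back-of-the-envelope frame transfer math. Illustrative only:
# the helper names are made up for this sketch, and the per-lane
# rate is the theoretical PCIe 1.x peak, not a measured value.

BYTES_PER_PIXEL = 4        # 32-bit color
PCIE1_LANE_MB_PER_S = 250  # assumed: MB/s per lane, per direction


def frame_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Raw size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a target FPS."""
    return 1000.0 / fps


def transfer_ms(num_bytes, lanes, lane_mb_per_s=PCIE1_LANE_MB_PER_S):
    """Best-case time to push num_bytes one way across a PCIe link."""
    bytes_per_second = lanes * lane_mb_per_s * 1_000_000
    return num_bytes / bytes_per_second * 1000.0


if __name__ == "__main__":
    size = frame_bytes(1600, 1200)
    print(f"frame size: {size} bytes ({size / 2**20:.2f} MiB)")
    print(f"budget at 60 FPS: {frame_budget_ms(60):.1f} ms")
    for lanes in (8, 16):
        print(f"x{lanes} link: {transfer_ms(size, lanes):.2f} ms per frame")
```

Under these assumptions, one 1600×1200 frame takes roughly 3.84 ms of the 16.7 ms budget over a x8 link and 1.92 ms over a x16 link, which is why the link width matters more as resolution climbs.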
ATI was quick to show, though, that with connector-less configurations, as on the X1600 and X1300 CrossFire setups, the bandwidth between the two cards is obviously even more important. In this case, all of the data must be moved along the PCI Express bus (not along an external connection as with the X1800 and X1900), and therefore the faster the bus, the better. But the math gets tricky here, as talking to the red guys and the green guys gives us two different answers.
ATI claims that their new XPress 3200 solution (two x16 PCIe slots with a full x16 lanes of bandwidth between them and a single XPress 3200 chipset as the arbiter) has faster throughput than NVIDIA’s SLI X16 solutions, which depend on two chips to get the full x16 bandwidth. In fact, ATI claims their solution exposes as much as twice the total bandwidth over the course of data transfer between the two cards. NVIDIA disputes this, however, saying their solution has a x16 pathway between the two cards as well, and that the need to go between the northbridge and southbridge when communicating across the bus doesn’t add any speed penalty.
Logic tells us that, in theory, a single chip should be faster than two chips, but we’ll wait for more information before declaring either side the victor here.
Even ATI admitted to us that the XPress 200 CrossFire implementation was not built with the rigors of multi-GPU processing in mind. As I mentioned at the outset, the XPress 200 CrossFire chipset and GPUs were basically hack jobs, but this time things are different, they claim. Both the R580 GPU and the RD580 chipset were designed with CrossFire in mind, and as such the communication between them has been optimized for faster transfers and more efficient use of the additional bandwidth the XPress 3200 provides.
One of the key benefits of the improved PCIe bus concerns the Catalyst driver and the profiles that were necessary in the past. Previously on the XPress 200, when a game engine required intense card-to-card transfer, the ATI driver team had to find a more efficient way to move the data, or simulate the movement of that data, in order to avoid performance loss on the unoptimized x8 PCIe connections. But with the new 3200 chipset, the full x16 PCIe connections mean that ATI will not need to modify their drivers as much, which in turn means increased CrossFire speed and compatibility.
Radeon XPress 3200 — Now with more Overclocking!
When I say that the XPress 3200 chipset was built from the ground up for overclocking, I am referring to several design decisions. First, RD580 was a clean sheet design, meaning that this isn’t simply an upgrade of the RD480 chipset with more PCI Express lanes plopped down on it. Remember, there are a full 40 lanes of PCIe in this tiny northbridge, while NVIDIA’s SPP features only about half of that and uses a second MCP chip to feed the second x16 PCIe slot. ATI told us that the new chipset took about 8 months to design and that, once the first silicon was produced, it took only about 24 hours before CrossFire was running stable.
The XPress 3200 chipset is being manufactured on the TSMC 0.11 micron process, which automatically offers additional headroom over the original 0.13 micron process that previous chipsets used.
The first design decision that ATI’s engineers made for improving overclocking was the move to a flip chip design for the circuit. The flip chip offers increased electrical connection strength compared to older BGA or wire bonding chips. Secondly, the ATI design team also set out with the mindset to ‘overspec’ the RD580 by setting a higher goal for their acceptable clock rates in the design process. While ATI wouldn’t actually give us a number they were aiming for, the fact is we can move well beyond the default speeds on all the buses without a problem, as you will see shortly.
The chip was made for highly individualized changes to clocks so that users can modify settings without adversely affecting other components. In fact, each of the x16 GPU slots can run at independent PCIe speeds if the motherboard vendor and user decide to do it. Also, when overclocking the GPU PCIe lanes, the remaining eight PCIe lanes can still maintain the standard clock rates. The various HyperTransport links can also be changed independently, and the internal CPU clock maintains a constant 1:1 ratio with its HT link so that increases in the clock result in exactly equal increases in CPU speed.
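The last point about the CPU clock can be illustrated with a little arithmetic. The sketch below is my own reading of it, not something ATI supplied: it assumes the CPU clock is a reference clock times a fixed multiplier, and the 200 MHz reference, 230 MHz overclock, and 12x multiplier are hypothetical values picked purely for illustration.

```python
# Proportional scaling of CPU speed with the reference clock.
# All numbers are hypothetical examples, not values from ATI.

def cpu_clock_mhz(reference_mhz, multiplier):
    """CPU clock as reference clock times a fixed multiplier (assumed model)."""
    return reference_mhz * multiplier


def percent_gain(old, new):
    """Percentage increase going from old to new."""
    return (new - old) * 100.0 / old


if __name__ == "__main__":
    multiplier = 12                  # hypothetical, held constant
    stock_ref, oc_ref = 200, 230     # hypothetical reference clocks in MHz

    stock_cpu = cpu_clock_mhz(stock_ref, multiplier)
    oc_cpu = cpu_clock_mhz(oc_ref, multiplier)

    print(f"reference clock gain: {percent_gain(stock_ref, oc_ref):.1f}%")
    print(f"CPU clock gain:       {percent_gain(stock_cpu, oc_cpu):.1f}%")
```

With the multiplier held fixed, both lines report the same percentage, which is the "exactly equal increases" behavior described above.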
The Final Result
The chipset comparison table below summarizes what we have talked about here in this preview of the Radeon XPress 3200 chipset.
The table is a bit skewed, as many of the features included on NVIDIA’s nForce4 SLI X16 chipsets, such as Gigabit Ethernet, are conveniently not listed.