Memory Basics (cont’d)

What’s new, pussycat?

Enter DDR-2

Second-generation double data rate memory (DDR-2), expected to start at 400 MHz and then move to 533 MHz and 667 MHz, should soon begin replacing DDR-1 (or DDR as we know it). DDR-2 seeks to increase the total memory bandwidth available to the system. This will be accomplished through higher clock frequencies and by streamlining the protocols the system uses for memory reads and writes. According to the JEDEC standard, DDR-2 will have 240 pins and will offer reductions in power consumption and heat output, two problems that grow larger as systems carry more and faster memory. As with the migration from SDRAM to DDR, DDR-2 sacrifices latency. An interesting tidbit on the side is that Intel’s P4 architecture, with its many optimizations, will be hurt less than AMD’s by the high latencies of DDR-2. We didn’t complain much last time, so maybe we won’t this time either?

DDR-2 will likely be the dominant type of memory in the desktop space for several years, as DDR-1 is/was. DDR-2 won’t arrive in quantity until the second half of 2004, however.

QDR and XDR

Quad Data Rate Memory (QDR DRAM) – Instead of two data samples per clock cycle, QDR sends four. QDR is not a JEDEC standard; it has been developed as a memory timing technology by Kentron. Kentron has said that QDR technology can leverage existing DDR-1 technology. Note that QDR isn’t simply 2x the speed of standard DDR. Instead, Kentron and VIA propose using a single QDR channel to achieve the performance of dual-channel DDR. (DDR-2 is still on VIA’s road map.)

XDR DRAM – getting catchy? XDR DRAM stands for eXtreme Data Rate DRAM, and is the final name for Rambus’s “Yellowstone” technologies, which have been announced in pieces over time. XDR brings all of these previously announced technologies under one big umbrella, which will be marketed as a high-bandwidth memory solution. XDR is effectively a hybrid of DDR and Rambus DRAM, designed to combine the best elements of both. Rambus claims that its mid-range XDR memory module is 8x faster than today’s DDR400. By “faster”, they are referring to the module clock speed, along with how many bits can be transmitted per clock cycle. XDR modules are not in production yet, and are not scheduled to go into full-scale production until 2006.

DDR Memory Speeds

The speed of DDR is usually expressed in terms of its “effective data rate”, which is twice its actual clock speed. PC3200 memory, or DDR400, or 400 MHz DDR, is not running at 400 MHz; it is running at 200 MHz. Because it accomplishes two data transfers per clock cycle, it has nearly the same bandwidth as SDRAM running at 400 MHz would, but DDR400 is indeed still running at 200 MHz.

Actual clock speed/effective transfer rate => specification

100/200 MHz => DDR200 or PC1600
133/266 MHz => DDR266 or PC2100
166/333 MHz => DDR333 or PC2700
185/370 MHz => DDR370 or PC3000
200/400 MHz => DDR400 or PC3200
217/433 MHz => DDR433 or PC3500
233/466 MHz => DDR466 or PC3700
250/500 MHz => DDR500 or PC4000
267/533 MHz => DDR533 or PC4200
283/566 MHz => DDR566 or PC4500

So how do they come by those names? Well, the industry specifications for memory operation, features and packaging are finalized by a standardization body called JEDEC. The acronym once stood for Joint Electron Device Engineering Council, but the body is now simply called the JEDEC Solid State Technology Association.

The naming convention specified by JEDEC is as follows:

  • Memory chips are referred to by their native speed. For example, 333 MHz DDR SDRAM memory chips are called DDR333 chips, and 400 MHz DDR SDRAM memory chips are called DDR400 chips.
  • DDR modules are also referred to by their peak bandwidth, which is the maximum amount of data that can be delivered per second. For example, a 400 MHz DDR DIMM is called a PC3200 DIMM. To illustrate this on a 400 MHz DDR module: each module is 64 bits wide, or 8 bytes wide (each byte = 8 bits). To get the transfer rate, multiply the width of the module (8 bytes) by the rated speed of the memory module (400 MHz, that is, 400 million transfers per second): (8 bytes) x (400 million transfers/second) = 3,200 MB/second or 3.2 GB/second, hence the name PC3200.

To date, the JEDEC consortium has yet to finalize specifications for PC3500 and higher modules. PC2400 was a very short-lived label applied to overclocked PC2100 memory. PC3000 was not and will not ever be an official JEDEC standard.
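
The PC3200 arithmetic above can be sketched in a few lines of Python. This is only an illustration of the naming rule as described here; the function names and the constant are made up for the example and are not part of any JEDEC document.

```python
MODULE_WIDTH_BYTES = 8  # a DDR DIMM is 64 bits wide, i.e. 8 bytes


def effective_rate_mhz(clock_mhz):
    """DDR performs two transfers per clock cycle."""
    return 2 * clock_mhz


def peak_bandwidth_mb_s(clock_mhz):
    """Peak bandwidth in MB/s: module width times effective data rate."""
    return MODULE_WIDTH_BYTES * effective_rate_mhz(clock_mhz)


for clock in (100, 200, 250):
    rate = effective_rate_mhz(clock)
    print(f"{clock} MHz clock -> DDR{rate}, {peak_bandwidth_mb_s(clock)} MB/s")
```

For a 200 MHz clock this gives DDR400 and 3200 MB/s, matching PC3200. For nominal clocks such as 133 or 166 MHz the computed figure lands close to, but not exactly on, the marketing label (PC2100, PC2700), because the labels are rounded.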

Processors and Bandwidth

The front side bus (FSB) is basically the main highway or channel through which information flows between the processor and the important functions on the motherboard that surround it. The faster and wider the FSB, the more information can flow over the channel, much as a higher speed limit or wider lanes can improve the movement of cars on a highway. Likewise, a low speed limit or narrower lanes will slow the movement of cars and cause a traffic bottleneck, and the same is true of a slow or narrow FSB. Intel has been able to reduce the FSB bottleneck by accomplishing four data transfers per clock cycle. This is known as quad-pumping, and has resulted in an effective FSB frequency of 800 MHz, with an underlying 200 MHz clock. AMD Athlon XPs, on the other hand, must be content with a bus that uses a different technology, one that transfers data on both the rising and falling edges of the clock signal. This is in essence the same double data rate technology used by memory of the same name (DDR), and it doubles the effective FSB frequency. That is, a 200 MHz clock results in an effective 400 MHz FSB.

Processors have an FSB data width. This data width is much like the “lanes on a highway” that go in and out of the processor. When the 8088 processor was released, it had a data bus width of 8 bits and was able to access one character at a time (8 bits = 1 character/byte) every time memory was read or written. The size in bits thus determines how many characters the processor can transfer at any one time. An 8-bit data bus transfers one character at a time, a 16-bit data bus transfers 2 characters at a time and a 32-bit data bus transfers 4 characters at a time. Modern processors, like the Athlon XP and Pentium 4, have a 64-bit wide data bus enabling them to transfer 8 characters at a time. Although these processors have 64-bit data buses, their internal registers are only 32 bits wide and they are only capable of processing 32-bit commands and instructions, whereas the new AMD64 series of processors can process both 32-bit and 64-bit commands and instructions.

When talking memory, bandwidth refers to how fast data is transferred once it starts and is often expressed in quantities of data per unit time. The peak bandwidth that may be transmitted by an Athlon XP or a Pentium 4 is the product of the width of the FSB and the frequency it runs at. To illustrate:

Athlon XP “Barton” 3200+ — 400FSB
64(bits) * 400,000,000(Hz) = 25,600,000,000 bits/sec
(25,600,000,000/8) / (1000*1000) = 3200 MB/sec

Intel Pentium 4 “C” 3.2 GHz — 800FSB
64(bits) * 800,000,000(Hz) = 51,200,000,000 bits/sec
(51,200,000,000/8) / (1000*1000) = 6400 MB/sec
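
The same two calculations can be written as a small Python function that starts from the underlying 200 MHz clock and the number of transfers per clock (two for the Athlon XP bus, four for the quad-pumped Pentium 4 bus). The function and parameter names are chosen for this sketch only.

```python
def fsb_peak_mb_s(clock_mhz, transfers_per_clock, width_bits=64):
    """Peak FSB bandwidth in megabytes/sec (1 MB = 1,000,000 bytes, as above)."""
    effective_hz = clock_mhz * 1_000_000 * transfers_per_clock
    bits_per_sec = width_bits * effective_hz
    return bits_per_sec // 8 // 1_000_000


athlon_xp = fsb_peak_mb_s(200, transfers_per_clock=2)  # double-pumped, 400FSB
pentium_4 = fsb_peak_mb_s(200, transfers_per_clock=4)  # quad-pumped, 800FSB
print(athlon_xp, pentium_4)  # 3200 6400
```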

These figures are theoretical and, as many may know, theory isn’t always borne out in practice. There’s a difference between peak bus bandwidth and effective memory bandwidth. Where peak bus bandwidth is the product of the bus width and bus frequency, effective bandwidth takes into consideration other factors, such as addressing and the delays that are necessary to perform a memory read or write. The memory could very well be capable of putting out 8 bytes on every single clock pulse for an indefinitely long time, and the CPU could likewise be capable of consuming data at this rate indefinitely. The problem is that there are turnaround times (or delays) between the moment the processor places a request for data on the FSB, the moment the requested data is produced by RAM, and the moment this data finally arrives for use by the CPU. So, potential peak bandwidth is very rarely, if ever, realized.
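
To get a feel for how much those delays can cost, here is a deliberately crude toy model in Python. It assumes every request delivers a fixed burst of data slots and then leaves the bus idle for a fixed number of slots; real memory systems overlap requests and behave far less simply, and the numbers used are hypothetical, not measurements.

```python
def effective_bandwidth_mb_s(peak_mb_s, burst_slots, idle_slots):
    """Scale peak bandwidth by the fraction of bus slots that carry data."""
    busy_fraction = burst_slots / (burst_slots + idle_slots)
    return peak_mb_s * busy_fraction


# Hypothetical example: 8 data slots per request, then 8 idle slots of delay.
print(effective_bandwidth_mb_s(3200, burst_slots=8, idle_slots=8))  # 1600.0
```

With as many idle slots as data slots, a 3200 MB/sec bus delivers only half its peak in this model; with no idle slots it would deliver all of it.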

DDR Dual Channel

Most of today’s mainstream chipsets use some form of dual channel to supply processors with bandwidth. The nForce and nForce2 are, at this time, the only two chipsets to supply dual-channel goodness for the Athlon XP. The original nForce was not on the same performance and stability level as competitor VIA’s chipsets, but the new and improved dual-channel DDR400 nForce2 has been a smash success; in fact, it is today’s de facto choice for performance-minded and overclocking AMD desktop buyers. VIA is now about to release a dual channel chipset for the Athlon XP/Duron family called the KT880. Take note that the memory isn’t dual channel; the platform is. In fact, there is no such thing as dual channel memory. Rather, it is a memory interface composed of two (or more) normal memory modules coordinated by the chipset on the motherboard or, in the case of the AMD64 processors, by the integrated memory controller. But for the sake of simplicity, we refer to DDR dual channel architecture as dual channel memory.

The nForce2 platform has two independent 64-bit memory controllers instead of the single controller found in other chipsets. These two controllers are able to access “two channels” of memory simultaneously. Together, the two channels handle memory operations more efficiently than one by combining the bandwidth of two (or more) modules. By combining DDR400 (PC3200) with dual memory controllers, the nForce2 could offer up to 6.4 GB/sec of bandwidth in theory. However, this extra bandwidth cannot be fully utilized by the Athlon XP and Duron family (K7) of processors. Data reaches these processors no faster than the system bus (FSB) allows, so even in single channel mode the processor cannot derive an advantage from memory operating faster than DDR266 on a 133/266 MHz FSB, DDR333 on a 166/333 MHz FSB or DDR400 on a 200/400 MHz FSB. Visualize a four-lane highway, symbolizing your dual channel configuration. As you go along the highway you come to a bridge that is only two lanes wide. That bridge is the restriction posed by the double-pumped AMD FSB. Only two lanes of traffic may pass over the bridge at any one time. That’s the way it is with the K7 processors and dual channel chipsets.

In case you’re wondering, the K in K7 stands for Kryptonite, later changed to Krypton to avoid copyright infringement. Yes, that very same fictional element from comic books that could bring the otherwise all-powerful Superman to his knees. Speaking of which, Intel’s P4 architecture is, in contrast, designed to exploit the increased bandwidth afforded by dual channel memory architectures. The 64-bit quad-pumped bus of the modern Pentium 4 CPU working at 800 MHz can, in theory, carry 6.4 GB/s of bandwidth. This exactly matches the bandwidth produced by the Intel i875 (Canterwood) and i865 (Springdale) chipset families. The quad-pumped P4 FSB seemed like drastic overkill in the days of single channel SDR memory, but is paying handsome dividends in today’s climate of dual channel DDR memory subsystems. This is one lasting and productive legacy of Intel’s RDRAM efforts. As implemented on the P4, RDRAM was also a dual channel architecture, and it needed the quad-pumped FSB for its extra bandwidth to be exploited. This factor continues to serve the P4 well in the dual channel DDR era we are currently in, and gives P4s greater memory performance than all other PC platforms, save the new AMD Athlon 64 FX with all its new bells and whistles.
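
The bridge analogy for the K7 and the bandwidth match on the P4 both come down to taking the smaller of two peak figures. A minimal Python sketch, using only the theoretical numbers quoted in this section (function names are invented for the example):

```python
def memory_peak_mb_s(effective_mhz, channels, width_bytes=8):
    """Peak memory bandwidth: data rate x module width x number of channels."""
    return effective_mhz * width_bytes * channels


def usable_mb_s(memory_mb_s, fsb_mb_s):
    """The processor can never receive more than the narrower path delivers."""
    return min(memory_mb_s, fsb_mb_s)


single_ddr400 = memory_peak_mb_s(400, channels=1)  # 3200 MB/s
dual_ddr400 = memory_peak_mb_s(400, channels=2)    # 6400 MB/s

athlon_xp_fsb = 400 * 8  # 64-bit bus at 400 MHz effective: 3200 MB/s
pentium_4_fsb = 800 * 8  # 64-bit bus at 800 MHz effective: 6400 MB/s

print(usable_mb_s(single_ddr400, athlon_xp_fsb))  # 3200
print(usable_mb_s(dual_ddr400, athlon_xp_fsb))    # 3200, second channel is wasted
print(usable_mb_s(dual_ddr400, pentium_4_fsb))    # 6400, bus and memory match
```

These are peak figures only; the sketch says nothing about latency or real-world throughput.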

The Athlon 64 FX processor has a fully integrated dual channel DDR memory controller providing a 128-bit wide path to memory, eliminating the need for a dual channel interface on the motherboard, which traditionally was always located in the Northbridge. The old term front-side bus has always represented the speed at which the processor moves memory traffic and other data traffic to and from the chipset. Since the AMD64 processors have the memory controller located on the processor die, memory subsystem traffic no longer has to go through the chipset for CPU-to-memory transfers. The old term “front-side bus” is therefore no longer applicable. With AMD64 processors, the CPU and memory controller interface with each other at full CPU core frequency. The speed at which the processor and chipset communicate now depends on the chipset’s HyperTransport spec, running at speeds of up to 1600 MHz. Although the P4 (800 FSB variety) and the 940-pin A64 FX share the same theoretical peak memory bandwidth of 6.4 GB/sec, the Athlon 64 FX realizes significantly more throughput, due mainly to its integrated memory controller, which drastically reduces latency. Even so, it still suffers from the required use of registered modules, which are slower than regular modules. The upcoming Athlon 64 / A64 FX processors designed for Socket 939 will be free from this major drawback and will also feature dual channel memory controllers. One negative, though, of having the memory controller integrated into the processor is that to support emerging memory technologies, like DDR-2 for example, the controller has to be redesigned and the processor needs to be replaced.

What do these terms mean?

Parity – Parity is a form of error checking. Non-parity is “regular” memory: it contains exactly one bit of memory for every bit of data to be stored, so 8 bits are used to store each byte of data. Parity memory adds an extra single bit for every eight bits of data, used only for error detection. So with parity modules, 9 bits are used to store each byte. This extra bit detects whether data was correctly read or written; however, it will not correct any errors that may have occurred.
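
The ninth bit can be illustrated with a short Python sketch. It assumes even parity (the extra bit makes the total number of 1 bits even); the text above does not say which convention a given module uses, and the function names are made up for the example.

```python
def parity_bit(byte):
    """Even parity: 1 if the byte holds an odd number of 1 bits, else 0."""
    return bin(byte).count("1") % 2


def store(byte):
    """Keep the byte together with its parity bit (9 bits in total)."""
    return byte, parity_bit(byte)


def check(byte, stored_parity):
    """True if the byte read back still agrees with the stored parity bit."""
    return parity_bit(byte) == stored_parity


data, p = store(0b10110010)      # four 1 bits, so the parity bit is 0
corrupted = data ^ 0b00000100    # flip a single bit
print(check(data, p))            # True
print(check(corrupted, p))       # False: error detected, but not located
```

Note that flipping two bits at once leaves the parity unchanged, so a single parity bit cannot detect that case, and it can never tell which bit went wrong.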

ECC – This stands for error correcting circuits, error correcting code, or error correction code. These modules go beyond simple parity checking. They also have an extra chip (or two, depending on how many chips the module has in total) that not only detects errors but also corrects them (depending on type) on the fly. When this correction takes place, the computer will continue without a hiccup; it will have no idea that anything even happened. However, if you have a corrected error, it is useful to know this; a pattern of errors can indicate a hardware problem that needs to be addressed. Chipsets allowing ECC normally include a way to report corrected errors to the operating system, but it is up to the operating system to support this.
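
To show how extra bits can locate and repair an error rather than merely detect it, here is a sketch of the classic Hamming(7,4) code in Python. This is a textbook illustration of the principle only: real ECC modules protect much wider words and are not claimed to use this exact code, and the function names are invented for the example.

```python
def encode(nibble):
    """Encode 4 data bits as a 7-bit Hamming codeword (bit positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8                             # index 0 unused; indices = positions
    code[3], code[5], code[6], code[7] = d     # data bits
    code[1] = code[3] ^ code[5] ^ code[7]      # check bit for positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]      # check bit for positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]      # check bit for positions 4,5,6,7
    return code[1:]


def decode(bits):
    """Return (nibble, error_position); error_position is 0 if no error was seen."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos        # XOR of the positions of all 1 bits
    if syndrome:
        code[syndrome] ^= 1        # the syndrome names the flipped bit; flip it back
    d = [code[3], code[5], code[6], code[7]]
    return sum(bit << i for i, bit in enumerate(d)), syndrome


word = encode(0b1011)
word[4] ^= 1            # corrupt position 5 (list index 4)
print(decode(word))     # (11, 5): data recovered, error position reported
```

Returning the error position alongside the data mirrors the point above: the correction is silent, but a system can still log that it happened.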

Registered & Unbuffered – Registered modules contain a ‘register’ that helps to ensure data is handled properly. Because of this extra step, registered modules are slower than unbuffered modules. They are generally used in mission-critical machines and machines that require large amounts of memory. The Opteron series of AMD processors uses registered DDR. The Athlon 64 FX (940 pins) inherits its architecture from the Opteron 100 series, and thus it too requires registered modules to function.

Registered memory must be supported by the motherboard and cannot be mixed with “Unbuffered” modules. Buffered memory is basically the same as registered memory, but the term is used for older types of memory. Unbuffered or standard memory modules do not have a register. They are cheaper and are the popular choice for home computers.
