How the Barton Makes it Better
This content was originally featured on Amdmb.com and has been converted to PC Perspective’s website. Some color changes and flaws may appear.
The Barton core on the Athlon XP introduces an upgraded L2 cache to the processor line, giving it a total of 512 KB of L2 cache and 128 KB of L1 cache for 640 KB of cache in total. That currently gives it the largest cache of any desktop processor, including the latest Intel P4s with Hyper-Threading. While it goes without saying that the added cache on the Barton is going to increase performance, it is important to know exactly how it will benefit user programs and what may be the next item that needs to be addressed.
Image courtesy aggregate.org
Processor cache is a very complicated subject that would take more time than I have here to fully describe, but I’ll try to hit the high points to help with this discussion. The Athlon XP (pre-Barton) has 128 KB of L1 cache and 256 KB of L2 cache. The L1 cache is split into a 64 KB data cache and a 64 KB instruction cache. When a program asks for data, the CPU first checks the L1 cache to see if it’s available and ready to go. If it is not, it steps down and checks the L2 cache. Finally, if all else fails, the processor has to reach out into main memory to fetch the information. L1 cache is much faster than L2 cache, but is more costly to build. Main memory (what we call DRAM) is much slower again than L2 cache, typically by an order of magnitude or more. Of course, the faster the data can be read the better, so we would prefer to have the processor find it in L1 cache. However, with only 64 KB for data, that’s not going to happen all the time, so the L2 cache is the fallback. At 256 KB, it isn’t very big either, but it does better than L1. Main memory is our last choice because every trip out to it stalls the processor for a long time.
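To put rough numbers on that L1, then L2, then main memory fallback, here is a minimal sketch of the usual "average access time" arithmetic. The cycle counts are round numbers I am assuming for illustration, not measured Athlon XP latencies, and the function name is my own.

```python
# Illustrative latencies in CPU cycles. These are assumed round numbers,
# not measured Athlon XP figures.
L1_CYCLES = 3
L2_CYCLES = 20
DRAM_CYCLES = 200


def average_access_cycles(l1_hit_rate, l2_hit_rate):
    """Average cost of one memory access, in cycles.

    l1_hit_rate: fraction of all accesses found in L1.
    l2_hit_rate: fraction of L1 misses that are then found in L2.
    """
    l1_miss = 1.0 - l1_hit_rate
    l2_miss = 1.0 - l2_hit_rate
    # Every access pays for the L1 check; only L1 misses pay for L2,
    # and only L2 misses pay for the trip out to main memory.
    return L1_CYCLES + l1_miss * (L2_CYCLES + l2_miss * DRAM_CYCLES)


# Same L1 behavior, but L2 catches more of the misses in the second case.
print(average_access_cycles(0.90, 0.50))
print(average_access_cycles(0.90, 0.75))
```

With these assumed numbers, raising the share of L1 misses that L2 catches from 50% to 75% cuts the average from about 15 cycles to about 10, which is the kind of effect a bigger L2 is aiming for.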
So, obviously, increasing the L2 cache from 256 KB to 512 KB is going to help the Barton cores perform better, as they are less likely to have to access the main memory of the computer. How much this actually helps is something the benchmarks are going to have to show us.
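As a toy illustration of why the answer depends so heavily on the program, the sketch below runs a looping 384 KB working set through a simulated 256 KB and 512 KB cache. It is a simplification under my own assumptions: a single fully associative cache with LRU replacement and 64-byte lines, which is not how the real Athlon XP L2 is organized, and the helper names are hypothetical.

```python
from collections import OrderedDict

KB = 1024
LINE_BYTES = 64  # assumed cache line size


def hit_rate(cache_bytes, addresses):
    """Fraction of accesses that hit in a fully associative LRU cache."""
    capacity = cache_bytes // LINE_BYTES
    cache = OrderedDict()  # keys are line numbers, ordered oldest first
    hits = 0
    total = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        total += 1
        if line in cache:
            hits += 1
            cache.move_to_end(line)  # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


def looping_working_set(size_bytes, passes):
    """Walk the same block of memory front to back, several times over."""
    for _ in range(passes):
        for addr in range(0, size_bytes, LINE_BYTES):
            yield addr


for cache_kb in (256, 512):
    rate = hit_rate(cache_kb * KB, looping_working_set(384 * KB, passes=10))
    print(f"{cache_kb} KB cache: {rate:.0%} hit rate")
# 256 KB cache: 0% hit rate
# 512 KB cache: 90% hit rate
```

This is a deliberately extreme case: the loop is just too big for the smaller cache, so LRU throws each line away right before it is needed again, while the larger cache holds the whole thing after the first pass. A program whose working set already fits in 256 KB, or is far larger than 512 KB, would see little difference, which is why the benchmarks have the final say.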
All of this ties in with where I think the next big step in processor design is going to go, and that is the issue of TLBs (translation look-aside buffers). The TLB is used whenever a processor wants to access any memory, whether L1, L2 or main. Because of the way virtual memory addresses work, the processor needs to look up in a table the “reference” for where some data is actually located. It does so by looking in the TLB, which holds recently used entries from that table, and using the information there to find the exact location. However, you’ll notice that the TLBs for processors currently hold only around 128 entries, depending on the brand you are looking at. If a program touches more memory than those entries can map, some of the translations it needs will not be in the TLB and have to be fetched from the full table, which is stored in main memory.
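Here is a minimal sketch of that lookup, treating the TLB as a small cache in front of the full table in memory. The 4 KB page size is the common x86 small page. The 128-entry figure is just the rough number quoted above, and the dictionaries and function names are stand-ins of my own. Eviction of old TLB entries is left out to keep it short.

```python
PAGE_BYTES = 4 * 1024  # 4 KB pages, the common x86 small page size
TLB_ENTRIES = 128      # rough figure from the text, not a specific CPU


def split_address(virtual_addr):
    """Split a virtual address into (page number, offset within the page)."""
    return virtual_addr // PAGE_BYTES, virtual_addr % PAGE_BYTES


def translate(virtual_addr, tlb, page_table):
    """Return (physical address, tlb_hit).

    tlb and page_table are dicts mapping page number -> physical frame number.
    The page_table dict stands in for the full table held in main memory.
    """
    page, offset = split_address(virtual_addr)
    hit = page in tlb
    if not hit:
        # Slow path: read the entry from the table in memory, then keep it.
        tlb[page] = page_table[page]
    return tlb[page] * PAGE_BYTES + offset, hit


page_table = {0: 5, 1: 9}
tlb = {}
print(translate(4100, tlb, page_table))  # first touch of page 1: TLB miss
print(translate(4200, tlb, page_table))  # same page again: TLB hit

# Memory the TLB can map at once if every entry covers one 4 KB page.
print(TLB_ENTRIES * PAGE_BYTES // 1024, "KB")  # 512 KB
```

Under those assumptions, 128 entries cover 512 KB of memory at a time, so any program that roams over more than that will start missing in the TLB.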
The problem arises when the processor tries to access some memory, finds that the translation it needs isn’t held on the chip, and must access main memory to fetch it. Then, if the translated address points to data that is in main memory AGAIN, we pay for two trips to main memory instead of one, so that access takes nearly 200% of the time it otherwise would. Adding more TLB entries may or may not be the answer to all this.
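The arithmetic behind that doubling can be sketched in a few lines. The 200-cycle main memory latency is an assumed round number, and the model follows the simplified picture used here, where a TLB miss costs one extra read from memory; real hardware may need more than one read to walk its tables, which the hypothetical `table_reads` parameter allows for.

```python
DRAM_CYCLES = 200  # assumed cost of one trip to main memory, in cycles


def access_cycles(tlb_hit, table_reads=1):
    """Cost of one access whose data has to come from main memory.

    tlb_hit: whether the address translation was already in the TLB.
    table_reads: memory reads needed to fetch a missing translation
                 (1 in the simplified picture; an assumption, not a spec).
    """
    translation = 0 if tlb_hit else table_reads * DRAM_CYCLES
    return translation + DRAM_CYCLES


with_hit = access_cycles(tlb_hit=True)
with_miss = access_cycles(tlb_hit=False)
print(with_hit, with_miss, with_miss / with_hit)  # 200 400 2.0
```

The "nearly" comes from the small fixed costs this model ignores, such as the cache checks that happen before either trip to memory.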
This last part was kept very simplified and vague so that it would be short to write and, hopefully, easy to understand. For more detail, I recommend reading the Patterson/Hennessy book titled Computer Organization and Design.