The Register have put together a nice graphic and table displaying current storage technologies and how they relate to each other. They constructed the graphic to demonstrate the major boundaries in storage, between cache/memory, local storage, and external storage, and how these are going to move thanks to new technology. NVMe over Fabrics will enable companies to utilize external storage at latencies lower than internal storage that still uses SATA or SAS, with only pure PCIe local storage outpacing its potential. X-Point, assuming it lives up to the hype, will blur the line between local storage and memory/cache storage, offering latency previously only seen in system memory or on-die cache.
They also provide a table to give you a rough idea of how this translates between storage media, normalizing everything to a theoretical task which would take L1 cache 1 second to complete; for some, this is easier to comprehend than nanoseconds.
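As a rough sketch of that normalization (the latency figures below are ballpark assumptions for illustration, not the article's exact numbers), you can rescale typical access latencies so that an L1 hit counts as one second:

```c
/* Sketch: rescale typical access latencies so an L1 hit takes "1 second",
 * the same normalization the table uses. All latency values are
 * illustrative assumptions, not The Register's exact figures. */
#include <stdio.h>

int main(void) {
    struct { const char *tier; double ns; } tiers[] = {
        { "L1 cache",  1.0        },  /* ~1 ns    */
        { "DRAM",      100.0      },  /* ~100 ns  */
        { "NVMe SSD",  100000.0   },  /* ~100 us  */
        { "SATA SSD",  500000.0   },  /* ~500 us  */
        { "Hard disk", 10000000.0 },  /* ~10 ms   */
    };
    double l1 = tiers[0].ns;
    for (unsigned i = 0; i < sizeof tiers / sizeof tiers[0]; i++) {
        /* value on the "L1 = 1 second" scale */
        double scaled = tiers[i].ns / l1;
        printf("%-10s %12.0f ns  -> %10.0f \"seconds\"\n",
               tiers[i].tier, tiers[i].ns, scaled);
    }
    return 0;
}
```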
"Two technology changes are starting to be applied and both could have massive latency reduction effects at the two main storage boundary points: between memory and storage on the one hand, and between internal and external, networked storage on the other."
Here is some more Tech News from around the web:
- Alphabet's Nest To Deliberately Brick Revolv Hubs @ Slashdot
- Meet Jide's Remix OS: Android on the desktop done right @ The Inquirer
- Google pushes April Android security update to Nexus devices @ The Inquirer
- FreeBSD 10.3 lands @ The Register
- Quinones and graphite make green battery @ Nanotechweb
- A One Year Redux On The Basement Computer Room For Benchmarking 50+ Systems Daily @ Phoronix
- AMD Details Bristol Ridge AM4 Performance @ Hardware Canucks
- Samsung starts mass producing 10nm-class NAND chips @ The Inquirer
“X-Point, assuming it lives up to the hype, will blur the line between local storage and memory/cache storage, offering latency previously only seen in system memory or on-die cache.”
X-Point will not be anywhere near the latency of on-die cache. The access time of an SRAM cell is essentially zero compared to a DRAM cell. A DRAM cell stores charge in a capacitor, which must be read with a sense amplifier, and the time that takes hasn’t changed much even though interface speeds have increased significantly. DRAM achieves high speed by reading a large row of DRAM cells into an SRAM-style row buffer; surrounding addresses can then be sent out with low latency.
With SRAM, the access time is determined by how long it takes to figure out the correct storage location and transfer the data back, rather than by any cell latency. The larger the storage, the longer it takes to determine the location; this is why we have multi-level caches. The L1 is very small and very fast, the L2 is larger and slower, and so on out the memory hierarchy. Cache design is not simple, and I suspect that good cache design has been a large part of Intel’s lead for quite a while. Also, going off-chip adds a significant amount of latency when we are talking about nanosecond scales. Reading from the currently open page in a DRAM doesn’t involve any DRAM cell read, since the data will already be in the SRAM-style row buffer, but it still takes significantly longer than an on-die cache hit. An off-chip SRAM wouldn’t even be able to compete with on-die caches, so it is ridiculous to imply that X-Point would.
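A minimal pointer-chasing sketch (sizes and iteration counts are arbitrary illustrative choices) makes that hierarchy visible: walk a randomly permuted cycle, and the nanoseconds per load jump each time the working set outgrows L1, then L2, then L3:

```c
/* Pointer-chase sketch: average latency per dependent load rises as the
 * working set spills out of each cache level into DRAM. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define STEPS 10000000UL

int main(void) {
    size_t bytes[] = { 1UL << 12, 1UL << 15, 1UL << 18, 1UL << 21, 1UL << 24 };
    srand(1);
    for (size_t s = 0; s < sizeof bytes / sizeof bytes[0]; s++) {
        size_t n = bytes[s] / sizeof(size_t);
        size_t *next = malloc(n * sizeof *next);
        if (!next) return 1;
        for (size_t i = 0; i < n; i++) next[i] = i;
        /* Sattolo's shuffle: produces a permutation that is one big cycle,
         * so every load depends on the previous one and the hardware
         * prefetcher can't hide the latency. */
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;   /* j < i */
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }
        size_t idx = 0;
        clock_t t0 = clock();
        for (size_t k = 0; k < STEPS; k++) idx = next[idx];
        double sec = (double)(clock() - t0) / CLOCKS_PER_SEC;
        printf("%8zu KiB working set: %6.2f ns/load (end=%zu)\n",
               bytes[s] >> 10, sec * 1e9 / STEPS, idx);
        free(next);
    }
    return 0;
}
```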
This article is also a good example of why increasing the speed of storage often doesn’t make much difference in performance. Even though flash increases the speed of external storage by an order of magnitude over spinning rust, it is still significantly slower than the tiers closer to the processor in the hierarchy, and the system is still optimized to access memory farther out in the hierarchy as little as possible. We currently seem to have reached the point where even DRAM speed isn’t that important anymore; for most applications, there is very little difference between the fastest and the slowest DDR4. This is why I wouldn’t bother going with higher-speed DRAM or faster SSDs at the moment. I would still buy DDR4, I just wouldn’t spend extra money on overclocked memory or expensive speed grades. The higher-speed SSDs probably aren’t worth it either unless you actually run an application that is sensitive to storage speed, and most consumer applications are not.
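A back-of-the-envelope average-access-time calculation shows why (the hit rate and latencies here are assumptions picked for illustration): once the slow tier is hit rarely, even an order of magnitude there barely moves the total:

```c
/* Effective access time = hit_rate * fast + miss_rate * slow.
 * With a high hit rate, the jump from disk to flash matters a lot,
 * but further storage speedups barely register. Numbers are assumed. */
#include <stdio.h>

int main(void) {
    double dram_ns  = 100.0;    /* assumed DRAM access time */
    double hit_rate = 0.9999;   /* fraction served from cache/DRAM */
    double miss_ns[] = { 10e6, 500e3, 100e3 };
    const char *name[] = { "hard disk (~10 ms)",
                           "SATA SSD (~500 us)",
                           "NVMe SSD (~100 us)" };
    for (int i = 0; i < 3; i++) {
        double eff = hit_rate * dram_ns + (1.0 - hit_rate) * miss_ns[i];
        printf("%-20s -> effective access %8.1f ns\n", name[i], eff);
    }
    return 0;
}
```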
While these new non-volatile storage technologies are interesting, I don’t think they change the memory hierarchy much. They will allow for more instant-on behavior from sleep, though, and smaller form factors. In my opinion, the major memory hierarchy change coming is due to HBM. The last time we had a major change in the way memory connects to the system was when AMD moved the DRAM controller onto the processor die with K8 in 2003, and that was a smaller change in comparison. If you have an APU with 16 GB or 32 GB of HBM2 (maybe possible in 2018 or so), then do you need external DRAM? Consumer-level applications probably will not. Even if you need some extra memory space, it will probably sit a step further out in the hierarchy than current DRAM. That means it doesn’t need to be as fast or as low latency as current DRAM, since it would be accessed more like an SSD page file. The HBM would act as a cache, so external memory would be accessed less frequently and in larger chunks (bandwidth sensitive rather than latency sensitive).
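To see why large chunks make a far tier bandwidth-sensitive rather than latency-sensitive, here is a quick transfer-time sketch (the latency and bandwidth figures are assumptions, not measurements of any real part):

```c
/* Time to move a chunk = fixed latency + size / bandwidth.
 * At large chunk sizes the transfer term swamps the latency term,
 * so bandwidth, not latency, is what matters. Numbers are assumed. */
#include <stdio.h>

int main(void) {
    double latency_ns     = 200.0;  /* assumed far-memory latency */
    double bandwidth_gbps = 20.0;   /* assumed GB/s; 1 GB/s = 1 byte/ns */
    double chunk_bytes[]  = { 64.0, 4096.0, 2.0 * 1024 * 1024 };
    for (int i = 0; i < 3; i++) {
        double xfer_ns = chunk_bytes[i] / bandwidth_gbps;
        double total   = latency_ns + xfer_ns;
        printf("%9.0f B chunk: %5.0f ns latency + %9.1f ns transfer "
               "(latency is %4.1f%% of total)\n",
               chunk_bytes[i], latency_ns, xfer_ns,
               100.0 * latency_ns / total);
    }
    return 0;
}
```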
For systems requiring more fast cache, they could even put an extra SRAM die on the interposer for a large L3 or L4. It wouldn’t be as fast as on-die cache, since the latency of going off-die across an interposer is non-zero, but it can be much less than going off-package through a PCB. This would also allow for smaller dies with better yields. The only thing that makes the large-cache Xeons doable is that Intel can sell most of them as salvaged parts: many different versions of a large multi-core CPU at varying core counts, varying clock speeds, varying amounts of cache, and even varying power consumption. You can’t do that with GPUs; it would cause mass confusion if Nvidia tried to market a dozen different 980s. You get maybe 4 versions of each GPU, 2 for desktop and 2 for mobile.