A few weeks back, I was briefed on Micron’s new SolidScale Architecture. This is essentially Micron’s off-the-shelf solution that ties together a few different technologies in an attempt to consolidate large pools of NVMe storage into a central location that can then be efficiently segmented and distributed among peers and clients across the network.
Traditionally it has been difficult to effectively utilize large numbers of SSDs in a single server. The combined IOPS capabilities of multiple high-performance PCIe SSDs can quickly saturate the available CPU cores of the server due to kernel/OS IO overhead incurred with each request. As a result, a flash-based network server would be bottlenecked by the server CPU during high IOPS workloads. There is a solution to this, and it’s simpler than you might think: Bypass the CPU!
Read on for our deeper dive into Micron's SolidScale technology!
Mellanox has been working to advance and converge NVMe over Fabrics (NVMe-oF), remote direct memory access (RDMA), and RDMA over Converged Ethernet (RoCE). To crudely merge those definitions together: a client can directly access the NVMe SSDs of a server, via Ethernet, in a way that bypasses the CPU of the remote system, eliminating the aforementioned bottleneck. Link speeds range from 40 to 100 GbE, so these are certainly fast data links.
Where Micron comes in is that they are incorporating Mellanox’s tech into a packaged solution, complete with their own management interface and everything else you would need to get up and running:
Diving a bit into how all of this works, we can also discover some of the benefits that come from the added flexibility of having all of your flash-based storage on a common fabric/network. Imagine you have installed the three 2U boxes pictured above and filled the first 8 bays of each drawer with U.2 SSDs. You would use Micron’s dashboard software to create volumes at the desired RAID levels (striping and/or mirroring: RAID 0, 1, and 10 are currently supported). Once the clients (which can themselves be servers) have compatible NICs and Micron’s driver installed, they can connect to the shared volumes.
The trick here is that those shared volumes are not mounted in the way you would normally expect. Since the server’s CPU is no longer in the IO path, those SSDs are all presented to the network individually for direct access, meaning it is the client machine that handles the RAID striping. The client simply reaches out to the network for the specific SSDs it needs to access and requests the RAID stripes from those individual SSDs directly.
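To make that client-side striping concrete, here is a minimal sketch of how a RAID-0 client might map a logical block address to a specific remote SSD and offset. This is an illustration of the general technique only, assuming a simple round-robin layout and a fixed stripe size; Micron's actual driver internals are not public, and the function name is hypothetical:

```python
def locate_stripe(lba, stripe_size_blocks, num_ssds):
    """Map a logical block address to (ssd_index, ssd_lba) for a
    RAID-0 volume striped across num_ssds remote NVMe devices.

    Hypothetical helper for illustration only; assumes stripes are
    distributed round-robin across the SSDs in the set.
    """
    stripe_number = lba // stripe_size_blocks      # which stripe overall
    offset_in_stripe = lba % stripe_size_blocks    # position within it
    ssd_index = stripe_number % num_ssds           # round-robin device pick
    ssd_stripe = stripe_number // num_ssds         # nth stripe on that SSD
    ssd_lba = ssd_stripe * stripe_size_blocks + offset_in_stripe
    return ssd_index, ssd_lba

# Example: 128-block stripes across 8 SSDs
print(locate_stripe(0, 128, 8))      # (0, 0)   -> first stripe, first SSD
print(locate_stripe(128, 128, 8))    # (1, 0)   -> second stripe, second SSD
print(locate_stripe(1024, 128, 8))   # (0, 128) -> wraps back to first SSD
```

The point is that this mapping needs no server CPU in the path: once the client knows the layout, it can issue each stripe request straight to the right SSD over the fabric.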
It does not matter which physical SolidScale drawer those SSDs are installed in, which allows an incredible amount of flexibility when dealing with unexpected failures. Say one of the 2U systems has a motherboard failure. Clients would still have redundancy, since the other half of their mirrored drives would have been installed in a different drawer, but the data center manager can also simply move the SSDs from the failed drawer into any free slots in the remaining functional machines, and the effect is the same as if the failed server had been brought back online. In this degraded/partially recovered state, the system could even tolerate a subsequent SSD failure; since all SSDs were made available again, no data would be lost. The clients only care that the SSDs are reachable on the network; they don’t care which server the data passes through. Obviously, in this scenario, some failure-domain separation has been lost (another drawer failure could take arrays offline completely), so you’d want that failed motherboard swapped ASAP, but this first layer of recovery flexibility would not be possible with a typical failover server setup.
To briefly cover expected performance: you are of course going to see a performance hit compared to having the flash local; however, Micron claims its solution adds only ~10us per IO (plus propagation delay between distant servers; the laws of physics still apply here). The above chart shows some of Micron’s own tests, illustrating the typical performance hit seen across varying applications. Doesn’t seem too bad given the benefits of centralizing, and therefore more easily managing, large amounts of flash.
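As a rough sanity check on that ~10us figure (the device latencies below are my own ballpark assumptions, not Micron's numbers): against a typical NAND NVMe read latency of around 100us, the fabric overhead works out to roughly a 10% hit, while for a much faster medium such as 3D XPoint (~10us reads) the same overhead would roughly double the latency:

```python
def fabric_overhead_pct(device_latency_us, fabric_overhead_us=10):
    """Added fabric latency as a percentage of the device's native
    read latency. The 10 us default is Micron's claimed per-IO
    overhead; device latencies passed in are ballpark assumptions."""
    return 100 * fabric_overhead_us / device_latency_us

print(f"NAND   (~100 us reads): +{fabric_overhead_pct(100):.0f}%")  # +10%
print(f"XPoint (~10 us reads):  +{fabric_overhead_pct(10):.0f}%")   # +100%
```

This is why the relative cost of remote access looks very different depending on the media behind it, a point that comes up again in the comments below.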
We’re going to try to get some of our own performance (latency percentile) figures on this gear in the future, but until then I will leave you with Micron’s SolidScale press release:
Micron Unleashes the Full Power of NVMe Storage, Unlocking Unused Capacity and Performance
Micron SolidScale Platform Architecture Ushers in the Era of Shared Accelerated Storage for Data Intensive Workloads and Next-Generation Cloud Native Applications
New York, NY – May 3, 2017 – Micron Technology, Inc. (Nasdaq: MU) today introduced the Micron SolidScale architecture, an integrated platform that delivers breakthrough low-latency and high performance access to compute and storage. The Micron SolidScale architecture provides customers with the agility to deploy next-generation, cloud-native applications while supporting legacy applications that run the enterprises of today — and tomorrow. From online transaction processing, to virtual platforms and analytics, to machine learning, Micron’s innovative architecture not only delivers data quickly due to its extremely high throughput, but it delivers faster time to results because of its unprecedented low latency.
“We estimate that companies using NVMe SSDs deployed in application servers today are on average using less than 50 percent of their IOPS and capacity. With the new Micron SolidScale architecture, capacity is shared across application servers, unlocking capacity customers have already paid for and allowing them to do more with less—without sacrificing performance,” said Darren Thomas, vice president, Storage Business Unit, Micron Technology, Inc. “At Micron, we consider the impact of every workload, application and environment as we design the technology, products and systems that allow our customers to deploy applications faster and scale without limits.”
It’s estimated that the total amount of digital data created worldwide will reach 163 zettabytes per year by 2025. As enterprises move toward a scale-out data center, they need a way to unleash more potential out of their data in a flexible architecture that can integrate with their growing storage and compute needs. Simply replacing old storage without also modernizing the interfaces, protocols and networks only shifts the bottleneck elsewhere within the system. Micron has coupled flash storage with PCIe NVMe in a platform using a converged NVMe-oF infrastructure that performs like server-based storage with the ability to scale at near linear performance rates.
Be Revolutionary with Micron SolidScale Platform Architecture
Designed to unleash the potential of NVMe SSDs and to mainstream NVMe SSDs, the Micron SolidScale platform allows companies to build a scale-out storage infrastructure that provides all the benefits of a centralized single pool of storage with the performance of local in-server SSDs. The SolidScale platform connects multiple nodes using high-speed RDMA over Converged Ethernet (RoCE) fabric with low-latency software that provides a crafted set of data services – delivering a converged infrastructure that performs like local direct attached storage.
Initially launching for Linux environments, with future generations extending to other software-defined storage (SDS) applications, SolidScale is a dual-purpose design that provides customers with a scalable, high performance block storage SDS architecture. Designed for the most demanding application workloads including big data and analytics, database acceleration and high-performance computing, among others, it can also be used as a foundational NVMe over Fabrics infrastructure for next-generation data centers, forming the backbone for multi-faceted file systems.
Delivering breakthrough performance, latency and workload optimized capacity, the SolidScale architecture stands up in a 2U node configuration in a 24U server rack enclosure. Key features of the new architecture include:
- Flexible Infrastructure: The logical volume feature of the SolidScale platform provides flexibility to create and manage a single, centralized pool of storage that allows customers to create right size volumes for each server’s data repository.
- Optimized Performance: The speed of Micron NVMe SSDs coupled with high-bandwidth Mellanox fabric delivers performance that scales by adding an average of five microseconds of additional latency to an application’s data path when compared to a local in-server NVMe. Micron SolidScale architecture is expected to reduce end-to-end latency under 200 microseconds. Preliminary tests of the Micron SolidScale platform measured over 10.9M IOPS with only three 2U SolidScale nodes.
- Simple Manageability: The Web-based management interface of the SolidScale platform provides a simple, graphical setup and configuration for key data services.
- Seamless Scalability: Micron SolidScale architecture enables customers to easily scale storage capacity with, or independently from, compute; in addition, performance scales efficiently as more nodes are added.
- Breakthrough Data Center Efficiency: The SolidScale architecture pools the available storage together, providing a platform that can either do the same work with fewer servers or more work in the same number of servers. Overall, this allows compute servers to be thinner, allowing storage to scale independently of compute.
“NVMeoF is a much faster way of connecting to the CPU by using the high-speed interconnects of the RoCE fabric, making this architecture ideal for a range of low-latency, robust data needs spanning real time data analytics, high performance computing and hyperscale database use cases,” said Laura DuBois, group vice president for IDC’s Enterprise Storage, Server and System Infrastructure Software research. “Micron is taking an early mover position by pushing the envelope of software and hardware to enable low latency, reduced costs and high performance to tackle data intensive workloads.”
“Faster storage needs faster networks – in terms of bandwidth, latency, and advanced protocols like NVMe over Fabrics,” said Kevin Deierling, vice president of marketing at Mellanox Technologies. “We are proud to connect Micron’s innovative SolidScale solution with our end-to-end 100G Ethernet RoCE networking solutions. The combination of Spectrum™ switches, ConnectX-4® adapters and LinkX™ cables enables SolidScale to maximize performance and total infrastructure efficiency.”
The Micron SolidScale architecture is currently available to key Micron customers and partners to test their own application workloads within existing data center environments. Based on customer validation & testing of the architecture, volume production of the Micron SolidScale platform is expected to begin in early 2018. For customers interested in participating in the SolidScale early access program or OEM companies interested in partnering with Micron to extend SolidScale across their hardware platforms, visit www.micron.com/solidscale.
Or, stay with me here, we could rewrite our OS, filesystem, and I/O subsystems to not be optimized for 3600 RPM spinning rust.
Just putting that out there.
Which filesystem in active development can be said to be optimized for HDDs?
F2FS was written to be optimized for solid state storage and yet is frequently outperformed by Ext4, which makes concessions to be backwards compatible with Ext2, a filesystem introduced 20 years before F2FS.
F2FS is designed for SD cards and eMMC devices. It’s not meant for faster flash devices.
Isn’t latency of 3D XPoint around 10us and 100us for NAND? 10us overhead plus the time to traverse fiber doesn’t feel cheap at all.
The connected systems would likely be within the same building, or at least the same region, and adding 10% overhead per IO for flash isn't bad when you consider that it isn't otherwise possible to get that much flash / IOPS capability out of a single machine without dedicating a significant amount of CPU resources to it.
XPoint caches would ideally be in the clients, not the server.