World’s Fastest Supercomputer Will Be Powered by AMD EPYC, Radeon Instinct

Cray’s AMD-Powered Frontier Supercomputer Will Hit 1.5 Exaflops
The world’s fastest supercomputer is set to come online at the Oak Ridge National Laboratory in 2021, and it will be powered exclusively by AMD. The U.S. Department of Energy, along with supercomputer veteran Cray, announced today that the Frontier exascale supercomputer will utilize AMD EPYC CPUs and Radeon Instinct GPUs, giving it an estimated peak performance of 1.5 exaflops once fully online.
The specific generation of EPYC and Instinct parts was not disclosed, although AMD CEO Lisa Su stated that both components will be customized for the system. Frontier is expected to cost around $600 million and will be faster than the world’s top 160 currently operational supercomputers combined.
These upcoming supercomputers are able to reach new heights in processing power thanks to the combination of CPU and GPU hardware, and AMD’s Infinity Fabric gives the company an advantage in providing fast, low-latency connections between its own hardware components. Using a future iteration of Infinity Fabric that has yet to be fully detailed, one EPYC processor will be connected to four Radeon Instinct GPUs in each Frontier node. Cray is also using a custom version of AMD’s ROCm programming environment to fully utilize all of the hardware at hand.
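Frontier’s custom Cray/AMD programming stack hasn’t been publicly detailed, but today’s generic ROCm code is typically written against the HIP runtime. As a rough, hypothetical sketch (kernel name, buffer size, and launch dimensions are invented for illustration), the snippet below shows one host process enumerating the GPUs visible on a node — four Radeon Instinct cards per EPYC CPU in Frontier’s layout — and running a trivial kernel on each in turn:

```cpp
// Hypothetical HIP/ROCm sketch; Frontier's actual environment is not public.
// Error checking is omitted for brevity. Compiles with ROCm's hipcc.
#include <hip/hip_runtime.h>
#include <cstdio>

// Trivial kernel: scale a vector in place on the GPU.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    int gpu_count = 0;
    hipGetDeviceCount(&gpu_count);  // e.g. 4 Radeon Instinct GPUs per node
    printf("GPUs visible on this node: %d\n", gpu_count);

    const int n = 1 << 20;  // arbitrary illustrative buffer size
    for (int dev = 0; dev < gpu_count; ++dev) {
        hipSetDevice(dev);                        // bind host thread to GPU `dev`
        float* d_buf = nullptr;
        hipMalloc(&d_buf, n * sizeof(float));
        hipMemset(d_buf, 0, n * sizeof(float));

        dim3 block(256);
        dim3 grid((n + block.x - 1) / block.x);
        hipLaunchKernelGGL(scale, grid, block, 0, 0, d_buf, 2.0f, n);
        hipDeviceSynchronize();                   // wait for the kernel to finish
        hipFree(d_buf);
    }
    return 0;
}
```

One of HIP’s selling points is that this same source is portable across vendors, which is presumably part of why the DOE labs are comfortable building on a customized ROCm stack.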
Frontier’s 1.5 exaflops will be used to simulate and model complex systems involving weather, sub-atomic structures, genomics, physics, and other important fields. The system will replace Oak Ridge’s existing supercomputer, Summit, which was launched in 2018 and features IBM Power9 processors with NVIDIA Volta GPUs. Summit has a peak performance of 200 petaflops, which would make Frontier over 7 times faster if estimated performance levels are achieved.
The announcement that Frontier will use both AMD CPUs and GPUs follows shortly after the news that the Aurora supercomputer, also built by Cray and scheduled to come online in 2021 at the Argonne National Laboratory, will use both Intel Xeon Scalable CPUs and Xe GPUs to achieve 1 exaflop of peak performance. This marks a shift from previous record-holding supercomputers that used mixed components from different manufacturers, often featuring NVIDIA GPUs paired with Intel or IBM CPUs. The advantage in fast, low-latency interconnects that comes from sticking with a complete CPU-GPU package from the same provider means that NVIDIA, once virtually everywhere in the supercomputer GPU space, is thus far absent from this new class of exascale systems.
Frontier Performance Explained
Now that supercomputers are reaching the exaflop range, it’s harder than ever to keep the numbers in perspective. AMD’s press kit for today’s announcement offers some more understandable equivalent values for Frontier’s impressive specifications.
- Frontier’s network bandwidth is fast enough to download 100,000 HD movies per second
- If all 7.7 billion people on earth each completed one calculation per second, it would take over 6 years to do what Frontier can do in one second (a quick arithmetic check follows this list)
- Frontier’s performance will be greater than the top 160 currently deployed supercomputers combined
- The Frontier system will utilize more than 90 miles of cables, enough to span the distance from New York to Philadelphia
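The “all of humanity for six years” comparison is easy to verify with round numbers: 1.5 exaflops divided by 7.7 billion calculations per second comes out to roughly 1.9×10⁸ seconds, or a bit over 6 years. A quick check (values taken from the press kit; this is just back-of-the-envelope arithmetic, not AMD’s methodology):

```cpp
#include <cstdio>

int main() {
    const double frontier_flops = 1.5e18;  // 1.5 exaflops, peak
    const double humans         = 7.7e9;   // one calculation per person per second
    const double seconds        = frontier_flops / humans;
    const double years          = seconds / (365.25 * 24 * 3600);
    printf("%.2e seconds = %.1f years\n", seconds, years);  // ~1.95e8 s, ~6.2 years
    return 0;
}
```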
I bet there will be questions asked at Nvidia HQ about why they failed to land such a lucrative contract. After all, one thing that [any] corporation doesn’t tolerate is when somebody steals their money; in this instance, “steals” means using a competitor’s product. Well, AMD is really on a roll.
Absolutely mind-boggling numbers and… scale.
Nvidia doesn’t really have a CPU of their own. If they pair AMD or Intel CPUs with Nvidia GPUs, they only get PCI Express, not NVLink. Cray is already building an Intel-based machine at another laboratory; to go with Nvidia, they would probably have needed to pair it with Power processors. AMD’s solution has some huge advantages over the competition, with massive core counts and massive I/O bandwidth. They don’t need any of the normal I/O connections on a compute blade, and there have been rumors that they added an extra link for low-speed I/O (like a management controller) so that all of the high-speed links would be available. That would leave 4 links for the GPUs and another 4 full links for the node-to-node fabric. Also, they may actually be planning on using Zen 3 for this build, which could be even larger (more cores and/or more threads, etc.).