10.1 The Next Frontier: Exascale Computing
Defining the Exaflop Era: A Quintillion Calculations in Pursuit of Grand Challenges
Exascale computing represents the next monumental milestone in the history of supercomputing, defined by systems capable of executing at least one quintillion (10¹⁸) double-precision floating-point operations per second (exaFLOPS).1 This staggering computational power, equivalent to a million trillion calculations per second, is not an end in itself but a critical scientific instrument.2 It is designed to tackle a class of “grand challenges”—problems of national, economic, and scientific importance that are so complex they would take years or even decades to solve on previous generations of supercomputers, if they could be solved at all.3 The imperative for exascale systems stems from the need for higher-fidelity, three-dimensional simulations of complex, multi-physics phenomena.4 These systems serve as virtual laboratories, enabling discovery and innovation across a vast spectrum of fields:3
- National Security: A primary driver for exascale development is the stewardship of national nuclear stockpiles. High-resolution simulations on machines like El Capitan and its predecessor, Sierra, allow scientists to accurately predict the performance, safety, and reliability of aging assets without resorting to physical testing.3
- Energy and Climate: Exascale systems are indispensable for developing next-generation energy solutions. They are used to design more efficient and safer nuclear reactors, model the stability of the national power grid, create new materials for advanced batteries and solar cells, and run climate models of unprecedented resolution to better predict long-term environmental change.3
- Medicine and Biology: In healthcare, exascale computing is accelerating the pace of discovery. It enables predictive modeling of drug responses for personalized cancer treatments, helps untangle the complex mechanisms of RAS proteins implicated in 40% of cancers, and allows for the automated analysis of millions of patient records to identify optimal treatment strategies.3 During the COVID-19 pandemic, these resources were used to model treatment outcomes and understand the virus at a molecular level.3
- Fundamental Science: From the cosmic to the subatomic, exascale computers allow researchers to simulate the evolution of the universe, model the intricate collisions of atoms and molecules, and probe the fundamental laws of physics.4
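To give a sense of what a quintillion operations per second means, the short calculation below uses a common illustration: how long it would take everyone on Earth, each performing one calculation per second, to match a single second of exascale work. The population figure is a round assumption used only for scale, not a sourced statistic.

```python
# Back-of-envelope: how much is one quintillion (10**18) operations per second?
# Illustrative arithmetic only; the world-population figure is a rough assumption.

EXAFLOPS = 1e18                 # one exaFLOPS = 10**18 floating-point ops per second
WORLD_POPULATION = 8e9          # assumed ~8 billion people
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# If every person on Earth performed one calculation per second,
# how long would it take to match one second of exascale work?
seconds_needed = EXAFLOPS / WORLD_POPULATION
years_needed = seconds_needed / SECONDS_PER_YEAR

print(f"{seconds_needed:.3e} person-seconds, or about {years_needed:.1f} years "
      f"of whole-planet arithmetic, per second of exascale computation")
```

The answer, roughly four years of planet-wide hand calculation for every second of machine time, is the scale at which the grand-challenge problems above become tractable.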
Architectures of the Titans: The Heterogeneous Path to an Exaflop
Achieving an exaflop of performance within a manageable power budget required a fundamental departure from simply scaling up traditional CPU-based architectures. The blueprint for modern exascale systems is one of heterogeneity, a hybrid design that pairs a smaller number of powerful central processing units (CPUs) with a massive number of parallel accelerators, predominantly graphics processing units (GPUs).4 This architectural choice is a direct solution to the power consumption crisis. For the highly data-parallel portions of scientific codes—where the same operation is performed on vast arrays of data—GPUs offer vastly superior performance per watt compared to CPUs.4 The CPU acts as the “host” or orchestrator, handling the serial parts of the code and managing the overall workflow, while offloading the computationally intensive parallel kernels to the thousands of lightweight cores within the GPUs.5

An exascale system is far more than its processors; it is an intricate ecosystem of interconnected components designed for massive data throughput. On the order of ten thousand compute nodes, each containing CPUs and GPUs, must be linked by a high-bandwidth, low-latency network. Systems like Frontier and Aurora use the HPE Slingshot interconnect, which provides 12.8 terabits per second of bandwidth per switch, arranged in a “Dragonfly” topology that keeps any two nodes in the system at most three network “hops” away from each other.6 This complex web of connectivity, requiring miles of optical and copper cabling, is essential to keep the processors from being starved of data.

Equally monumental are the memory and storage subsystems. Frontier, for example, is backed by the 700-petabyte Orion file system, a site-wide storage solution capable of feeding the machine’s voracious appetite for data.6 These parallel file systems are critical for handling the enormous datasets generated by exascale simulations and for supporting techniques like application checkpointing, where the entire state of a multi-petabyte simulation must be saved periodically to guard against system failures.5
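The host-offload division of labor described above can be illustrated with a minimal sketch. The example below uses CuPy purely as a stand-in for a GPU programming model (on the actual machines this role is filled by vendor stacks such as HIP, CUDA, or SYCL); it assumes CuPy and a CUDA-capable GPU are available and is meant only to show the pattern of the CPU orchestrating work while the GPU executes the data-parallel kernel.

```python
# Minimal host/device offload sketch (assumes CuPy and a CUDA-capable GPU).
# The CPU (host) sets up the problem and orchestrates; the GPU (device)
# executes the data-parallel kernel across many lightweight cores.
import numpy as np
import cupy as cp

# Host side: serial setup and problem definition.
n = 10_000_000
a_host = np.random.rand(n).astype(np.float32)
b_host = np.random.rand(n).astype(np.float32)

# Offload: copy inputs to device memory (the expensive data-movement step).
a_dev = cp.asarray(a_host)
b_dev = cp.asarray(b_host)

# Data-parallel kernel: the same operation applied across the whole array,
# exactly the workload shape where GPUs deliver superior performance per watt.
c_dev = cp.sqrt(a_dev * a_dev + b_dev * b_dev)

# Host side: bring back only the small reduced result, not the full array.
total = float(cp.sum(c_dev).get())
print(f"Reduced result computed on the GPU: {total:.3e}")
```

The design point the sketch highlights is minimizing host-device traffic: the inputs cross the bus once and only a scalar comes back, mirroring the exascale-era emphasis on reducing data movement rather than raw arithmetic.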
Case Study: A Tale of Two Titans - Frontier and Aurora
The first two publicly benchmarked exascale systems in the United States, Frontier at Oak Ridge National Laboratory and Aurora at Argonne National Laboratory, serve as powerful case studies in modern supercomputer design. While both are built on the same HPE Cray EX platform, they represent different vendor ecosystems and design philosophies, particularly in their choice of processors.
- Frontier (OLCF-5): Deployed in 2022, Frontier was the world’s first machine to officially break the exaflop barrier on the High Performance LINPACK (HPL) benchmark.1 Its architecture is a testament to the AMD ecosystem. Each of its 9,472 nodes contains a single 64-core AMD Epyc “Trento” CPU paired with four AMD Instinct MI250X GPU accelerators.6 This 1:4 CPU-to-GPU ratio heavily emphasizes the role of the accelerator in performing the bulk of the computation. In total, the system comprises over 9 million cores.6 At its debut, Frontier achieved a sustained performance (Rmax) of 1.102 exaFLOPS while consuming approximately 21 megawatts of power; subsequent tuning and upgrades raised its score to the 1.353 exaFLOPS shown in the table below. Its most remarkable achievement, however, was its power efficiency. Upon its debut, it topped the Green500 list as the world’s most efficient supercomputer, delivering an unprecedented 62.68 gigaflops per watt, a critical validation of the DOE’s focus on power-constrained design.6
- Aurora (ALCF): With hardware installation completed in 2023, Aurora represents the culmination of a long-standing collaboration between Argonne, HPE, and Intel.7 Its 10,624 nodes each feature two Intel Xeon Max Series CPUs and six Intel Max Series GPUs (codenamed “Ponte Vecchio”).7 This 2:6 (or 1:3) CPU-to-GPU ratio represents a slightly more balanced approach than Frontier’s. Aurora achieved a sustained performance of 1.012 exaFLOPS, making it the second U.S. system to cross the exaflop threshold.7 However, this performance comes at a higher energy cost, with a reported power consumption of around 39 MW, highlighting the different trade-offs made in processor design and system integration.7
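The aggregate device counts that appear in the comparison table below follow directly from these per-node configurations; the short calculation here simply multiplies them out (El Capitan is omitted because its node count is not given in the cited sources).

```python
# Aggregate CPU/GPU counts derived from the per-node configurations above.
# Pure arithmetic on figures quoted in this section; no external data.

systems = {
    #  name       nodes    CPUs/node  GPUs/node
    "Frontier": (9_472,    1,         4),
    "Aurora":   (10_624,   2,         6),
}

for name, (nodes, cpus_per_node, gpus_per_node) in systems.items():
    print(f"{name:>8}: {nodes * cpus_per_node:>7,} CPUs, "
          f"{nodes * gpus_per_node:>7,} GPUs across {nodes:,} nodes")
# Frontier:   9,472 CPUs,  37,888 GPUs across 9,472 nodes
#   Aurora:  21,248 CPUs,  63,744 GPUs across 10,624 nodes
```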
The following table provides a direct comparison of these pioneering systems, along with the current world leader, El Capitan. This side-by-side view distills their complex specifications into a clear format, revealing the architectural trade-offs, performance differences, and sheer scale involved. It underscores the competitive landscape and makes abstract concepts like performance and power efficiency concrete.
| Metric | Frontier (OLCF-5) | Aurora (ALCF) | El Capitan (LLNL) |
|---|---|---|---|
| Peak Performance (Rpeak) | 2.055 exaFLOPS | 1.98 exaFLOPS | Not specified |
| LINPACK Score (Rmax) | 1.353 exaFLOPS | 1.012 exaFLOPS | 1.742 exaFLOPS |
| Global Ranking (June 2025) | #2 | #3 | #1 |
| Architecture | HPE Cray EX | HPE Cray EX | HPE Cray EX |
| CPU (per node) | 1x AMD Epyc “Trento” 64-core | 2x Intel Xeon Max Series | AMD Epyc cores (integrated in the MI300A APUs) |
| Accelerators (per node) | 4x AMD Instinct MI250X GPUs | 6x Intel Max Series GPUs | AMD Instinct MI300A APUs |
| Total Nodes | 9,472 | 10,624 | Not specified |
| Total CPUs | 9,472 | 21,248 | Not specified |
| Total GPUs | 37,888 | 63,744 | Not specified |
| Power Consumption | ~24.6 MW | ~38.7 MW | ~30 MW |
| Power Efficiency | 62.68 GFLOPS/watt (Green500 debut) | ~26 GFLOPS/watt (derived from Rmax and power) | Not specified |
| Cost (est.) | US$600 million | US$500 million | Not specified |
| Operational Date | 2022 | 2023 | 2024 |

Data sourced from 1.
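As a quick sanity check, the efficiency figures can be re-derived from the Rmax and power values in the table. The snippet below is plain arithmetic on the table’s own numbers, not an official Green500 measurement; published Green500 results (such as Frontier’s 62.68 gigaflops-per-watt debut figure) come from dedicated benchmark runs and configurations, so they will not match these derived values exactly.

```python
# Derive approximate power efficiency (GFLOPS/watt) from the table's own
# Rmax and power figures. Simple arithmetic, not a Green500 measurement.

systems = {
    #  name         Rmax (exaFLOPS)  power (MW)
    "Frontier":    (1.353,           24.6),
    "Aurora":      (1.012,           38.7),
    "El Capitan":  (1.742,           30.0),   # power is the table's rough ~30 MW figure
}

for name, (rmax_eflops, power_mw) in systems.items():
    flops = rmax_eflops * 1e18        # convert exaFLOPS to FLOP/s
    watts = power_mw * 1e6            # convert MW to W
    gflops_per_watt = flops / watts / 1e9
    print(f"{name:>10}: ~{gflops_per_watt:.1f} GFLOPS/watt")
```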
Overcoming the Four Walls of Exascale
The journey to exascale was not a simple matter of incremental engineering. It required a concerted, decade-long research and development effort, exemplified by the U.S. Department of Energy’s Exascale Computing Project (ECP), to overcome what were seen as four fundamental obstacles, or “walls”.4 The solutions developed have not only enabled these massive systems but are also shaping the future of computing at large.

The immense, non-negotiable constraints of building these machines, especially the hard limit on power consumption, forced a permanent shift away from designing computer components in isolation. Early projections showed that a “brute force” approach to building an exaflop machine would consume hundreds of megawatts, an economically and logistically impossible figure.8 This hard physical constraint made it impossible to simply build faster processors and hope the software would catch up. It forced a holistic, system-level approach in which every aspect of the machine—from the application algorithms down to the silicon—was re-evaluated. This led to the institutionalization of a new paradigm: co-design.4

The ECP established formal co-design centers where domain scientists (the application users), applied mathematicians, and computer scientists worked collaboratively with hardware vendors.4 Application developers had to rethink their algorithms to maximize parallelism and minimize data movement. Software developers had to extend runtime systems and programming models such as MPI and OpenMP to manage billions of threads on heterogeneous hardware. Hardware vendors, in turn, had to design processors and interconnects with these specific software and application needs in mind. This tightly coupled, collaborative development process was essential for overcoming the four walls:
- The Power Wall: The primary challenge was to build an exaflop system within a 20-40 MW power envelope.9 The solution was a massive, multi-hundred-million-dollar DOE investment in vendor R&D to create a new generation of highly efficient, low-power processors, with a heavy focus on the GPU accelerators that now dominate these systems.9 The success of this initiative is proven by Frontier’s remarkable efficiency.6
- The Memory Wall (Data Movement): The second challenge was the fact that moving a byte of data from memory to a processor can consume orders of magnitude more time and energy than the actual floating-point operation on that data.10 The co-design solution involved architectural innovations like high-bandwidth memory (HBM), where DRAM is stacked directly onto the GPU package, increasing memory bandwidth by an order of magnitude and drastically reducing the energy cost of data access.9
- Resilience (Fault Tolerance): In a system with millions of processor cores and tens of thousands of components, failures are not an “if” but a “when”.8 The system must be able to continue its work despite component failures. The solution is a combination of hardware resilience and sophisticated software techniques like application checkpointing, which periodically saves a complete “snapshot” of a running simulation. If a failure occurs, the program can be restored from its last checkpoint rather than starting from the beginning, saving potentially days or weeks of computation.5 A minimal sketch of this checkpoint/restart pattern appears after this list.
- Extreme Parallelism: The final wall was the software challenge of effectively programming a machine with millions of cores and billions of concurrent threads.8 The co-design approach was the answer, leading to the development of new programming models, compiler technologies, and scientific libraries explicitly designed for these massively parallel, heterogeneous architectures.5
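The last two walls lend themselves to a concrete, if drastically simplified, illustration. The sketch below (referenced from the resilience bullet above) combines an MPI-style domain decomposition, in which each rank owns a slice of a global array and exchanges halo values with its neighbors, with periodic application checkpointing so a run can restart from its most recent snapshot after a failure. It assumes mpi4py and NumPy are installed and an MPI launcher such as mpiexec; all file names and parameters are illustrative, not drawn from any production exascale code.

```python
# Minimal sketch of the techniques behind the resilience and parallelism walls:
# an MPI-style domain decomposition with periodic checkpoint/restart.
# Run with an MPI launcher, e.g.:  mpiexec -n 4 python jacobi_checkpoint.py
import os
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

LOCAL_N = 1_000                          # grid points owned by each rank
STEPS = 500                              # total iterations
CHECKPOINT_EVERY = 100                   # how often to save a snapshot
ckpt_file = f"ckpt_rank{rank:04d}.npz"   # hypothetical per-rank checkpoint file

# Restart from the last checkpoint if one exists; otherwise start fresh.
if os.path.exists(ckpt_file):
    saved = np.load(ckpt_file)
    u, start_step = saved["u"], int(saved["step"]) + 1
else:
    u = np.zeros(LOCAL_N + 2)            # +2 ghost cells for neighbor halos
    u[1:-1] = rank + 1                   # arbitrary initial condition
    start_step = 0

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(start_step, STEPS):
    # Halo exchange: each rank trades boundary values with its neighbors.
    comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
    comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)

    # Local data-parallel update (a simple Jacobi-style smoothing step).
    u[1:-1] = 0.5 * (u[:-2] + u[2:])

    # Periodic checkpoint: every rank saves its slice of the global state, so a
    # failed run can restart here instead of from step 0. (A real code would
    # rotate or clean these up after a successful run.)
    if (step + 1) % CHECKPOINT_EVERY == 0:
        np.savez(ckpt_file, u=u, step=step)
        comm.Barrier()                   # keep checkpoints consistent across ranks

total = comm.allreduce(float(u[1:-1].sum()), op=MPI.SUM)
if rank == 0:
    print(f"Finished at step {STEPS} with global sum {total:.6f}")
```

In production codes the same pattern is scaled to tens of thousands of ranks and billions of threads, with checkpoints written in parallel to a file system like Orion rather than to local .npz files.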
Exascale computing, therefore, represents more than just a faster supercomputer; it is the pinnacle achievement of a new, holistic design methodology. This philosophy, born of necessity at the highest echelons of computing, provides the blueprint for the hyper-specialized systems that are coming to define the mainstream, demonstrating that the future of performance lies not in isolated components, but in the synergistic design of the entire computational stack.
References
Footnotes

1. Exascale computing - Wikipedia, accessed October 9, 2025, https://en.wikipedia.org/wiki/Exascale_computing
2. Breaking the petaflop barrier - IBM, accessed October 9, 2025, https://www.ibm.com/history/petaflop-barrier
3. Exascale Computing | PNNL, accessed October 9, 2025, https://www.pnnl.gov/explainer-articles/exascale-computing
4. The Exascale Software Portfolio - Science & Technology Review, accessed October 9, 2025, https://str.llnl.gov/past-issues/february-2021/exascale-software-portfolio
5. Parallel computing - Wikipedia, accessed October 9, 2025, https://en.wikipedia.org/wiki/Parallel_computing
6. Frontier (supercomputer) - Wikipedia, accessed October 9, 2025, https://en.wikipedia.org/wiki/Frontier_(supercomputer)
7. Aurora (supercomputer) - Wikipedia, accessed October 9, 2025, https://en.wikipedia.org/wiki/Aurora_(supercomputer)
8. Exascale Challenges | Scientific Computing World, accessed October 9, 2025, https://www.scientific-computing.com/feature/exascale-challenges
9. Exascale Computing’s Four Biggest Challenges and How They Were Overcome, accessed October 9, 2025, https://www.olcf.ornl.gov/2021/10/18/exascale-computings-four-biggest-challenges-and-how-they-were-overcome/
10. The End of the Golden Age: Why Domain-Specific Architectures are Redefining Computing, accessed October 9, 2025, https://medium.com/@riaagarwal2512/the-end-of-the-golden-age-why-domain-specific-architectures-are-redefining-computing-083f0b4a4187