1.3 Defining Parallel Computing

The physical and architectural limitations inherent in the serial computing model—the power constraints that ended frequency scaling and the performance chokepoints of the von Neumann bottleneck and Memory Wall—necessitated a fundamental shift in computer architecture. This shift was towards parallel computing, a paradigm that has come to define the modern era of high-performance computing and is now ubiquitous in every device from smartphones to supercomputers 1.

In its simplest sense, parallel computing is a type of computation in which many calculations or processes are carried out simultaneously 1. It stands in direct opposition to the serial computing model, which dictates that instructions are executed sequentially, one after another, on a single processor 2.

The core principle of parallel computing is decomposition 2. A large, complex computational problem is broken down into smaller, discrete parts that can be solved concurrently 2. Each of these smaller parts is then assigned to a different processing element, which executes its portion of the algorithm simultaneously with the others 2. Finally, an overall control and coordination mechanism is employed to combine the individual results from each processing element into a final, cohesive solution 2.
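To make the decomposition idea concrete, here is a minimal sketch in Python using the standard multiprocessing module: a large summation is split into chunks, each chunk is summed by a separate worker process, and the partial results are combined into the final answer. The chunk size and worker count are illustrative assumptions, not tuned values.

```python
# A minimal sketch of decomposition: split the problem, solve the pieces
# concurrently, then combine the partial results.
from multiprocessing import Pool

def partial_sum(chunk):
    """Solve one small piece of the overall problem."""
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))           # the large problem
    n_workers = 4                           # assumed number of processing elements
    chunk_size = len(data) // n_workers
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    with Pool(processes=n_workers) as pool:
        partials = pool.map(partial_sum, chunks)   # solve the pieces concurrently

    total = sum(partials)                   # coordination step: combine the results
    print(total)
```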

For a problem to be suitable for parallel computing, it must exhibit characteristics that allow for this decomposition. It must be divisible into pieces of work that can be solved simultaneously, and the total time to solution must be less with multiple compute resources than with a single one 2.

The Imperative for Parallelism: The End of Frequency Scaling

For several decades, the primary method for increasing computer performance was frequency scaling—simply increasing the clock speed of a single processor to make it execute its serial instruction stream faster. This approach provided a “free lunch” for software developers; their existing programs would automatically run faster on each new generation of hardware without any modification 1.

However, in the early 2000s, this strategy hit an insurmountable physical barrier known as the power wall 3. A processor’s dynamic power consumption grows with its clock frequency and with the square of its supply voltage (roughly P ∝ C·V²·f), and pushing frequency higher generally requires raising the voltage as well 3. As clock rates climbed into the multi-gigahertz range, power consumption and, consequently, heat generation increased dramatically 3. It became technologically and economically infeasible to dissipate the immense amount of heat produced by these high-frequency single-core processors 3.

Faced with this wall, the semiconductor industry pivoted its strategy. Instead of trying to build a single, faster core, manufacturers began using the increasing number of transistors available on a chip (as predicted by Moore’s Law) to place multiple, simpler, and more power-efficient processing cores on a single die 3. This marked the advent of multi-core processors (dual-core, quad-core, etc.) and made parallel computing the dominant and necessary paradigm in all areas of computer architecture, from mobile devices to massive data centers 1.

This shift had a profound impact on the software industry. The free lunch was over. To take advantage of the performance potential of new hardware, software now had to be explicitly designed and written to be parallel. This was not merely a matter of using a new hardware feature; it required a fundamental rethinking of algorithm design and introduced a host of new, complex challenges for programmers. Issues such as task decomposition, load balancing (ensuring all processors have a fair share of the work), communication between processors, and synchronization (coordinating access to shared resources to avoid data corruption) became central concerns in software engineering. This transition from implicit performance gains through hardware speed to explicit performance gains through parallel software design represents one of the most significant paradigm shifts in the history of computing.
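As a small illustration of one of these new concerns, the following sketch uses Python’s standard threading module to show synchronization: two threads update the same counter, and a lock coordinates their access so no updates are lost. The counter value and iteration counts are arbitrary illustrative choices.

```python
# A minimal sketch of synchronization: without the lock, concurrent updates to
# the shared counter can interleave and lose increments (a race condition).
import threading

counter = 0
lock = threading.Lock()

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:            # coordinate access to the shared resource
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # always 200000 with the lock; may be less without it
```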

Parallel computers are built using several different hardware architectures, which can be broadly classified based on how their processors access memory 2.

In the shared memory architecture, multiple processors or cores are contained within a single machine and are all connected to a single, globally accessible main memory 4. Any processor can directly access any memory location. This simplifies programming because processors can communicate and share data implicitly by reading and writing to the same memory locations. Shared memory machines are further classified as uniform memory access (UMA) systems, where every processor sees the same memory latency, or non-uniform memory access (NUMA) systems, where accessing memory attached to another processor is slower. Virtually all modern multi-core laptops, desktops, and smartphones are examples of shared memory parallel computers 2.

Basic shared memory architecture. (a) Uniform memory access (UMA) architecture. (b) Non-uniform memory access (NUMA) architecture. Image Credit: ScienceDirect
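A minimal sketch of this implicit, shared-memory style of communication using Python threads, which all live in one process and therefore see the same objects in memory: each worker writes its result into a different slot of a shared list, so no explicit messages are needed. The data and thread count are illustrative assumptions.

```python
# A minimal sketch of the shared memory model: threads communicate implicitly
# by writing into a buffer that every thread can see.
import threading

data = list(range(8))
results = [None] * len(data)    # shared buffer visible to every thread

def square_into_shared(i):
    results[i] = data[i] ** 2   # implicit communication via shared memory

threads = [threading.Thread(target=square_into_shared, args=(i,)) for i in range(len(data))]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```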

The distributed memory architecture connects multiple independent computers, often called nodes, via a high-speed network 2. Each node contains its own processor(s) and its own private, local memory, and a processor in one node cannot directly access the memory of another node. Communication and data sharing must therefore be handled explicitly by the programmer, typically by sending and receiving messages over the network 2. This model is highly scalable and forms the basis for most large-scale supercomputer clusters and cloud computing infrastructures 4.

A typical distributed memory architecture where each processor has its own private memory, and communication is handled over a network. Image Credit: ResearchGate
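A minimal sketch of explicit message passing, assuming the third-party mpi4py package and an MPI runtime are available (launched with something like `mpirun -n 2 python script.py`): rank 0 sends a Python object to rank 1, since neither rank can read the other’s memory directly.

```python
# A minimal sketch of the distributed memory model with message passing.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    local_result = {"partial_sum": 12345}      # data that lives only in rank 0's memory
    comm.send(local_result, dest=1, tag=0)     # explicit communication over the network
elif rank == 1:
    received = comm.recv(source=0, tag=0)
    print("rank 1 received:", received)
```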

The most common architecture for modern high-performance computing (HPC) systems is a hybrid of the shared and distributed memory models 5. These systems are clusters of multiple nodes connected by a network (a distributed memory system). However, each individual node is itself a shared memory parallel computer, containing multiple cores that share access to that node’s local memory 2. This hierarchical design allows programmers to use shared-memory programming techniques within a node and message-passing techniques for communication between nodes, offering a balance of programming convenience and massive scalability.

A hybrid distributed-shared memory architecture where multiple nodes (each with multiple CPUs and shared memory) are connected via a network, combining both shared memory within nodes and distributed memory across nodes. Image Credit: ResearchGate
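A minimal sketch of the hybrid style, again assuming mpi4py and an MPI runtime: MPI distributes work across ranks (nodes) and combines their results by message passing, while a thread pool uses the cores that share memory within each rank. The work items and thread count are illustrative assumptions.

```python
# A minimal sketch of the hybrid model: message passing between ranks,
# shared memory threads within each rank.
from concurrent.futures import ThreadPoolExecutor
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Distributed part: each MPI rank (node) takes a different slice of the work.
work = [x for x in range(1000) if x % size == rank]

# Shared memory part: cores within the node process the slice concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    local_partials = list(pool.map(lambda x: x * x, work))

# Message passing combines the per-node results back on rank 0.
total = comm.reduce(sum(local_partials), op=MPI.SUM, root=0)
if rank == 0:
    print("global sum of squares:", total)
```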

  1. Parallel computing, Wikipedia, accessed September 30, 2025, https://en.wikipedia.org/wiki/Parallel_computing

  2. Introduction to Parallel Computing Tutorial, HPC @ LLNL, accessed September 30, 2025, https://hpc.llnl.gov/documentation/tutorials/introduction-parallel-computing-tutorial

  3. The von Neumann Architecture, UndertheCovers, Jonathan Appavoo, accessed September 30, 2025, https://jappavoo.github.io/UndertheCovers/textbook/assembly/vonNeumannArchitecture.html

  4. Parallel Computing And Its Modern Uses, HP® Tech Takes, accessed September 30, 2025, https://www.hp.com/us-en/shop/tech-takes/parallel-computing-and-its-modern-uses

  5. What is parallel computing?, IBM, accessed September 30, 2025, https://www.ibm.com/think/topics/parallel-computing