
3.2 Massively Parallel Processing (MPP)

In the landscape of high-performance computing, the 1980s brought a paradigm shift away from monolithic vector supercomputers toward Massively Parallel Processing (MPP).[1] The MPP philosophy held that computational power could be scaled by harnessing a large number of simple processors rather than a single, complex processor.[1] This section examines the rise of MPP through a case study of a key example: the Connection Machine.

The Connection Machine, developed at Thinking Machines Corporation (TMC), which W. Daniel “Danny” Hillis co-founded in 1983, departed from the prevailing von Neumann architecture.[2] Hillis’s doctoral research at MIT formed the basis for a machine designed from the ground up for data-parallel computation.[1]

Defining Data Parallelism

The data-parallel model is a programming paradigm in which parallelism is achieved by applying the same operation simultaneously to every element of a large dataset. Instead of a single processor iterating through the data, the model conceptually assigns a simple processing element to each data point, allowing all elements to be processed at once.[3] This approach is particularly well suited to problems with inherent data regularity, such as image processing, scientific simulation, and neural network training.[4]
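
To make the contrast concrete, here is a minimal Python sketch, with NumPy standing in for the hardware; the `pixels` array and the brightness-style operation are illustrative choices, not taken from any Connection Machine program:

```python
import numpy as np

# Illustrative dataset: one value per (virtual) processing element.
pixels = np.random.rand(65_536)

# Serial model: a single processor iterates over every element.
result_serial = np.empty_like(pixels)
for i in range(len(pixels)):
    result_serial[i] = pixels[i] * 0.5 + 0.1   # one element at a time

# Data-parallel model: the same operation expressed once and applied
# to every element "simultaneously" -- conceptually, each element has
# its own processing element.
result_parallel = pixels * 0.5 + 0.1

assert np.allclose(result_serial, result_parallel)
```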

To support this model, TMC developed specialized programming languages, including C* and *Lisp, which provided high-level constructs for expressing data-parallel operations while abstracting the architectural complexity from the programmer.[1]

Architectural Evolution: The Connection Machine Series


The Connection Machine series underwent a significant architectural evolution, reflecting both the maturation of the MPP concept and the changing economics of the microprocessor market.

| Feature | CM-1 (1986) | CM-2 (1987) | CM-5 (1991) |
| --- | --- | --- | --- |
| Architecture | SIMD | SIMD | MIMD (with SIMD simulation) |
| Processors | Up to 65,536 custom 1-bit processors | Up to 65,536 custom 1-bit processors | Up to 2,048 SPARC RISC processors |
| Floating point | None | Weitek 3132 FPU per 32 processors | Custom vector units (four per node) |
| Memory | 4 Kbits per processor | 64–256 Kbits per processor | Up to 128 MB per node |
| Interconnect | 12-dimensional hypercube | 12-dimensional hypercube | “Fat tree” network |
| Peak performance | N/A | 2.5 GFLOPS (with FPUs) | 131 GFLOPS (1,024-node system, 1993) |

Sources: [1], [3], [5]

The initial models, the CM-1 and CM-2, embodied Hillis’s original vision. Both machines used the same cubic enclosure (approximately 1.8 meters per side) and shared the same fundamental architectural principles.[6]

The CM-1 and CM-2 featured several innovative architectural elements:[6]

  • Bit-Serial Processing Elements: Each custom VLSI chip (fabricated in 2-micron CMOS) housed 16 1-bit processor cells, so a full 65,536-processor system required 4,096 chips.[6]
  • Dual Communication Networks:[6]
    • 12-Dimensional Hypercube Router: Provided general-purpose packet-switched messaging for irregular communication patterns (latency of roughly 700 cycles) and included hardware message combining for reduction operations.[6]
    • NEWS Grid (North, East, West, South): A dedicated 2D mesh for nearest-neighbor communication, approximately 6 times faster than the router for regular data patterns.[6]
  • Virtual Processor Model: Programmers could define data structures with more virtual processors than physical processors. The VP ratio determined how many virtual processors each physical processor simulated, allowing programs to scale independently of the hardware configuration (see the sketch after this list).[6]
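
The following Python sketch illustrates the virtual-processor mechanism under stated assumptions: the names (`PHYSICAL`, `VP_RATIO`, `apply_instruction`) are hypothetical, and the serial loops merely model what the machine did when time-slicing each physical processor over the virtual processors it hosted.

```python
# Sketch of the virtual-processor model. Names are illustrative,
# not taken from Connection Machine system software.
PHYSICAL = 4          # physical processing elements (65,536 on a full CM-2)
VP_RATIO = 4          # virtual processors hosted by each physical one
N_VIRTUAL = PHYSICAL * VP_RATIO

memory = list(range(N_VIRTUAL))   # one word of local memory per virtual processor

def apply_instruction(op):
    """Broadcast one instruction: each physical PE executes it once per
    virtual processor it hosts, so programs see N_VIRTUAL processors."""
    for pe in range(PHYSICAL):
        for slot in range(VP_RATIO):
            vp = pe * VP_RATIO + slot
            memory[vp] = op(memory[vp])

apply_instruction(lambda x: x + 1)   # e.g. an "increment" broadcast to all VPs
print(memory)                        # [1, 2, 3, ..., 16]
```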

The Connection Machine CM-2, a black cube with arrays of blinking red LEDs.
The ‘cube-of-cubes’ design of the Connection Machine. Image Credit: Computer History Museum
Architecture of CM-1 and CM-2
Architecture of CM-1 and CM-2. Image Credit: Greg Faust, Mike Gibson and Sal Valente

Taken together, their key architectural features were:
  • Massive Parallelism: A full system contained 65,536 simple, bit-serial processors.
  • SIMD Execution: A central sequencer broadcast a single instruction to all processors, which executed it synchronously on their local data. This model was highly efficient for uniform operations across large datasets.
  • Hypercube Interconnect: The processors were connected in a 12-dimensional hypercube topology. This network provided high-bandwidth, low-latency communication, with a maximum hop distance of only 12 between any two nodes (see the routing sketch after this list).[7]
  • Floating-Point Enhancement (CM-2): The CM-2 addressed the CM-1’s computational limitations by adding Weitek 32-bit floating-point accelerators (FPAs).[5][6] The 1-bit processors handled control, memory access, and data routing while feeding operands to the FPAs for high-speed arithmetic. This hybrid approach achieved 2.5 GFLOPS for 64-bit matrix multiplication and up to 5 GFLOPS for dot products.[6]
  • Memory Expansion (CM-2): Memory per processor increased from 4 Kbits (CM-1) to 64–256 Kbits (CM-2), supporting a total system memory of 512 MB.[6]
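
The hop-distance bound is worth making explicit: in a hypercube, two node addresses differ in exactly the bit positions that must be “corrected” en route, so the hop count equals the Hamming distance between the addresses and therefore never exceeds 12. The Python sketch below illustrates this with a simple dimension-ordered router; it is a conceptual model, not the CM-1/CM-2 routing algorithm, which handled congestion and message combining in hardware.

```python
# Dimension-ordered routing on a 12-dimensional hypercube: node
# addresses are 12-bit integers, and two nodes share a link iff their
# addresses differ in exactly one bit. Each hop corrects one differing
# bit, so the hop count is the Hamming distance -- at most 12.
DIMS = 12

def route(src: int, dst: int) -> list[int]:
    """Return the sequence of node addresses visited from src to dst."""
    path, node = [src], src
    for d in range(DIMS):
        if (node ^ dst) & (1 << d):   # addresses differ in dimension d
            node ^= 1 << d            # traverse the link in that dimension
            path.append(node)
    return path

src, dst = 0b000000000000, 0b101010101010
path = route(src, dst)
print(len(path) - 1)          # 6 hops: the Hamming distance of src and dst
assert len(path) - 1 <= DIMS  # never more than 12 hops between any two nodes
```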

The physical design was a cube composed of smaller cubes, with LEDs indicating processor activity.[1]

A diagram showing the complex, multi-layered structure of a 12-dimensional hypercube network.
The 12-dimensional hypercube interconnect of the Connection Machine. Image Credit: Tamiko Thiel

CM-5: Shift to MIMD and Commodity Processors


The CM-5, released in 1991, marked a strategic pivot in response to the rapid performance gains of commodity RISC microprocessors.[8] TMC marketed it as a “Universal Architecture,” capable of efficiently running both data-parallel and message-passing applications.[9]

The CM-5 introduced several groundbreaking architectural features:[9]

  • Processing Nodes: Each node contained:
    • A 32 MHz Sun SPARC processor (~22 MIPS) for control and scalar operations[9]
    • Four custom vector units (VUs) operating at 32 MHz, providing 128 MFLOPS peak per node[9]
    • 32–128 MB of DRAM with 640 MB/s of aggregate memory bandwidth[9]
  • Tripartite Network Architecture:[9]
    • Data Network: A 4-ary fat-tree topology with 20 MB/s of bandwidth per leaf. It used randomized routing to prevent hot spots and provided scalable bisection bandwidth.[9]
    • Control Network: A binary tree supporting broadcast, reduction, parallel-prefix (scan), and barrier-synchronization operations in hardware, enabling “synchronized MIMD” execution (see the scan sketch below).[9]
    • Diagnostic Network: Provided back-door access for system monitoring and fault isolation.[9]
  • Scalable Disk Array (SDA): Storage nodes connected directly to the Data Network as first-class citizens, using RAID 3 and delivering sustained transfer rates above 100 MB/s.[9]
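
As a taste of what hardware scan support buys, the sketch below is a serial simulation of the log-depth inclusive scan that Hillis and Steele described for data-parallel machines; the function name is ours, and the CM-5’s Control Network computed the same result in dedicated binary-tree hardware rather than in software.

```python
# Serial simulation of the log-depth inclusive scan from Hillis and
# Steele's data-parallel algorithms: at step d, element i combines with
# element i - 2**d, so P elements finish in O(log P) parallel steps.
def hillis_steele_scan(values, op=lambda a, b: a + b):
    x = list(values)
    d = 1
    while d < len(x):
        # In hardware, all of these combinations happen in one step.
        x = [x[i] if i < d else op(x[i - d], x[i]) for i in range(len(x))]
        d *= 2
    return x

print(hillis_steele_scan([3, 1, 4, 1, 5]))   # [3, 4, 8, 9, 14]
```
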
The Connection Machine CM-5, with its characteristic tall, black cabinets and glowing red LED panels.
The Connection Machine CM-5. Image Credit: MIT CSAIL
Block diagram of the CM-5.
Block diagram of the CM-5. Image Credit: Scott Pakin
Processor node diagram for the CM-5.
Processor node diagram for the CM-5. Image Credit: Greg Faust, Mike Gibson and Sal Valente
  • MIMD Architecture with Data-Parallel Support: The custom bit-serial processors were replaced with hundreds or thousands of standard Sun SPARC RISC processors. Each processor could execute an independent instruction stream, making the system a MIMD machine, yet it could efficiently emulate SIMD behavior for data-parallel codes through the Control Network.[5][9]
  • Software Ecosystem: The system ran CMost (a SunOS variant) and supported multiple programming models:[9]
    • CM Fortran and C* for data parallelism
    • The CMMD library for explicit message passing
    • Active Messages (developed at UC Berkeley), which reduced communication latency from milliseconds to microseconds (see the sketch after this list)[9]
  • Performance Achievement: A 1,024-node CM-5 at Los Alamos National Laboratory achieved 59.7 GFLOPS on the Linpack benchmark, ranking it #1 on the June 1993 TOP500 list.[9]
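
The core idea of Active Messages is that each message carries a reference to a user-level handler that runs immediately on arrival, instead of being buffered for a later receive call. The Python sketch below models only that dispatch idea; the `Node` class and handler are hypothetical and do not reflect the CMMD or Active Messages APIs.

```python
# Sketch of the Active Messages idea: each message carries a reference
# to a user-level handler that the receiver invokes immediately on
# arrival, avoiding buffering and scheduling overhead.
class Node:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.counter = 0

    def deliver(self, handler, *args):
        # On real hardware the network interface dispatches straight
        # into the handler; here delivery is just a function call.
        handler(self, *args)

def add_handler(node: "Node", value: int):
    """User-level handler: fold the payload into local state."""
    node.counter += value

nodes = [Node(i) for i in range(4)]
nodes[2].deliver(add_handler, 5)   # "send" an active message to node 2
print(nodes[2].counter)            # 5
```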

While Thinking Machines Corporation ultimately filed for bankruptcy in 1994, the Connection Machine series had a lasting impact on high-performance computing:

  • Demonstration of MPP Viability: The CM series showed that massively parallel architectures could match or exceed traditional vector supercomputers for certain applications; a CM-5 was ranked the world’s fastest computer in 1993.[1]
  • Popularizing Data Parallelism: The series popularized the data-parallel programming model, which remains a key concept in modern parallel computing, particularly on GPUs.
  • Influence on Interconnects: The hypercube and fat-tree networks spurred research and development in high-performance interconnects, a critical component of every modern supercomputer.
  • Scientific Applications: CM-2 and CM-5 systems were used to advance research in fields such as quantum chromodynamics, oil reservoir simulation, and molecular dynamics.[10]

The concepts developed by TMC provided a foundation for subsequent supercomputer designs, demonstrating the viability of large-scale parallelism.

  1. Connection Machine - Wikipedia, accessed October 2, 2025, https://en.wikipedia.org/wiki/Connection_Machine

  2. Thinking Machines Corporation - Wikipedia, accessed October 2, 2025, https://en.wikipedia.org/wiki/Thinking_Machines_Corporation

  3. Connection Machine® Model CM-2 Technical Summary - Bitsavers.org, accessed October 2, 2025, https://bitsavers.org/pdf/thinkingMachines/CM2/HA87-4_Connection_Machine_Model_CM-2_Technical_Summary_Apr1987.pdf

  4. The Connection Machine (CM-2) - An Introduction - Carolyn JC …, accessed October 2, 2025, https://spl.cde.state.co.us/artemis/ucbserials/ucb51110internet/1992/ucb51110615internet.pdf

  5. Connection Machine - Chessprogramming wiki, accessed October 2, 2025, https://www.chessprogramming.org/Connection_Machine

  6. Architecture and applications of the Connection Machine - cs.wisc.edu, accessed November 18, 2025, https://pages.cs.wisc.edu/~markhill/restricted/computer88_cm2.pdf

  7. “The Design of the Connection Machine” - Tamiko Thiel, Design Issues journal, accessed October 2, 2025, https://www.tamikothiel.com/theory/cm_txts/index.html

  8. Commodity computing - Wikipedia, accessed October 2, 2025, https://en.wikipedia.org/wiki/Commodity_computing

  9. Connection Machine CM-5 Technical Summary - MIT CSAIL, accessed November 19, 2025, https://people.csail.mit.edu/bradley/cm5docs/nov06/ConnectionMachineCM-5TechnicalSummary1993.pdf

  10. CM-2 | PSC - Pittsburgh Supercomputing Center, accessed October 2, 2025, https://www.psc.edu/resources/cm-2/