3.1 Early Vector Supercomputers

The first era of commercially successful parallel computing was shaped largely by the work of Seymour Cray. His name became synonymous with supercomputing, and his design philosophy guided high-performance computing for nearly two decades.

The Architect: Seymour Cray

Seymour Cray’s career began at Control Data Corporation (CDC), where he established his reputation as an architect of high-speed computers.¹ His designs, like the CDC 1604 and the CDC 6600, were the fastest of their time.² The CDC 6600, released in 1964, is widely considered the first supercomputer.¹

Figure: The CDC 1604 computer system, with its large central console and tape drives. Credit: Marcel Brown

Figure: Two men operating the dual console of the CDC 6600, with tape drives in the background. Credit: Computer History Museum

Cray’s design philosophy put performance first, pursued through minimalist designs built from the fastest available components.

He left CDC in 1972 to found Cray Research, a company focused on building high-performance computers.²

Case Study: The CDC STAR-100

Before Cray Research released its first system, other companies had already attempted vector processing, in which a single operation is applied to an entire array (“vector”) of numbers at once.² The CDC STAR-100, first delivered in 1974, was one of the world’s first commercially available vector supercomputers.³ Designed by James Thornton at Control Data Corporation, it was engineered specifically for Lawrence Livermore National Laboratory’s nuclear physics simulations.³

Architectural Characteristics

The STAR-100 operated at 25 MHz (40 nanosecond cycle time) and introduced several pioneering features:³

512-Bit “Superword” Architecture: Memory transferred data in 512-bit units (eight 64-bit floating-point numbers simultaneously), providing high bandwidth for vector operations.³
Deeply Pipelined Arithmetic Units: Two independent segmented pipelines handled floating-point operations, with Pipeline 2 also managing all scalar instructions.³
Virtual Memory: The STAR-100 was the first supercomputer to implement virtual memory, using a 48-bit address space with page sizes of 512 words or 65,536 words.³
Distributed I/O Architecture: The system employed satellite “Stations” based on CDC 1700 minicomputers to handle I/O operations, preventing the central processor from being slowed by peripheral devices.³

Architecture Type	Memory-to-Memory (e.g., CDC STAR-100)
Concept	Vector instructions stream data directly from main memory, through the arithmetic units, and back to memory.⁴
Peak Performance	100 MFLOPS (64-bit), 200 MFLOPS (32-bit split-pipe mode)³
Strength	High bandwidth for long, sequential data vectors.
Limitations	1. High Startup Overhead: Vector breakeven point often exceeded 100 elements.⁵ 2. Poor Scalar Performance: Slow scalar unit due to deep pipeline latency.⁴,³ 3. Memory Dependency: Performance bottlenecked by memory access patterns.³
Outcome	Often performed worse than contemporary scalar machines on real-world scientific programs, which are a mix of vector and scalar work.⁴

Figure: Architectural diagram of the CDC STAR-100 CPU and memory layout. Credit: Purcell, 2010⁶

Figure: Block diagram of the CDC STAR-100 system architecture. Credit: Purcell, 2010⁶

These early machines demonstrated the importance of balanced performance; a supercomputer must perform well on both scalar and vector operations.⁵
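The vector breakeven point listed among the STAR-100's limitations can be illustrated with a simple timing model. The sketch below assumes a vector instruction costs a fixed startup time plus a per-element time, while a scalar loop costs a constant time per element. The function names and all cycle counts are hypothetical, chosen only to land near the breakeven figures quoted in this section; they are not measured values for either machine.

```python
import math

def vector_time(n, startup, per_element):
    """Cycles for one vector instruction over n elements (linear model)."""
    return startup + n * per_element

def scalar_time(n, per_element):
    """Cycles for an equivalent scalar loop over n elements."""
    return n * per_element

def breakeven_length(startup, vector_per_element, scalar_per_element):
    """Smallest n at which the vector instruction is strictly faster."""
    if scalar_per_element <= vector_per_element:
        raise ValueError("the vector unit never wins under this model")
    # startup + n*v < n*s  is equivalent to  n > startup / (s - v)
    return math.floor(startup / (scalar_per_element - vector_per_element)) + 1

# Hypothetical parameters: a long startup for a memory-to-memory design,
# a short one for a register-based design. Illustrative only.
memory_to_memory = breakeven_length(startup=400, vector_per_element=1, scalar_per_element=5)
register_based = breakeven_length(startup=10, vector_per_element=1, scalar_per_element=5)
print(memory_to_memory, register_based)  # 101 3
```

Under this model the breakeven length grows in proportion to the startup cost, which is why a long memory-to-memory pipeline only pays off on long vectors.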

The Cray-1 Architecture (1976)

In 1976, Cray Research released the Cray-1.² It was a balanced architecture that addressed the limitations of earlier vector machines.

The design reflected Amdahl’s Law, formulated in 1967: the speedup of a program is ultimately limited by its sequential, non-parallelizable fraction.⁷
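Amdahl's Law is commonly written as speedup = 1 / ((1 - p) + p / s), where p is the fraction of run time that can be accelerated and s is the speedup of that fraction. A minimal sketch follows; the function name and the 90% vectorizable fraction are illustrative assumptions, not figures from the cited sources.

```python
def amdahl_speedup(parallel_fraction, parallel_speedup):
    """Overall speedup when `parallel_fraction` of run time is sped up by `parallel_speedup`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if parallel_speedup <= 0:
        raise ValueError("parallel_speedup must be positive")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)

# Hypothetical program that is 90% vectorizable: however fast the vector
# unit becomes, the overall speedup stays below 1 / 0.1 = 10.
for vector_speedup in (10, 100, 1000):
    print(vector_speedup, round(amdahl_speedup(0.9, vector_speedup), 2))
```

The remaining 10% of scalar work sets the ceiling, which is the argument for investing in scalar speed as well as vector speed.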

Accordingly, the Cray-1 paired its vector hardware with a fast scalar unit, so that the scalar portions of a program also ran efficiently.

Figure: The Cray-1 supercomputer, with its distinctive C-shaped chassis and padded bench seating around the base. Image Credit: Computer History Museum

Key Architectural Innovations

The Cray-1 operated with a 12.5 nanosecond clock period (80 MHz), achieving 160 MFLOPS peak performance.⁸ Its architecture featured several critical innovations:

High-Speed Scalar Unit: The Cray-1 included one of the fastest scalar processors of its time, with a scalar/vector crossover point of only 2-4 elements (compared to 100+ for the STAR-100).⁸ This ensured high performance on the non-vectorizable parts of any code.⁴
Register Hierarchy: The architecture employed five register sets:⁸
- Eight 64-bit S (Scalar) registers for computational operands
- Eight 24-bit A (Address) registers for memory addressing and loop control
- Eight 64-element V (Vector) registers for vector operations
- Sixty-four 64-bit T registers and sixty-four 24-bit B registers for intermediate buffering
Register-to-Register Architecture: Instead of streaming operands from main memory, the Cray-1 loaded data into high-speed vector registers, operated on it there, and wrote the results back. This significantly reduced memory traffic.⁵,⁹
Twelve Segmented Functional Units: All units could operate concurrently, with varying latencies (2-14 clock periods depending on operation).⁸ The Reciprocal Approximation Unit enabled pipelined division through iterative methods.⁸
16-Way Memory Interleaving: Main memory was organized into 16 independent banks to minimize bank conflicts and sustain one-word-per-cycle throughput.⁸
Vector Chaining: This feature allowed results from one functional unit to be forwarded directly to another as they were produced, creating customized deep pipelines without intermediate memory access.²,⁹,⁸ This enabled sustained performance approaching theoretical peak (138 MFLOPS sustained, with bursts up to 250 MFLOPS).⁸
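The benefit of vector chaining can be sketched with a simplified cycle count. The model assumes a functional unit with latency L delivers its first result after L clock periods and one result per period after that. The latencies of 7 and 6 clock periods for a multiply followed by an add are assumed values within the 2-14 range quoted above, the function names are hypothetical, and real chaining carries extra overheads that this model ignores.

```python
def unchained_cycles(n, latencies):
    """Each vector instruction must finish all n elements before the next one starts."""
    return sum(latency + n for latency in latencies)

def chained_cycles(n, latencies):
    """Each unit starts as soon as the first result of the previous unit appears."""
    return sum(latencies) + n

vector_length = 64          # one full vector register
latencies = [7, 6]          # assumed: multiply, then add (clock periods)

print(unchained_cycles(vector_length, latencies))  # 141
print(chained_cycles(vector_length, latencies))    # 77
```

In this model chaining pays the pipeline latencies once and overlaps the element streams, so the saving grows with the number of chained operations.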

Feature	CDC STAR-100	Cray-1
Architecture	Memory-to-Memory	Register-to-Register
Clock Speed	25 MHz (40 ns cycle)	80 MHz (12.5 ns cycle)
Vector Data	Streamed from Main Memory (512-bit units)	Held in 8 Vector Registers (64 elements each)
Memory System	32-way interleaved magnetic core	16-way interleaved bipolar semiconductor
Scalar Speed	Poor (deep pipeline latency)	Excellent (fastest of its era)
Vector Breakeven	100+ elements	2-4 elements
Pipelining	Two deep pipelines	Twelve segmented functional units with chaining
Peak Performance	100 MFLOPS (64-bit), 200 MFLOPS (32-bit)	160 MFLOPS peak, 138 MFLOPS sustained
Virtual Memory	Yes (48-bit address space)	No (physical addressing only)
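The interleaved memory systems compared in the table sustain one word per cycle only when consecutive accesses fall in different banks. The sketch below illustrates the effect for a 16-bank memory. It assumes a bank is selected by word address modulo the bank count and stays busy for 4 clock periods after each access; both the mapping and the busy time are assumptions made for illustration, and the function name is hypothetical.

```python
def cycles_for_strided_access(n_words, stride, n_banks=16, bank_busy=4):
    """Cycles to issue n_words accesses, one per cycle unless the target bank is busy."""
    bank_free_at = [0] * n_banks  # first cycle at which each bank accepts a request
    cycle = 0
    for i in range(n_words):
        bank = (i * stride) % n_banks
        cycle = max(cycle, bank_free_at[bank])  # stall until the bank is free
        bank_free_at[bank] = cycle + bank_busy
        cycle += 1  # the next request can issue on the following cycle at the earliest
    return cycle

for stride in (1, 4, 8, 16):
    print(stride, cycles_for_strided_access(64, stride))
# 1 64
# 4 64
# 8 126
# 16 253
```

With these assumed parameters, unit stride and stride 4 spread accesses across enough banks to avoid stalls, while stride 16 sends every access to the same bank and runs roughly four times slower.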

Figure: Architectural diagram of the Cray-1, showing its register-to-register vector processing design with vector registers, functional units, and memory organization. Image Credit: Chris Fenton via Homebrew Cray-1A

Physical Design and Engineering

The physical design of the Cray-1 was as notable as its internal architecture. Every aspect of its physical form was a solution to an engineering challenge.

C-Shaped Chassis: This shape was a solution to signal propagation delays.¹⁰ To achieve a clock cycle of 12.5 nanoseconds, all wires had to be extremely short. The cylindrical design (8.5 feet wide, 6.5 feet high) minimized the maximum wire length, with 60 miles of internal wiring controlled by strict length rules (multiples of 1 foot, maximum 4 feet).¹¹,¹²,⁸
Dense, High-Speed Logic: The system was built with 95% of its logic using a single type of custom integrated circuit, providing standardization and reliability.¹³,⁸ This density generated four times the heat per cubic inch compared to the CDC 7600.⁸
Freon Cooling System: To dissipate the 115 kW of heat, a novel Freon-based cooling system was developed. Vertical aluminum/stainless steel cooling bars in each column wall conducted heat to a refrigeration unit in the surrounding bench.¹⁰,⁸
Reliability: System availability exceeded 98% with mean time between interruption (MTBI) over 100 hours, demonstrating the quality of the engineering.⁸

Cray-1 Specification	Value
Clock Speed	80 MHz (12.5 ns cycle time)
Peak Performance	160 MFLOPS
Memory	Up to 1 million 64-bit words
Power Consumption	115 kW
Cost	$5 to $8 million USD⁵
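Several rows of the specification table are related by simple arithmetic. The sketch below assumes that the peak rate corresponds to two floating-point results per clock period (one add and one multiply) and that "1 million words" means 2^20 words; both are assumptions for illustration rather than statements from the cited sources.

```python
clock_period_ns = 12.5
clock_mhz = 1_000 / clock_period_ns          # 1 / 12.5 ns = 80 MHz

flops_per_cycle = 2                          # assumed: one add + one multiply result per clock
peak_mflops = clock_mhz * flops_per_cycle    # 160 MFLOPS

words = 2 ** 20                              # assumed meaning of "1 million words"
memory_bytes = words * 64 // 8               # 64-bit words -> bytes
memory_mib = memory_bytes / 2 ** 20          # 8 MiB

print(clock_mhz, peak_mflops, memory_mib)    # 80.0 160.0 8.0
```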

Impact and Legacy

The Cray-1 was a commercial success, with over 80 systems sold.⁵ It became an important tool for national laboratories and research universities, enabling research in:

Nuclear weapons simulation
Cryptography
Weather forecasting
Computational fluid dynamics¹

Its successors, the Cray X-MP and Y-MP, introduced shared-memory multiprocessing, allowing multiple vector processors to work in parallel on a single problem and increasing performance into the gigaflops range.¹ For more than a decade, Seymour Cray’s design philosophy defined high-performance computing and left a legacy of custom, balanced, high-performance architectural design.

References

1. Cray-1 | computer - Britannica, accessed October 2, 2025, https://www.britannica.com/topic/Cray-1
2. History of supercomputing - Wikipedia, accessed October 2, 2025, https://en.wikipedia.org/wiki/History_of_supercomputing
3. CDC STAR-100 - Wikipedia, accessed November 18, 2025, https://en.wikipedia.org/wiki/CDC_STAR-100
4. Vector Architectures: Past, Present and Future, accessed October 2, 2025, https://www.cs.cmu.edu/afs/cs/academic/class/15740-f03/public/doc/discussions/uniprocessors/vector/vector-past-present-future-supercomputing98.pdf
5. Cray-1 - Wikipedia, accessed October 2, 2025, https://en.wikipedia.org/wiki/Cray-1
6. The Control Data STAR-100: Performance measurements, accessed October 16, 2025, https://api.semanticscholar.org/CorpusID:43509695
7. History of computer clusters - Wikipedia, accessed October 2, 2025, https://en.wikipedia.org/wiki/History_of_computer_clusters
8. The CRAY-1 Computer System - cs.wisc.edu, accessed November 18, 2025, https://pages.cs.wisc.edu/~markhill/restricted/cacm78_cray1.pdf
9. An Analysis of the Cray-1 Computer - Cray simulator, accessed October 2, 2025, https://cray.modularcircuits.com/cray_docs/articles/an_analysis_of_the_cray1_computer.PDF
10. The CRAY-1 Computer System, accessed October 2, 2025, https://tcm.computerhistory.org/ComputerTimeline/Chap44_cray1_CS2.pdf
11. The Cray-1 Computer System, 1977, accessed October 2, 2025, https://s3data.computerhistory.org/brochures/cray.cray1.1977.102638650.pdf
12. The Cray-1 Supercomputer - CHM Revolution - Computer History Museum, accessed October 2, 2025, https://www.computerhistory.org/revolution/supercomputers/10/7
13. The CRAY-1 Computer System, accessed October 2, 2025, https://www.cs.auckland.ac.nz/courses/compsci703s1c/archive/2008/resources/Russell.pdf