University of Illinois Center for Supercomputing Research and Development

Over its 9 years of major funding, plus follow-on work by many of its participants, CSRD pioneered many of the shared memory architectural and software technologies upon which all 21st century computation is based.

By the early 1980s, a period of worldwide HPC expansion had arrived, including the race with the Japanese 5th generation system, which targeted innovative parallel applications in AI.

HPC/supercomputing had emerged as a field, commercial supercomputers were in use by industry and labs (but little by academia), and academic architecture and compiler research were expanding.

Many graduate students and post-docs were already contributing to constituent efforts; full-time academic professionals were hired, and other faculty cooperated.

Up to 125 people were involved at the peak over the nine years of full CSRD operation.[2][3][4][5][6][7][8][9][10][11] The UIUC administration responded to the computing and scientific times.

UIUC President Stanley Ikenberry arranged to have Governor James Thompson directly endow CSRD with $1 million per year to guarantee personnel continuity.

Cedar was based on designing and building a limited amount of innovative hardware, driven by software built on top of emerging parallel applications and compiler technology.

By breaking the tradition of building hardware first and then dealing with software details later, this codesign approach led to the name Cedar instead of Illiac 5.

Also, KAI was founded in 1979 by three Parafrase veterans (Kuck, Bruce Leasure, and Mike Wolfe), who wrote KAP, a new source-to-source restructurer.

In sharp contrast, two decades earlier, the Illiac 4 team required years of work with industry leaders in state-of-the-art hardware technology to get the system designed and built.

After the team contracted with Burroughs Corp. to build and integrate an all-transistor hardware system, lengthy discussions about the semiconductor memory design (and schedule slips) ensued with subcontractor Texas Instruments' Jack Kilby (IC inventor and later Nobelist), Morris Chang (later TSMC founder), and others.

Many attempts at parallel computing startups arose in the decades following Illiac 4, but nothing achieved success until adequate languages and software were developed in the 1970s and 80s.

These enabled the team to focus quickly on several design topics.[19] The architecture group had a decade of parallel interconnect and memory experience[13] and had already chosen a high-radix shuffle network, so after selecting Alliant as the node manufacturer, custom interfacing hardware was designed in conjunction with Alliant engineers.
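As a rough illustration of how a shuffle-style (omega) network of the kind mentioned above routes a request, the sketch below shows destination-tag self-routing, in which each switch stage uses one base-r digit of the destination address to select an output port. The radix and stage count here are placeholder values, not Cedar's actual parameters.

    /* Destination-tag (self-routing) through a radix-RADIX shuffle network.
     * RADIX and STAGES are illustrative placeholders, not Cedar's values.  */
    #include <stdio.h>

    #define RADIX  8     /* ports per crossbar switch (assumed) */
    #define STAGES 2     /* number of switch stages   (assumed) */

    /* At stage i, the i-th most-significant base-RADIX digit of the
     * destination address selects the switch output port.            */
    void route(unsigned dest, unsigned ports[STAGES]) {
        for (int stage = STAGES - 1; stage >= 0; --stage) {
            ports[stage] = dest % RADIX;   /* peel digits, least significant last */
            dest /= RADIX;
        }
    }

    int main(void) {
        unsigned ports[STAGES];
        route(42, ports);                  /* e.g., route to destination 42 */
        for (int s = 0; s < STAGES; ++s)
            printf("stage %d -> output port %u\n", s, ports[s]);
        return 0;
    }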

Designing, building and integrating the system was then a multi-year effort, including architecture, hardware, compiler, OS and algorithm work.

Because it was built on a vectorizer, the first modified version of KAP developed at CSRD lacked some important capabilities necessary for effective translation for multiprocessors, such as array privatization and parallelization of outer loops.
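A minimal modern illustration of those missing transformations (written with OpenMP notation for clarity; this is not KAP output): the scratch array t below is written and read only within a single iteration of the outer loop, so giving each thread a private copy (array privatization) removes the storage dependence and allows the outer loop itself to be parallelized.

    #define N 1024

    static double a[N][N], b[N][N];

    /* t is defined and used only inside one iteration of the i loop, so each
     * thread can hold its own copy; the OUTER loop then runs in parallel.   */
    void smooth(void) {
        #pragma omp parallel for
        for (int i = 0; i < N; ++i) {
            double t[N];                        /* privatized per-iteration array */
            for (int j = 1; j < N - 1; ++j)
                t[j] = 0.5 * (a[i][j - 1] + a[i][j + 1]);
            for (int j = 1; j < N - 1; ++j)
                b[i][j] = t[j];
        }
    }

    int main(void) { smooth(); return 0; }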

To identify the missing capabilities and develop the necessary translation algorithms, a collection of Fortran programs from the Perfect Benchmarks[24] was parallelized by hand.

The group was focused on developing a library of parallel algorithms and their associated kernels that mainly govern the performance of large-scale computational science and engineering (CSE) applications.

Parallel algorithms that realize high performance on the Cedar architecture were developed for these kernels.

In preparing to evaluate candidate hardware building blocks and the final Cedar system, CSRD managers began to assemble a collection of test algorithms; this was described in [20] and later evolved into the Perfect Club.

Several papers were published demonstrating performance enhancement for basic linear algebra algorithms on the Alliant quadrants and Cedar.

The number of iterations in these groups decreases as the execution of the loop progresses in such a way that the load imbalance is reduced relative to the static or dynamic scheduling techniques used at the time.
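One classic decreasing-chunk scheme of this kind is guided self-scheduling, in which each idle processor claims roughly ceil(R/P) of the R remaining iterations. The sketch below (a serial driver with illustrative parameters, not CSRD code) shows how the dispatched chunks shrink toward the end of the loop, which is what reduces the load imbalance.

    /* Guided self-scheduling sketch: chunk sizes decrease as the loop nears
     * completion, so the final chunks are small enough to balance the load. */
    #include <stdio.h>

    int main(void) {
        int total = 1000;       /* loop iterations to schedule    */
        int P = 8;              /* number of processors (assumed) */
        int remaining = total;
        int start = 0;

        while (remaining > 0) {
            int chunk = (remaining + P - 1) / P;   /* ceil(remaining / P) */
            printf("dispatch iterations [%d, %d)\n", start, start + chunk);
            start += chunk;
            remaining -= chunk;
        }
        return 0;
    }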

Even so, it gave rigorous justification for generations of neural network architectures, including deep learning[64] and large language models[65] in wide use in the 2020s.

While Cybenko’s Universal Approximation Theorem addressed the capabilities of neural-based computing machines, it was silent on the ability of such architectures to effectively learn their parameter values from data.
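For reference, the theorem can be stated as follows (standard notation, not taken from a CSRD document): for any continuous sigmoidal function σ (one with σ(t) → 1 as t → +∞ and σ(t) → 0 as t → −∞), any continuous f on the unit cube [0,1]^n, and any ε > 0, there exist N, real coefficients α_j and θ_j, and vectors y_j such that

    G(x) \;=\; \sum_{j=1}^{N} \alpha_j \, \sigma\!\left(y_j^{\mathsf{T}} x + \theta_j\right),
    \qquad
    \sup_{x \in [0,1]^n} \left| G(x) - f(x) \right| \;<\; \varepsilon .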

Cybenko and CSRD colleagues Sirpa Saarinen and Randall Bramley subsequently studied the numerical properties of neural networks, which are typically trained using stochastic gradient descent and its variants.

They observed that neurons saturate when network parameters are very negative or very positive, leading to arbitrarily small gradients, which in turn result in optimization problems that are numerically poorly conditioned.
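As a concrete illustration using the standard logistic activation (notation assumed here, not drawn from their paper): the derivative of the sigmoid vanishes as its argument grows in magnitude, so gradients through saturated neurons become arbitrarily small and the resulting optimization problem is poorly conditioned.

    \sigma(z) \;=\; \frac{1}{1 + e^{-z}},
    \qquad
    \sigma'(z) \;=\; \sigma(z)\bigl(1 - \sigma(z)\bigr) \;\to\; 0
    \quad \text{as } |z| \to \infty .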

The multi-cluster shared memory architecture of Cedar inspired a great deal of library optimization research involving cache locality and data reuse for matrix operations of this type.
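A minimal sketch of the cache-blocking (tiling) idea that such library work exploits; the matrix size and tile size below are illustrative choices, not values from the Cedar libraries.

    #define N  512
    #define BS 64          /* tile size: an arbitrary illustrative choice */

    static double A[N][N], B[N][N], C[N][N];   /* C starts zero-initialized */

    /* Tiled matrix multiply: each BSxBS tile of A, B, and C is reused many
     * times while it is resident in cache, improving data reuse over the
     * naive triple loop.                                                    */
    void matmul_blocked(void) {
        for (int ii = 0; ii < N; ii += BS)
            for (int kk = 0; kk < N; kk += BS)
                for (int jj = 0; jj < N; jj += BS)
                    for (int i = ii; i < ii + BS; ++i)
                        for (int k = kk; k < kk + BS; ++k)
                            for (int j = jj; j < jj + BS; ++j)
                                C[i][j] += A[i][k] * B[k][j];
    }

    int main(void) { matmul_blocked(); return 0; }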

To this end, the Perfect Benchmarks[24] provided a set of computational applications, collected from various science domains, which were used to evaluate and drive the study of the Cedar system and its compilers.

Funded by DARPA, the HPC++ project[77][78] was led by Dennis Gannon and Allen Malony, with postdocs Francois Bodin, from William Jalby's group in Rennes, and Peter Beckman, now at Argonne National Laboratory.

The parallel algorithm development experience gained by one of the members of the Cedar project (A. Sameh) proved to be of great value in his research activities after leaving UIUC.[79]

However, the advent of systems like Cedar allowed one to consider a compiler-assisted implementation of cache coherence for parallel programs,[81] with minimal and completely local hardware support.
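A greatly simplified sketch of the idea (not the actual CSRD scheme): instead of global hardware snooping, the compiler inserts local cache-management operations at synchronization points where stale data could otherwise be read. The primitives barrier() and cache_invalidate() below are hypothetical placeholders, stubbed out so the sketch compiles.

    #include <stddef.h>

    /* Hypothetical local primitives, not an actual Cedar or CSRD interface. */
    static void barrier(void) { /* all processors wait here */ }
    static void cache_invalidate(void *p, size_t n) { (void)p; (void)n; }

    #define N 1024
    static double x[N];

    /* Phase 1: each of nproc processors writes its share of x. */
    void phase1_writes(int me, int nproc) {
        for (int i = me; i < N; i += nproc)
            x[i] = i * 0.5;
    }

    /* Phase 2: every processor reads all of x.  The compiler-inserted
     * invalidation discards any stale cached copies before the reads,
     * so no global hardware coherence mechanism is required.          */
    double phase2_reads(void) {
        barrier();                       /* ensure all phase-1 writes are done */
        cache_invalidate(x, sizeof x);   /* compiler-inserted cache management */
        double sum = 0.0;
        for (int i = 0; i < N; ++i)
            sum += x[i];
        return sum;
    }

    int main(void) { phase1_writes(0, 1); (void)phase2_reads(); return 0; }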

In turn, these initiatives contributed to the use of GPUs for a wide range of computational problems, including neural networks for deep learning, whose mathematical foundation was studied by Cybenko as discussed above.