The future of HPC will finally see its past fade away • The Register



High Performance Computing (HPC) has a very different dynamic than the mass market. It enables classes of computation of strategic importance to nation states and their agencies, and so it attracts investment and innovation that are to some extent decoupled from market forces.

Sometimes it dominates the mass market, sometimes it leans on it, and with the advent of massive cloud infrastructure something like a supercomputer can even be built from scratch through a browser – provided you have a high-performance credit card.

But the national supercomputer efforts are where technological innovation, perhaps with wider applications to come, is driven forward. The current goal is exascale: computers capable of sustaining 10^18 floating-point operations per second (FLOPS) on standard benchmarks. Everyone is in the race.

South Korea is aiming for that by 2030 using mostly locally designed components, while Japan’s national computing lab, Riken, is currently home to the world’s fastest supercomputer – half-exascale – built around Fujitsu’s Arm-based A64FX processors. Europe’s main independent supercomputer project will initially use Arm’s V1 chips, shifting over time to home-designed RISC-V processors. Locally designed RISC-V processors are also at the heart of India’s national supercomputer project.

Von Neumann

All of these designs are more or less elaborate amplifications of the von Neumann architecture, named after John von Neumann, who worked on the world’s first versatile programmable electronic computer, the ENIAC of 1945. With no competition to speak of, it was also the fastest computer in the world. It set the model for state-of-the-art, state-funded basic research with strategic goals – it was used for artillery calculations and to verify the feasibility of the hydrogen bomb.

In the von Neumann architecture, a central processing unit fetches code and data from memory and writes data back to it over a common address bus, shared with mass storage and I/O. To date, HPC has grown in power largely by increasing the speed, throughput, and capacity of each of these components, thanks in great part to Moore’s Law.
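The shape of that architecture can be sketched in a few lines: one memory holds both the program and its data, and a CPU loop fetches, decodes, and executes instructions from it. The tiny instruction set below is hypothetical, purely for illustration.

```python
# Minimal sketch of a von Neumann machine. The three-operand
# instruction set ("add", "mul", "halt") is invented for this example.

def run(memory):
    """Execute a tiny program held in the same memory as its data."""
    pc = 0  # program counter
    while True:
        op, a, b, dest = memory[pc]          # fetch + decode
        if op == "halt":
            return memory
        if op == "add":                      # execute: read data cells,
            memory[dest] = memory[a] + memory[b]  # write the result back
        elif op == "mul":
            memory[dest] = memory[a] * memory[b]
        pc += 1  # next instruction, over the same shared memory/bus

# Program and data share one address space: cells 0-2 hold code, 3-5 data.
mem = [("add", 3, 4, 5), ("mul", 5, 5, 5), ("halt", 0, 0, 0), 2, 3, 0]
print(run(mem)[5])  # (2 + 3) squared = 25
```

Everything an A64FX does – fetch, decode, execute, write back over a shared memory system – is an enormously accelerated version of this loop.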

While the 2021 Arm V1 uses techniques such as ultra-high-bandwidth memory buses, extensive prefetch and instruction-reordering engines, and specialized vector engines to achieve performance roughly 100 billion times that of ENIAC, the two designs share the same base architecture. That is about to end.

HPC has traditionally used exotic technologies, since its importance to industry, publicly funded research, and the computing needs of open and secret government agencies makes it easier to justify investing in the fastest ideas at any cost. This was especially evident in the early days of supercomputing, when specialized semiconductor logic unsuited to the mass market powered the leading designs, such as the Cray-1. As commodity chips grew in power, the trend towards seas of standard parts running industry-standard software took over, and that is now the standard model.

But with the slow demise of Moore’s Law and the emergence of new classes of computational task, new technologies and architectures are again being researched, and once again supercomputing is likely to use them before they become suitable for general use. This is happening through synergies between emerging devices and a sea change in where supercomputer designers expect performance gains to come from. One area of long-term interest, because it illustrates all of these ideas, is silicon-memristor hybrid neuromorphic circuits.

Neuromorphic designs are those modelled on the neural processing systems found in nature. AI and machine learning take up some of these ideas, including learning neural networks, often implemented in software on CPU and GPU hardware and increasingly in custom accelerator circuits. One characteristic of these networks is that the computational requirement at each node is very light – you don’t need much CPU per node, although you do want a lot of nodes.
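How light is "light"? The entire computation at one node of a typical learning network is a weighted sum followed by a squashing function – a minimal sketch, with arbitrary example numbers:

```python
import math

# One node of a learning network: multiply inputs by weights, add a
# bias, squash the total. The network's power comes from having vast
# numbers of these cheap nodes, not from any one node being clever.
def neuron(inputs, weights, bias):
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid activation

out = neuron([0.5, -1.2, 3.0], [0.8, 0.1, -0.4], bias=0.2)
print(round(out, 3))
```

A handful of multiplies and adds per node is exactly the kind of workload that favours dense, simple hardware over a big general-purpose CPU.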


Enter the memristor, a circuit component that has been known about for decades and changes state by altering the magnetic polarization of a tiny structure within it. It hasn’t found much use because it isn’t very fast; it retains its state without power, but conventional silicon logic and memory designs have always been much cheaper and faster. Other attributes of memristors, such as the quasi-analog way they can hold a wide range of values rather than just 1 or 0, are also difficult to exploit in the way we compute today.

But if you use layers of memristor-based components configured as parts of nodes in an on-chip neural network, interconnected by fast silicon, you can achieve very high densities at very low power – and the logic becomes its own memory.

Additionally, the wide range of states a memristor can hold makes it inherently good at storing computational weights, a key aspect of learning networks, which record the importance of a particular signal in overall decision making.
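The usual illustration of this is the memristor crossbar: each device's conductance stores one weight, input voltages drive the rows, and by Ohm's and Kirchhoff's laws the current summed on each column is the dot product of the inputs with that column's weights – a vector-matrix multiply performed in the memory itself. A digital simulation of that analog behaviour, with invented example values:

```python
# Sketch of a memristor crossbar doing "compute in memory".
# conductances[row][col] plays the role of weight w[row][col];
# the physics sums I = V * G down each column for free.

def crossbar_multiply(voltages, conductances):
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for row, v in enumerate(voltages):
        for col in range(n_cols):
            currents[col] += v * conductances[row][col]  # per-column current sum
    return currents

weights = [[0.2, 0.9],
           [0.5, 0.1],
           [0.7, 0.3]]   # quasi-analog conductance values, not just 0/1
print(crossbar_multiply([1.0, 0.5, 2.0], weights))
```

In silicon this loop is where most of a neural network's energy goes; in a crossbar it collapses into a single analog read.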

The reason research like this is so interesting to supercomputer designers is that while classical von Neumann systems are running out of headroom, AI/ML is doing just the opposite. Over the last 10 years or so, as AI/ML has become a unified field rather than a collection of fiefdoms doing visual recognition, natural language processing, and so on, it has started to produce unexpected results, some with applications in numerical analysis, where until now it wouldn’t have seemed a good fit.

Over to AI/ML

A notable 2019 paper by Edinburgh researchers took a classic numerical problem that defies analytical solution and can only be tackled by massive computation – the three-body problem in chaotic orbital mechanics – and trained an AI on a large dataset of known solutions.

The resulting trained network could then not only solve previously unseen three-body problems, but in some cases do so 100 million times faster than the standard approach. That’s the equivalent of more than two decades of Moore’s Law.
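The structure of that trick – compute expensive solutions once, fit a cheap model, answer new queries from the model – can be shown at toy scale. This is a hedged sketch of the general surrogate idea, not the Edinburgh team's actual network: the "expensive solver" here is a stand-in series evaluation, and the "model" is the cheapest one imaginable.

```python
import math

def expensive_solver(x):
    # Stand-in for a costly simulation: brute-force Taylor series for sin(x).
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(50))

# "Training set": (input, solution) pairs computed once with the real solver.
xs = [i / 100 for i in range(-300, 301)]
ys = [expensive_solver(x) for x in xs]

def surrogate(x):
    # Toy "trained model": nearest neighbour in the precomputed table.
    # A real surrogate would be a neural network that generalizes beyond it.
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
    return ys[i]

# New queries are answered from the model, not by re-running the solver.
print(abs(surrogate(1.234) - math.sin(1.234)))  # small error, tiny cost
```

The speedup comes from moving the heavy computation offline: the solver runs once to build the dataset, and every query after that costs almost nothing.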

There are many examples, across many fields, of AI/ML techniques acting as accelerators for HPC. In fact, one of the most detailed roadmaps for how the two will blend over the next decade is AI for Science, a report led by the US national laboratory Argonne (which was slated to host the US’s first exascale computer, Aurora, until Intel proved unable to manufacture the chips). Despite its name, the report is a detailed survey of supercomputer development across most fields, bringing together the work of around a thousand researchers.

It predicts increasing hybridization between classical supercomputers and AI. With that hybridization comes increased use of the workflow as the primary model for creating and managing tasks: instead of bringing all the data together in one place and then crunching it, cascading operations are applied as the data arrives from the edge of the system, feeding an overall model.
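The shape of such a workflow is easy to sketch with Python generators: each stage consumes records as they arrive and passes results straight to the next, with no central staging step. The stage names and example values here are hypothetical.

```python
# Cascading workflow: operations applied as data arrives from the edge,
# rather than after everything has been gathered in one place.

def arriving_readings():            # the edge of the system: data trickles in
    for raw in [3.1, 3.3, 99.0, 3.2, 3.4]:
        yield raw

def clean(stream, limit=10.0):      # first stage: drop obvious sensor glitches
    for x in stream:
        if x < limit:
            yield x

def running_mean(stream):           # second stage: update an overall model
    total, n = 0.0, 0
    for x in stream:
        total, n = total + x, n + 1
        yield total / n             # the estimate refines as each record lands

for estimate in running_mean(clean(arriving_readings())):
    print(round(estimate, 2))
```

Nothing waits for the full dataset: each record flows through every stage as soon as it appears, which is the point of the workflow model.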

However, it warns, architectural changes will be needed on the classical side to fully realize the potential. Today’s high-performance memory and storage systems expect workloads with relatively small inputs and large outputs, with predictable, contiguous, block-based operations. AI training workloads, by contrast, must repeatedly read large datasets, often petabytes, and not necessarily in order. Trained models will then need to be stored and shipped to inference engines, which can look like streams of small random operations. All of this could have a huge impact on performance if the systems are not designed for it.

And the need to rethink workload management will only grow as specialized AI hardware cooperates with traditional systems to build models that are then pushed out to low-power devices at the edge.

Looking further out, within 10 years Argonne’s roadmap expects the final generation of supercomputers that are recognizable descendants of previous designs. They will be highly hybrid, with AI/ML doing much of the system management, model generation, and code generation and optimization, and with quantum computing – at millions of qubits, perhaps – lending a hand. If quantum computers prove useful, they will be used for simulation, and especially by the AIs, which will be able to generate optimized code for a task.

Argonne expects 100 exaFLOPS to be achievable by then, with the first zetta-scale designs – 1,000 exaFLOPS – on the drawing board, although, as Argonne’s director, Professor Rick Stevens, told the Future of Information and Communication conference in 2020, the wild cards are quantum computing and neuromorphic components.

They could come a lot faster, he said. “And anything to do with data will have some intelligence.”

AI will be built into storage, providing useful data abstractions as well as bit-by-bit backups and recovery, and able to fill gaps in missing data with synthetic data – which would not be without its problems.

By then, Stevens predicts, the non-von Neumann parts of supercomputing will take the lion’s share of the development effort and of the power budget, becoming the leading technologies in the field. The unique needs of supercomputing, it seems, which gave birth to the von Neumann architecture and gave it a life that will comfortably see its centenary, have also set its demise in motion – and, for the first time in the history of modern computing, we can see the shape of what comes after. ®

