Summary
The UK’s national supercomputer, Archer2, has fallen to 62nd place globally, leading to concerns about the country’s supercomputing investment.
The new Labour government has shelved plans for a new exascale supercomputer, potentially hindering scientific and technological advancements.
What is the use of a singular machine like this? Isn’t it far better to have 50 less capable machines at different institutions. At $20 million each they’d still be impressive.
(I know it’s not really a singular machine, but the question stands).
Current modern supercomputers are actually a mesh of relatively lower spec machines, not a single “computer”, per se. The cost of these isn’t the hardware, it’s the low-latency interconnects and writing the software that can carry out jobs in a massively parallel way.
Did the brackets at the end of my message make my words invisible? You’ve told me something I already stated.
My question is “what is the purpose of a single cluster like this?”.
The main benefit I think is massive scalability. For instance, DOE scientists at Argonne National Laboratory are working on training a language model for scientific uses. This isn’t something you can do on even 10s of GPUs for a few hours, like is common for jobs run in university clusters and similar. They’re doing this by scaling up to use a large portion of ALCF Aurora, which is an Exascale supercomputer.
Basically, for certain problems you either need both the ability to run jobs on lots of hardware and the ability to run them for long (but not too long to limit other labs’ work) periods of time. Big clusters like Aurora are helpful for that.