The path to exascale is anything but clear, and meeting the 2018 milestone will require solving some big challenges.

This year at SC 2011 (SC11) there were many discussions about exascale computing (10^18 FLoating-point OPerations per Second, or FLOPS), now that we have comfortably passed the petaFLOPS (10^15 FLOPS) mark, as noted on the latest Top500 List. At the top of the list is the K Computer installed at the RIKEN Advanced Institute for Computational Science (AICS) in Kobe, Japan. This system achieved an impressive 10.51 petaFLOPS on the Top500 Linpack benchmark using 705,024 SPARC64 processing cores and 1.4 petabytes of memory. To reach exascale, systems will need to run roughly 100 times faster than that.

The Top500 List is a ranking of performance on a single linear algebra benchmark (High Performance Linpack). While it garners much historical, academic, and marketing interest, the benchmark has little relevance to many real applications. Other aspects of performance, such as I/O, are not measured by the benchmark, which limits the usefulness of the list as a measure of "production-level HPC."

Once the petaFLOPS barrier was broken, many top HPC practitioners set their sights on reaching the exaFLOPS (10^18 FLOPS) milestone by the year 2018. Historically, the Top500 list suggests that a 100x increase in six years will be a bit of a stretch. In 2005, the best Top500 computer was the IBM BlueGene/L system at the DOE's Lawrence Livermore National Laboratory, with a performance of 280.6 teraFLOPS. Measured against that system, the K Computer's 10.51 petaFLOPS represents a speed-up of roughly 37 times, and that achievement took six years (2005-2011).
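To put the "bit of a stretch" in numbers, the following is a minimal sketch that extrapolates from the two Linpack figures quoted above. It assumes, purely for illustration, that the 2005-2011 compound growth rate continues unchanged; the variable names are hypothetical and the result is a back-of-the-envelope estimate, not a forecast.

```python
import math

# Linpack figures quoted in the text, in petaFLOPS.
bluegene_l_2005 = 0.2806   # 280.6 teraFLOPS
k_computer_2011 = 10.51
exascale_target = 1000.0   # 1 exaFLOPS = 1,000 petaFLOPS

years_elapsed = 2011 - 2005

# Observed speed-up over the six-year span and the implied annual growth factor.
speedup = k_computer_2011 / bluegene_l_2005
annual_growth = speedup ** (1 / years_elapsed)

# Assumption: the same compound growth rate simply continues after 2011.
required = exascale_target / k_computer_2011
years_needed = math.log(required) / math.log(annual_growth)

print(f"2005-2011 speed-up:            {speedup:.1f}x")
print(f"implied annual growth:         {annual_growth:.2f}x")
print(f"speed-up still required:       {required:.1f}x")
print(f"years needed at the same rate: {years_needed:.1f}")
```

Under that assumption the required speed-up takes somewhat more than seven years from the K Computer result, which is consistent with the view that the 2018 date leaves little or no slack.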

In addition to delivering raw FLOPS, the 2018 goal also includes usable FLOPS, or those that are required for production work. To get an idea of what an exaFLOPS computer might look like, some estimates were compiled as the basis for a Panasas/SICORP SC11 breakfast event entitled "The Road to Exascale Computing." The numbers include a 2014 mid-point milestone as well (GB=gigabytes: 10^9 bytes, TB=terabytes: 10^12 bytes, PB=petabytes: 10^15 bytes). Note that the "cores" estimate is based on suggestions that include CPU cores, GPU cores, or some hybrid of the two.

2014 Target of 200 petaFLOPS

  • Compute: ~40,000,000 cores (about 320 cores per memory module)
  • Main memory: 0.7-60 PB (125,000 modules * 32 GB/module = 4 PB)
  • File system scratch: 21.6-1,800 PB (15,000 disks * 7.68 TB/disk ≈ 120 PB)
  • File system bandwidth: 1.2-100 TB/sec (33,000 disks * 196 MB/sec ≈ 6.67 TB/sec)

2018 Target of 1 exaFLOPS (1,000 petaFLOPS)

  • Compute: ~200,000,000 cores (over 1,000 cores per memory module)
  • Main memory: 3.6-300 PB (156,250 modules * 128 GB/module = 20 PB)
  • File system scratch: 108-9,000 PB (20,000 disks * 29.5 TB/disk ≈ 600 PB)
  • File system bandwidth: 12-1,000 TB/sec (160,000 disks * 384 MB/sec ≈ 66 TB/sec)
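The worked figures in parentheses can be rechecked with a few lines of arithmetic. The sketch below assumes decimal units (1 PB = 1,000 TB = 1,000,000 GB) as defined earlier; the helper and variable names are hypothetical. Several of the quoted totals appear to be rounded, so the recomputed products differ slightly from them.

```python
GB_PER_PB = 1_000_000
TB_PER_PB = 1_000
MB_PER_TB = 1_000_000

def scaled_product(count, per_unit, unit_ratio):
    """Multiply a unit count by a per-unit figure and convert to the larger unit."""
    return count * per_unit / unit_ratio

# (label, count, per-unit value, conversion ratio, total quoted in the list)
rows = [
    ("2014 memory, PB",      125_000, 32,   GB_PER_PB, 4),
    ("2014 scratch, PB",      15_000, 7.68, TB_PER_PB, 120),
    ("2014 bandwidth, TB/s",  33_000, 196,  MB_PER_TB, 6.67),
    ("2018 memory, PB",      156_250, 128,  GB_PER_PB, 20),
    ("2018 scratch, PB",      20_000, 29.5, TB_PER_PB, 600),
    ("2018 bandwidth, TB/s", 160_000, 384,  MB_PER_TB, 66),
]

results = {}
for label, count, per_unit, ratio, quoted in rows:
    computed = scaled_product(count, per_unit, ratio)
    results[label] = computed
    print(f"{label:22s} computed {computed:8.2f}   quoted {quoted}")

# Cores per memory module, using the core and module counts from the lists.
cores_per_module_2014 = 40_000_000 / 125_000
cores_per_module_2018 = 200_000_000 / 156_250
print(f"cores per module: 2014 = {cores_per_module_2014:.0f}, "
      f"2018 = {cores_per_module_2018:.0f}")
```

The memory totals reproduce exactly, while the disk-based totals come out a few percent below the quoted values, which suggests the disk counts or totals were rounded when the estimates were compiled.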

The immense scale of these systems is almost beyond comprehension. Beyond sheer size, the challenges in constructing them include power consumption, reliability at such a large scale, interconnect speed and latency, and software. In one sense, the actual path to exascale computing is unknown at this point, because extrapolating today's technology curve does not provide a clear solution to many of these challenges.

Beyond the technological issues, there lurks a bigger question: "Can one trust exascale results?" Generating correct results has always been an issue with HPC because of the scale of systems and research problems. A simple example is the round-off error that can result from poorly designed parallel software. At exascale, where reliability will be a big challenge, there may be a need to devote a significant portion of "cycles" to verifying correctness at run-time. That is, it may no longer be feasible to "check the answers" after they are delivered, because there will be so few systems capable of running similar programs. Indeed, it is possible that multiple exascale runs of the same program and data may produce similar, but not exactly the same, results.
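One reason identical runs can disagree is that floating-point addition is not associative, so the order in which a parallel reduction combines partial sums changes the rounding. The following is a minimal sketch of that effect, not a model of any particular MPI implementation; the "workers" are simulated by slicing a list, and the input values are chosen deliberately to exaggerate the rounding.

```python
import math

def naive_sum(data):
    """Plain left-to-right accumulation in double precision."""
    total = 0.0
    for x in data:
        total += x
    return total

def chunked_sum(data, n_workers):
    """Simulate a simple parallel reduction: each hypothetical worker sums
    its own contiguous slice, then the partial sums are added together."""
    chunk = (len(data) + n_workers - 1) // n_workers
    partials = [naive_sum(data[i:i + chunk]) for i in range(0, len(data), chunk)]
    return naive_sum(partials)

# Values chosen so that the small terms are lost next to the large ones.
values = [1e16, 1.0, -1e16, 1.0]

serial = naive_sum(values)            # one process, left to right
two_workers = chunked_sum(values, 2)  # two partial sums, then combined
exact = math.fsum(values)             # correctly rounded reference sum

print("serial:     ", serial)
print("two workers:", two_workers)
print("exact:      ", exact)
```

The same four numbers give three different answers depending only on how the additions are grouped. In real applications the differences are usually far smaller, but they are the kind of discrepancy that makes bit-for-bit comparison between runs unreliable.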

Ultimately, exascale will lead to new software models that include both robustness and reliability. In many respects a typical MPI program, such as the Top500 benchmark, is a brittle resource. A single failure in any portion of the machine that is running a program can cause the whole program to fail. Such failures can originate in many different parts of the machine, and the sheer size of an exascale system almost guarantees some form of hardware failure while a large program is running. Traditional methods, such as checkpointing the program, may no longer be possible at exascale, because the time required to write checkpoint data may add a prohibitive amount to the overall execution time. New software methods will be required that may trade efficiency for reliability while running at exascale.
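A rough sense of the checkpoint cost can be had by dividing main memory by file system bandwidth, using the 2018 estimates listed earlier. The sketch below assumes a full-memory checkpoint written at the full sustained bandwidth, which is optimistic; the function and scenario names are hypothetical.

```python
TB_PER_PB = 1000

def checkpoint_seconds(memory_pb, bandwidth_tb_per_s):
    """Time to write all of main memory to the file system once."""
    return memory_pb * TB_PER_PB / bandwidth_tb_per_s

# (main memory in PB, file system bandwidth in TB/sec) from the 2018 estimates.
scenarios = {
    "worked example (20 PB, 66 TB/sec)": (20, 66),
    "largest memory, slowest file system (300 PB, 12 TB/sec)": (300, 12),
    "smallest memory, fastest file system (3.6 PB, 1,000 TB/sec)": (3.6, 1000),
}

for label, (memory_pb, bandwidth) in scenarios.items():
    seconds = checkpoint_seconds(memory_pb, bandwidth)
    print(f"{label}: {seconds:,.0f} s ({seconds / 3600:.2f} h)")
```

Depending on where a system falls within the estimated ranges, a single checkpoint could take anywhere from a few seconds to several hours, and the upper end is clearly incompatible with checkpointing often enough to outrun hardware failures.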

Some researchers believe that the 2018 goal for exascale is too optimistic. The many new challenges facing those who are designing exascale systems are not trivial, and perhaps the goal is beyond the grasp of current and near-future technology. Eventually, exascale computing will become a reality, and many of the breakthroughs will find their way into more conventional HPC systems. There is definitely value in striving for such a goal. In particular, those organizations and companies that focus on production FLOPS rather than Top500 FLOPS will be able to bring "lessons learned on the road to exascale" to their HPC users, who will have to be satisfied with mere petaFLOPS in the future.