01 February 2012
What Will it Take to Get a True HPC Cloud?
In last month’s eNewsletter, Floating HPC in the Cloud, we compared traditional cloud capabilities to those of an HPC resource, and found that delivering true HPC in the cloud could be a difficult proposition. To be successful, HPC in the cloud needs to deliver the capabilities and performance to which HPC users are accustomed.
One important and often misunderstood performance issue is scalability. There is a general misconception that adding more servers to an HPC problem automatically increases performance. In HPC, scalability is loosely defined by the question, "As processors (cores) are added, how much faster will the program run?" A highly scalable program can make effective use of many cores, while a less scalable program shows little or no speed increase as more cores are added. Thus, scalability is primarily a function of the program, and is well described by Amdahl's Law. There are, however, machine aspects that can contribute to scalability via Amdahl's Law. Simply put, the more a program has to wait for data, the worse its scalability. (For those who are familiar with Amdahl's Law, this amounts to increasing the sequential portion of the program.)
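For readers who want to see the numbers, here is a minimal sketch of Amdahl's Law in Python. The function name and the sample serial fractions are illustrative choices, not taken from any particular application; the formula itself is the standard one, speedup = 1 / (s + (1 - s) / N), where s is the sequential fraction of the program and N is the number of cores.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores for a program whose sequential
    (non-parallelizable) share of the run time is `serial_fraction`."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


# Illustrative serial fractions (assumed values, not measurements).
for s in (0.01, 0.10, 0.50):
    row = ", ".join(f"{n} cores: {amdahl_speedup(s, n):.1f}x" for n in (8, 64, 512))
    print(f"serial fraction {s:.2f} -> {row}")
```

Note that the speedup can never exceed 1 / s, no matter how many cores are added. Anything that forces cores to wait for data effectively raises s, which is why the machine can affect scalability as well as the program.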
In HPC, the goal is to speed up applications by keeping resources as busy as possible. If resources are waiting, then utilization is not optimal and adding more resources may actually make things worse. As stated, scalability is a function of the program. Thus, on one end of the spectrum programs can be highly scalable, or "embarrassingly parallel", and on the other end programs can be difficult to scale, which we’ll call "interconnect sensitive."
An HPC cluster can be built using many different types of hardware. In general, the better the connection between cores (the interconnect between server nodes), the better interconnect-sensitive programs will run. If a highly scalable program (e.g., image rendering) is run on a cluster with Gigabit Ethernet and also on a cluster with InfiniBand, the scalability and performance would be almost identical (all other things being equal). If, however, an interconnect-sensitive program (e.g., weather modeling) were run on the same two clusters, the scalability on the Gigabit Ethernet cluster would be much lower than on the InfiniBand cluster, and the performance on the InfiniBand cluster would be much better.
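The comparison above can be sketched with a deliberately simple cost model. Everything here is an assumption made for illustration: the model (compute time divides evenly across cores, while each core pays a fixed number of message exchanges), the function and dictionary names, and the latency and bandwidth figures, which are rough placeholders rather than measurements of any real Ethernet or InfiniBand fabric.

```python
def run_time(cores, compute_s, messages, bytes_per_msg,
             latency_s, bandwidth_bytes_per_s):
    """Toy model: perfectly divisible compute plus per-core messaging cost."""
    if cores == 1:
        comm_s = 0.0  # a single core has nobody to exchange data with
    else:
        comm_s = messages * (latency_s + bytes_per_msg / bandwidth_bytes_per_s)
    return compute_s / cores + comm_s


def speedup(cores, compute_s, messages, bytes_per_msg, **link):
    """Speedup relative to running the same work on one core."""
    one_core = run_time(1, compute_s, messages, bytes_per_msg, **link)
    return one_core / run_time(cores, compute_s, messages, bytes_per_msg, **link)


# Hypothetical interconnects (placeholder numbers, chosen only for contrast).
SLOW_LINK = {"latency_s": 50e-6, "bandwidth_bytes_per_s": 125e6}
FAST_LINK = {"latency_s": 2e-6, "bandwidth_bytes_per_s": 4e9}

# Hypothetical workloads: 1000 s of compute, 8 KiB messages.
for name, messages in (("embarrassingly parallel", 0),
                       ("interconnect sensitive", 1_000_000)):
    slow = speedup(64, 1000.0, messages, 8192, **SLOW_LINK)
    fast = speedup(64, 1000.0, messages, 8192, **FAST_LINK)
    print(f"{name}: slow link {slow:.1f}x, fast link {fast:.1f}x on 64 cores")
```

In this sketch the workload that exchanges no messages scales identically on both links, while the message-heavy workload scales far better on the faster link. Real applications are more complicated, but the qualitative behavior matches the rendering and weather modeling examples.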
Because underlying hardware is important to scalability for some programs, maximizing certain hardware aspects can dramatically improve application performance. Indeed, InfiniBand goes to great lengths to keep the application as close to the "wires" as possible. Most traditional clouds use either Gigabit or 10-Gigabit Ethernet. HPC instances in these clouds will work very well for embarrassingly parallel programs, but may struggle with programs that require a better interconnect. For those programs, scalability, and hence performance, will suffer. A true HPC cloud needs to offer a high performance interconnect that will not limit application scalability.
An HPC cloud also needs to keep users as close as possible to the hardware, a requirement that runs counter to the virtualization layer that is used on all standard clouds. Virtualization provides great flexibility to users, but is designed to keep them from touching the real hardware. As implemented, this requirement may limit the flexibility of a true HPC cloud. That said, there is a way to provide high performance and cloud flexibility to HPC applications.
The key to an HPC cloud is dynamic provisioning. A traditional cluster usually has a fixed Operating System (OS) on all compute servers. A user program must conform to this specification or it may not run. Like a standard cloud, an HPC cloud should allow the user to pick and choose (or even design) the OS environment for the computing servers, and this can be done through dynamic provisioning, where all compute servers are bare-metal provisioned by the resource scheduler. In essence, the compute nodes are rebuilt each time a program is executed.
While dynamic provisioning may seem time consuming and inefficient, a few things should be taken into consideration. Firstly, most HPC applications run for hours, days, or even weeks. Giving away a small chunk of run-time at the start is a small price to pay for a flexible cloud-like environment. Secondly, and perhaps more importantly, there are provisioning methods that do not require the hard drive on each worker node to be re-imaged, thus reducing the time required to provision the node.
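To put the first point in perspective, the following sketch computes the share of total wall-clock time spent on provisioning. The five-minute provisioning time and the job lengths are assumed figures chosen for illustration only; actual provisioning times depend on the site and the method used.

```python
def provisioning_overhead(provision_s: float, run_s: float) -> float:
    """Fraction of total wall-clock time spent provisioning nodes."""
    if provision_s < 0 or run_s <= 0:
        raise ValueError("provision_s must be >= 0 and run_s must be > 0")
    return provision_s / (provision_s + run_s)


PROVISION_S = 5 * 60  # assumed: five minutes to rebuild the compute nodes

# Hypothetical job lengths, in hours.
for label, hours in (("1 hour", 1), ("1 day", 24), ("1 week", 168)):
    share = provisioning_overhead(PROVISION_S, hours * 3600)
    print(f"{label} job: {share:.2%} of wall-clock time spent provisioning")
```

Under these assumptions the overhead is a few percent for a one-hour job and well under one percent for a job that runs for a day or longer.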
Using options such as RAM-based disks, NFS, and other standard *NIX tools, nodes can be easily provisioned with unique OS environments without touching any of the node hard drives (should they even exist). Hard drives on the nodes can still be used for local scratch storage, but all important OS files and directories are loaded by the resource allocator into a RAM disk. One interesting example of this type of tool is the Warewulf Project.
The Warewulf toolset is a freely available package that allows easy creation and manipulation of node images that are then loaded as RAM disk images onto the compute servers. Booting a node this way is very fast, and the node image can be easily changed to suit user or application preferences. A flexible commercial provisioning solution is Bright Cluster Manager, which allows easy installation, monitoring, and management of clusters. Bright also offers cloud bursting capability where jobs can be directed to external clouds.
HPC clouds that combine high performance interconnects and dynamic provisioning can offer the most desirable cloud features, such as flexibility, scalability, and software choice, while also maintaining HPC features that deliver expected performance levels.
HPC storage is another key aspect of HPC clouds, which will be addressed next month. While standard clouds offer many storage options, HPC often requires a robust and predictable storage component, which is not offered under many cloud Service Level Agreements (SLAs). During the coming month, you may want to stay connected to the HPC cloud efforts over at HPCTools.com.


