01 October 2011
Intel's software plans for HPC are nothing new, and that is the point with the Intel MIC architecture.
A new battle between hardware and software in the HPC market has begun. Once a competition between x86 servers and interconnects, the HPC market will soon be asking users to choose one of two methods in their quest for faster HPC performance.
The emergence of the "commodity x86 Linux cluster" opened up the HPC market to the masses, and this is where the battle lines are being drawn. The current market is based on large numbers of multi-core x86 processors and optional General Purpose Graphics Processing Units (GP-GPUs) from AMD and NVidia. The GP-GPU, while not a general solution, has provided astounding price-to-performance gains for certain applications by providing large numbers of parallel processing cores. NVidia has forged this path through both hardware and attention to software, including both development tools and application software. AMD, through its purchase of ATI, is the other vendor in the GP-GPU game. Both companies understand that the dual use of GP-GPUs (consumer video and HPC) helps make the market financially possible. Many users looking for a performance boost beyond racks of x86 cores have their eyes on the GP-GPU hardware race.
From an HPC perspective, a GP-GPU is a Single Instruction Multiple Data (SIMD) device. The concept is not new; at one time similar devices were called array processors. One of the difficulties of this technology has been the need to transfer data over the PCIe bus between the video card (SIMD unit) and the main processor. There are several efforts underway to address this issue in both hardware and software. Compiler vendors such as The Portland Group and PathScale have been making progress in this regard, while the new AMD Fusion hardware has moved the GP-GPU directly onto the processor die. NVidia's Project Denver is almost certainly following a similar path.
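The PCIe transfer cost mentioned above can be illustrated with a rough back-of-the-envelope model. This is only a sketch: the function names, the single round trip over the bus, and every number in the usage lines are hypothetical assumptions chosen for illustration, not measurements of any real card.

```python
def offload_time(data_bytes, bus_bytes_per_s, host_compute_s, accel_speedup):
    """Estimated wall time when a kernel is offloaded to an accelerator.

    Assumes (hypothetically) that the data crosses the bus exactly twice,
    once to the card and once back, and that the accelerator runs the
    kernel accel_speedup times faster than the host.
    """
    transfer_s = 2 * data_bytes / bus_bytes_per_s
    return transfer_s + host_compute_s / accel_speedup


def offload_pays_off(data_bytes, bus_bytes_per_s, host_compute_s, accel_speedup):
    """True when the offloaded run is estimated to beat the host-only run."""
    offloaded = offload_time(data_bytes, bus_bytes_per_s,
                             host_compute_s, accel_speedup)
    return offloaded < host_compute_s


# Hypothetical inputs: 1 GB of data, a 4 GB/s bus, a 10x faster accelerator.
print(offload_pays_off(1e9, 4e9, host_compute_s=10.0, accel_speedup=10.0))
print(offload_pays_off(1e9, 4e9, host_compute_s=0.5, accel_speedup=10.0))
```

Under these assumed numbers, the long-running kernel benefits from offload while the short one does not, because the fixed transfer time dominates. Moving the parallel unit onto the processor die is one way to shrink that transfer term.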
Notably absent from these efforts has been Intel. Many analysts are quick to point out that Intel has no real presence in the GP-GPU market and, short of buying NVidia, may fall behind the technology curve. Intel has not been sitting idle, however. In 2009, there was quite a bit of press about the Intel Larrabee as a highly parallel, high-end video technology. It was not, however, a SIMD device like a GP-GPU. Instead, it consisted of multiple general purpose x86 cores designed to work in parallel.
Due to video performance issues, the Larrabee product was canceled. The hardware, however, was not. It has resurfaced as the Intel Many Integrated Core (MIC, pronounced "Mike") architecture. The architecture is based squarely on the Larrabee design, although Intel does not mention this connection in any official capacity. Each processor is a true x86 core with a 512-bit vector processing unit, able to process 16 single precision floating point numbers at a time. This design is expected to provide up to four times the vector performance of the standard 128-bit SSE unit found on most x86 processors, which handles four single precision numbers at a time.
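The "16 numbers at a time" and "four times" figures follow directly from the register widths. The short sketch below works through that arithmetic; the helper name `lanes` is made up for illustration, and the 128-bit SSE register width is a known property of SSE rather than something stated in the text above.

```python
def lanes(vector_bits, element_bits=32):
    """Number of elements of a given width that fit in one vector register.

    The default element width of 32 bits is a single precision float.
    """
    if vector_bits % element_bits != 0:
        raise ValueError("vector width must be a multiple of element width")
    return vector_bits // element_bits


SSE_BITS = 128  # width of an SSE register
MIC_BITS = 512  # vector unit width quoted for the MIC architecture

sse_lanes = lanes(SSE_BITS)
mic_lanes = lanes(MIC_BITS)

# Single precision lanes per instruction, and the ratio between the two.
print(sse_lanes, mic_lanes, mic_lanes // sse_lanes)  # 4 16 4
```

The ratio describes how many values one vector instruction can touch, so it is an upper bound per instruction rather than a measured application speedup.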
When the first commercial MIC processor, code-named "Knights Corner," enters production (possibly in 2012), it will have at least 50 cores and will be available on a PCIe card. There are currently a limited number of "Knights Ferry" development versions in use, which are reported to have 32 cores.
The parallel hardware plans of AMD, NVidia, and Intel may seem similar, but there is a big difference when it comes to software. Programming GP-GPUs is non-trivial. The two most popular approaches are CUDA (only available on NVidia hardware) and OpenCL (runs on both NVidia and AMD hardware). Both approaches often require large rewrites of code, and the resulting programs are usually limited to a single server or workstation (i.e., CUDA and OpenCL programs do not easily scale across clusters). As mentioned, compiler vendors are also working on ways to help standard Fortran and C compilers take advantage of new GPU hardware so that the amount of code rewriting is minimized. In most cases, however, GP-GPU programming creates a software barrier for many users. It is also important to note that many popular open HPC codes have been ported by NVidia for use on their hardware.
Intel, on the other hand, allows users to work with familiar software development tools to program their MIC devices. Since they are using standard x86 cores with an enhanced vector unit, existing parallel codes will, in theory, run "out of the box" on the Intel MIC architecture (re-compilation will be needed, of course). In Intel's parlance, the transition from multi-core (Xeon) to many-core (MIC) will not be as traumatic as GP-GPU re-programming, and the same software tools will work for both environments.
Ultimately, the difference between AMD or NVidia GP-GPU and the Intel MIC may be more about software development than about providing ten or twenty percent more raw floating point operations than your competitor. The future of HPC may depend more on software development choices than on hardware feature sets.


