About the Author

Douglas Eadline, PhD, is both a practitioner and a chronicler of the Linux Cluster HPC revolution. He has worked with parallel computers since 1988 and is a co-author of the original Beowulf How To document. Prior to starting and editing the popular http://clustermonkey.net web site in 2005, he served as Editor-in-chief for ClusterWorld Magazine. He is currently Senior HPC Editor for Linux Magazine and a consultant to the HPC industry. Doug holds a Ph.D. in Analytical Chemistry from Lehigh University and has been building, deploying, and using Linux HPC clusters since 1995.

A blog about making HPC things (kind of) work
Tilera recently announced the availability of their manycore Gx chips. They are touting the slogan "Manycore without Boundaries." The term manycore means 16, 36, 64, or 100 cores in a square mesh (at least to Tilera). This development is interesting, although it is not aimed at the HPC market.

Tilera has a unique design. Instead of using a single bus to connect the multiple cores (like x86 AMD and Intel processors), Tilera uses a mesh that allows very fast memory movement between cores. Indeed, the L2 cache is actually distributed over the entire processor mesh, forming a virtual L3 cache. Each MIPS-derived core is able to run a full operating system on its own, or multiple cores can be used to run a symmetric multi-processing operating system. The chips are clearly aimed at Linux web workloads and are not x86 compatible.
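To get a rough feel for what a square mesh means as the core count grows, here is a minimal sketch that counts hops between cores for the four sizes mentioned above. It assumes a message travels along the rows and columns of the mesh, so the distance between two cores is the sum of their row and column offsets. That routing assumption and the function name `mesh_hops` are mine, not something taken from Tilera's documentation, and hop counts say nothing about actual latency.

```python
from itertools import product
from math import isqrt


def mesh_hops(side):
    """Return (worst_case_hops, mean_hops) between distinct cores
    in a side x side mesh, assuming row/column (Manhattan) paths."""
    cores = list(product(range(side), repeat=2))
    dists = [
        abs(ax - bx) + abs(ay - by)
        for (ax, ay) in cores
        for (bx, by) in cores
        if (ax, ay) != (bx, by)
    ]
    return max(dists), sum(dists) / len(dists)


if __name__ == "__main__":
    # Core counts from the announcement; each is a perfect square.
    for n_cores in (16, 36, 64, 100):
        side = isqrt(n_cores)
        worst, mean = mesh_hops(side)
        print(f"{n_cores:3d} cores ({side}x{side}): "
              f"worst case {worst} hops, mean {mean:.2f} hops")
```

Under this assumption the worst-case path grows with the side of the mesh rather than with the number of cores, which is one reason a mesh is attractive compared to a single shared bus.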

Whenever a new multicore or parallel processor shows up in the market, I always think of HPC. Indeed, the current cluster market got started by using commodity parts for HPC. The Tilera chip is interesting because Tilera has basically said, "We don't care about Windows compatibility because web servers run Linux." If we don't need x86 compatibility, then we can open up the design envelope to some interesting things. Existing code will need to be recompiled, but this is not an issue because the standard tool chain and APIs are available.

As far as HPC goes, I think the chip faces the multi-core vs. multi-node issue. That is, if your problem fits within the cores on a node, then you have multiple programming options. If, on the other hand, you need to scale over several nodes, then your programming options start to look like MPI. At this point you need to start looking at how you want to manage the cores within a node vs. the cores on many different nodes. Treating them all as part of an MPI universe is fine; however, the slowest communication path between nodes may be a limiting factor. Using a hybrid programming method (e.g., OpenMP on nodes and MPI between nodes) can get messy and non-portable. Thus, with any new hardware, there is always the HPC programming issue. And, ultimately, the issue is scalability, which always seems to find boundaries.
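The "slowest path" point can be made concrete with a toy cost model. The sketch below assumes each step of a program divides its work evenly over all cores and then waits for one message exchange, and that the step cannot finish until the slowest message arrives. The function names and the message times (1 microsecond on a node, 50 microseconds between nodes) are made-up illustrative values, not measurements of any real hardware.

```python
def step_time(serial_seconds, nodes, cores_per_node,
              on_node_msg=1e-6, off_node_msg=50e-6):
    """Estimated time for one compute-then-exchange step.

    serial_seconds: time the step's work would take on a single core.
    on_node_msg / off_node_msg: hypothetical per-message times (seconds).
    """
    total_cores = nodes * cores_per_node
    compute = serial_seconds / total_cores
    # With more than one node, the inter-node path is the slowest one.
    slowest_msg = off_node_msg if nodes > 1 else on_node_msg
    return compute + slowest_msg


def speedup(serial_seconds, nodes, cores_per_node, **kwargs):
    """Speedup over a single core under the same toy model."""
    return serial_seconds / step_time(serial_seconds, nodes,
                                      cores_per_node, **kwargs)


if __name__ == "__main__":
    work = 1e-3  # assume 1 ms of single-core work per step
    for nodes in (1, 2, 4, 8):
        print(f"{nodes} node(s) x 64 cores: "
              f"speedup {speedup(work, nodes, 64):.1f}")
```

With these invented numbers, going from one 64-core node to two nodes makes each step slower, because the off-node message costs more than the compute time saved. Real applications are more forgiving when there is more work per step, but the boundary shows up in the same way.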