Highlights of HPCC 2010
The 12th IEEE International Conference on High Performance Computing and Communications was not the first conference I attended. However, it was the first one where I actually presented a paper in the main research track.
As usual, the conference covered a wide variety of topics. For me, the following presentations were the most interesting, since they discussed problems related to my own research.
GPGPUs are a hot topic, and HPCC covered several different aspects of them. Often it was not the main focus of a presented paper that was most interesting to me, but rather the problems the authors faced during their experiments. For instance, the presentation on Sparse Matrix Formats Evaluation and Optimization on a GPU highlighted the difficulties of programming systems with memory hierarchies as complex as those found in today's GPGPU systems. Similarly, OpenCL: Make Ubiquitous Supercomputing Possible gave an introduction to developing applications for GPGPU systems.
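To make the memory-hierarchy issue a bit more concrete, here is a minimal sketch of a sparse matrix-vector product in the common CSR format. This is my own plain C++ illustration, not code from the paper; the function name and data layout are just assumptions for the example. The indirect, data-dependent accesses to x are exactly the kind of irregular memory traffic that is hard to map efficiently onto a GPU's memory hierarchy.

    #include <cstddef>
    #include <vector>

    // Sparse matrix-vector product y = A * x with A stored in CSR format.
    // row_ptr[i]..row_ptr[i+1] delimit the nonzeros of row i; col_idx and
    // values hold their column indices and values. The indirect accesses
    // x[col_idx[k]] are irregular, which makes coalesced memory access on
    // GPUs difficult and motivates specialized sparse formats.
    void spmv_csr(const std::vector<std::size_t>& row_ptr,
                  const std::vector<std::size_t>& col_idx,
                  const std::vector<double>& values,
                  const std::vector<double>& x,
                  std::vector<double>& y)
    {
        for (std::size_t i = 0; i + 1 < row_ptr.size(); ++i) {
            double sum = 0.0;
            for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
                sum += values[k] * x[col_idx[k]];
            y[i] = sum;
        }
    }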
The presentation on Aggregation of Real-Time System Monitoring Data for Analyzing Large-Scale Parallel and Distributed Computing Environments gave interesting insights into how supercomputing centers operate and what challenges they face. One interesting anecdote was that the power-saving functionality of modern CPUs actually causes unexpected trouble in such large-scale systems: the temperature swings caused by powering CPUs down can crack their dies, destroying the CPUs. Furthermore, the talk included a projection of what future supercomputers will look like. Interestingly, this projection says that the tradeoff between power efficiency and sophisticated CPU optimizations for single-core performance will be decided in favor of power efficiency. Thus, they expect future supercomputers to have a dramatically higher number of much simpler cores than what is used today. That means optimizations like branch prediction and out-of-order execution will most likely not be present on the CPU cores of exascale systems, so that such high-performance systems fit into a practical energy envelope, i.e., remain coolable.
Related to cluster computing was the presentation on Enabling GPU and Many-Core Systems in Heterogeneous HPC Environments Using Memory Considerations. Here, memory bandwidth limitations were taken into account when scheduling tasks on heterogeneous clusters. The idea is that tasks that saturate the memory system slow down the overall system; thus, a better distribution of tasks that takes the available memory bandwidth into account leads to better execution times.
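The following is a toy sketch of that underlying idea, in my own words rather than the authors' algorithm; the node and task structures and the bandwidth numbers are purely hypothetical. A task is only placed on a node whose remaining memory bandwidth can absorb the task's estimated demand, so that no node's memory system gets oversaturated.

    #include <cstddef>
    #include <vector>

    // Toy illustration of memory-bandwidth-aware task placement (not the
    // paper's algorithm): each node has a remaining bandwidth budget, each
    // task an estimated demand; a task goes to the first node that can
    // still absorb it, otherwise it is deferred.
    struct Node { double bandwidth_left; };  // remaining memory bandwidth (GB/s)
    struct Task { double bandwidth_need; };  // estimated bandwidth demand (GB/s)

    // Returns the index of the chosen node, or -1 if every node is saturated.
    int place_task(std::vector<Node>& nodes, const Task& task)
    {
        for (std::size_t i = 0; i < nodes.size(); ++i) {
            if (nodes[i].bandwidth_left >= task.bandwidth_need) {
                nodes[i].bandwidth_left -= task.bandwidth_need;
                return static_cast<int>(i);
            }
        }
        return -1;  // defer the task until some node frees bandwidth
    }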
The Evaluation of the Task Programming Model in the Parallelization of Wavefront Programs provides arguments for the intuitive assumption that fine-grained parallelism is inefficient. However, the approach here was based on OpenMP and Intel's TBB library; thus, there is no automatic solution for problems that are expressed naturally in a very fine-grained way. As far as I remember from the TiC Summer School, X10's compiler is actually capable of coarsening fine-grained task parallelism, and thus provides a partially automatic solution to this problem.
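To illustrate why granularity matters, here is a small sketch of my own using OpenMP tasks (a generic example, not the wavefront code from the paper; function names and the chunk size are assumptions). Spawning one task per element drowns the useful work in task-creation overhead, whereas grouping elements into chunks amortizes that overhead, which is essentially the coarsening a compiler like X10's could do automatically.

    #include <cstddef>
    #include <vector>

    // One tiny task per element: the "natural" fine-grained formulation,
    // where scheduling overhead dominates the few flops per task.
    void scale_fine(std::vector<double>& v, double a)
    {
        #pragma omp parallel
        #pragma omp single
        for (std::size_t i = 0; i < v.size(); ++i) {
            #pragma omp task firstprivate(i) shared(v)
            v[i] *= a;
        }
    }

    // Manually coarsened version: each task handles a chunk of elements,
    // so the per-task overhead is amortized over real work.
    void scale_coarse(std::vector<double>& v, double a, std::size_t chunk = 4096)
    {
        #pragma omp parallel
        #pragma omp single
        for (std::size_t i = 0; i < v.size(); i += chunk) {
            #pragma omp task firstprivate(i) shared(v)
            for (std::size_t j = i; j < i + chunk && j < v.size(); ++j)
                v[j] *= a;
        }
    }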
However, my very personal highlight of the conference was the moment when I unexpectedly received the Best Student Paper Award for my paper on Insertion Tree Phasers.