And here we go again: SPLASH 2011 has started with its first day of workshops.
I attended the Transitioning to Multicore workshop.
For me the most interesting talks were two empirical studies presented by Fernando Castor. The first talk was about a controlled experiment with students using coarse-grained locks (or actually MVars) and STM in Haskell. While such experiments are usually biased in one way or another, he presented an interesting set of anecdotes along with the actual results. Notably, he observed mistakes made by the students similar to the ones we observed in our Multicore Programming course. It seems to be hard to define the right granularity of atomicity, especially when an all-inclusive atomic section is not applicable. His second talk was about analyzing the use of concurrency constructs in open source software. While this was still very early work, it shows how gradual the adoption of libraries like java.util.concurrent is in such software projects.
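To make the granularity problem a bit more concrete, here is a small sketch of my own. It is not taken from the talk or from the experiment, and it uses Python with a plain lock instead of Haskell's MVars or STM. The `Account` class and both withdraw functions are hypothetical names made up for the illustration. The first version checks the balance in one critical section and updates it in another, so a competing withdrawal can slip in between. The second version keeps the check and the update in a single atomic section. The `between_steps` hook only exists to replay that interleaving deterministically.

```python
import threading


class Account:
    """A balance guarded by one lock (hypothetical example type)."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def withdraw_atomic(account, amount):
    """Check and update inside ONE critical section."""
    with account.lock:
        if account.balance >= amount:
            account.balance -= amount
            return True
        return False


def withdraw_too_fine(account, amount, between_steps=lambda: None):
    """Check and update in TWO critical sections: the atomicity is too fine-grained."""
    with account.lock:
        enough = account.balance >= amount
    # The lock is released here, so another thread may change the balance.
    between_steps()
    if enough:
        with account.lock:
            account.balance -= amount
    return enough


def overdraft_demo():
    """Replay a competing withdrawal landing between the check and the update."""
    account = Account(100)
    withdraw_too_fine(account, 80,
                      between_steps=lambda: withdraw_atomic(account, 80))
    return account.balance


def atomic_demo():
    """The same two withdrawals, each as a single atomic section."""
    account = Account(100)
    first = withdraw_atomic(account, 80)
    second = withdraw_atomic(account, 80)
    return first, second, account.balance
```

In the first demo the account ends up overdrawn although every single access to the balance is protected by the lock. In the second demo the later withdrawal is simply rejected. Each operation is locked correctly on its own, yet the invariant spans both steps.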
The other presentations included approaches to minimize conflicts in transactional systems, automated support for memoization on top of STM, and how Intel’s TBB can be used to express parallel pipelined programs, with a focus on which kinds of pipelines can be easily expressed.
For me, the workshop ended with a panel discussion featuring Doug Lea, Matt Sottile, Suresh Srinivas, and Tucker Taft. The most interesting point made was that one of the goals should be to facilitate all kinds of different flavors of parallel programming (of course! 😉). Matt pointed out in his introduction that parallel programming models are all well and good, and that being able to split up a problem into threads is important, but that the memory wall is often not discussed and that feeding enough data to the threads to keep them busy is becoming increasingly hard. Another point made in different variations was the importance of tools, their accessibility to developers, and their integration with runtimes and hardware. Doug also mentioned that the hardware vendors are making life harder for everyone without apparent reason by not using standards, for instance to number cores. Since they also do not provide mechanisms to query cache design properties and core arrangements, this inhibits portable and reliable performance for mechanisms like work-stealing schedulers.