Tag Archives: SPLASH

Cross-Language Compiler Benchmarking: Are We Fast Yet?

Research on programming languages is often more fun when we can use our own languages. However, for research on performance optimizations that can be a trap. In the end, we need to argue that what we did is comparable to state-of-the-art language implementations. Ideally, we are able to show that our own little language is not just a research toy, but that it is, at least performance-wise, competitive with for instance Java or JavaScript VMs.

Over the last couple of years, it was always a challenge for me to argue that SOM or SOMns are competitive. There were those 2-3 paragraphs in every paper that never felt quite as strong as they should be. And the main reason was that we don’t really have good benchmarks to compare across languages.

I hope we finally have reasonable benchmarks for exactly that purpose with our Are We Fast Yet? project. To track performance of benchmarks, we also set up a Codespeed site, which shows the various results. The preprint has already been online for a bit, but next week, we are finally going to present the work at the Dynamic Languages Symposium in Amsterdam.

Please find abstract and details below:

Abstract

Comparing the performance of programming languages is difficult because they differ in many aspects including preferred programming abstractions, available frameworks, and their runtime systems. Nonetheless, the question about relative performance comes up repeatedly in the research community, industry, and wider audience of enthusiasts.

This paper presents 14 benchmarks and a novel methodology to assess the compiler effectiveness across language implementations. Using a set of common language abstractions, the benchmarks are implemented in Java, JavaScript, Ruby, Crystal, Newspeak, and Smalltalk. We show that the benchmarks exhibit a wide range of characteristics using language-agnostic metrics. Using four different languages on top of the same compiler, we show that the benchmarks perform similarly and therefore allow for a comparison of compiler effectiveness across languages. Based on anecdotes, we argue that these benchmarks help language implementers to identify performance bugs and optimization potential by comparing to other language implementations.

  • Cross-Language Compiler Benchmarking: Are We Fast Yet? Stefan Marr, Benoit Daloze, Hanspeter Mössenböck; In Proceedings of the 12th Symposium on Dynamic Languages (DLS ’16), ACM, 2016.
  • Paper: HTML, PDF, DOI
  • BibTex: BibSonomy

RACES’12: Public Reviews + Paper Drafts + Voting on Agenda == Workshop 2.0?

[Disclaimer: I am one of the assistants supporting the RACES’12 organizers and a PC member.]

The organizers of RACES’12 experiment with a new approach to organize an academic workshop. They slightly change the meaning of the program committee, the reviewing process, and the decision making on the final agenda for the workshop. The main goals are to elevate some of the restrictions dating back from the pre-internet age as well as giving more power to the workshop attendees and their interests. Thus, the program committee advises the attendees rather than making actual decisions. The full rational is outlined in their Call for Participation.

Most interesting for me is the decision to have signed reviews and to make the reviewed paper drafts as well as the reviews public before the workshop. As mentioned before, the main reason for this approach is to give the attendees a chance to inform themselves, guided by the reviews, and enable them to decide on the workshop’s agenda. Since everyone has different interests and preferences, which is true for a classic program committee members as well, it seems to be a good idea to have the attendees have a say in what they want to spend their time on during the workshop. Consequently, attendees can vote on presentation length for all submissions. They can use the program committee’s assessment but if they disagree can vote independently of it, of course.

Since I was part of the program committee, I got the chance two read and review two interesting papers. Below you’ll find the summaries of my reviews and links to the papers and full reviews.

All in all, I found this approach an interesting one, perhaps how it should be in the age of the internet and Web 2.0. However, it remains to be seen how successful it is in preparing a great workshop. We will see that in October at SPLASH’12.

How FIFO is Your Concurrent FIFO Queue?

Andreas Haas, Christoph Kirsch, Hannes Payer and Michael Lippautz

Haas et al. assess the FIFOness of concurrent queues by assuming zero execution time for all operations. They introduce the metrics element-fairness and operation-fairness for this assessment and show in their argumentation and experiments that queues with relaxed semantics can have properties that could be considered more FIFO than queues with strict semantics.

[read more]

Beyond Macho Parallel Programming?

Paul McKenney

McKenney argues that we need approaches for parallel programming that neither patronize developers nor require so much intimate knowledge that they become impractical for a wide range of applications. He makes the point that other ‘expert-only’ ideas and technologies became mainstream as well, benefiting greatly from advances in tooling and education. Thus, what we consider ‘expert-only’ today might well be part of the mainstream in the future. However, he also argues that these macho approaches as he calls them, guide and shape innovation and are necessary for progress.

[read more]

Workshop on Relaxing Synchronization for Multicore and Manycore Scalability

You got a big multicore, or manycore machine, but do not have a clue of how to actually use it, because your application doesn’t seem to scale naturally? Well, that seems to be a problem many people are facing in our new manycore age. One possible solution might be to accept less precise answers by relaxing synchronization constraints. That could allow us to circumvent Amdahl’s law when Gustafson is out of reach.

If you got ideas in that direction, let us know! We are organizing a workshop at SPLASH’12 called RACES.

Here a quote from the mission statement:

The RACES workshop is a first attempt at bringing a new school of thought to the attention of the computer science community: that of abandoning absolute certainty in favor of scalability and performance when considering parallel computation.
Today, multi core systems are becoming more and more the rule and conventional wisdom has been to scale up software for these systems by reducing synchronization constraints. Amdahl’s law however, implies that even the smallest fraction of inherently sequential code limits scaling. The RACES workshop wants to promote an approach towards scalability for many-core systems by reducing synchronization requirements drastically, possibly to the point of discarding them altogether. This will of course cause us to move from the certain into the merely probable.

Read more on: http://soft.vub.ac.be/races/

OOSPLA 2011 @SPLASH2011, Day 3

The third day started with Brendan Eich’s keynote on JavaScript’s world domination plan. It was a very technical keynote, not very typical I suppose. And he was rushing through his slides with an enormous speed. Good that I have some JavaScript background. Aside all the small things he mentioned, interesting for me is that he seemed to be very interested to get Intel’s RiverTrail approach to data-parallelism into ECMAScript in one or another form. That is kind of contradicting the position I heard so far, that ECMAScript would be done with having WebWorkers as a model for concurrency and parallel programming.

Language Implementation

The first session had two interesting VM talks for me. The first being JIT Compilation Policy for Modern Machines. With the assumption that you cannot get your multicore/manycore machines busy with application threads, they experimented how additional compilation threads can be used to optimize code better. I do not remember the details completely, but I think, after 7 compilation threads they reached a mark where it was not worthwhile anymore to add more threads.

The second talk was on Reducing Trace Selection Footprint for Large-scale Java Applications with no Performance Loss. The goal here is to reduce the number of trace in a tracing JIT that need to be kept around and optimized. Might be something interesting for PyPy and LuaJIT2 to consider.

Parallel and Concurrent Programming

The last session I attended was again on parallel and concurrent programing. The first paper A Simple Abstraction for Complex Concurrent Indexes was a pretty formal one.

The second paper is a pessimistic approach to implement atomic statements meant for systems programming: Composable, Nestable, Pessimistic Atomic Statements. It is not optimistic like STM, and does not use a global locking order, which would need to be determined statically. Instead they annotate the fields with so-called shelters. Shelters build a hierarchy, which describes the necessary parts to synchronize with.

The third paper was also a very interesting one for me: Delegated Isolation. It is an approach again similar to STM in a sense, but it avoids unbounded number of transaction retries. For that, the object graph is essentially partitioned in growing subgraphs that can be processed in parallel. Only when a conflict, a race-condition occurred, subgraphs are merged logically, and the computation is serialized. A neat idea and an interesting use of the object-ownership idea.

The last talk was on: AC: Composable Asynchronous IO for Native Languages. It was a presentation of work done in the context of the Barrelfish manycore OS. The goal was to have an easy to use programming model, that comes close to sequential programming, but has similar performance properties as typical asynchronous APIs in operating systems. The result is a model based on async/finish, that seems to be relatively nice. As I understand it, it is basically syntactic sugar and some library support to wrap typical asynchronous APIs. But it is a model that is purely focused on such request/response asynchrony, and does not handle concurrency/parallism.

And that was basically it. SPLASH is a nice conference when it comes to the content. Not so interesting when it comes to the social “events”, it wasn’t much of an event anyway. Not even the food was notable…

OOSPLA 2011 @SPLASH2011, Day 2

The second day of the technical tracks started with a keynote by Markus Püschel. He is not the typical programming language researcher you meet at OOPSLA, but he does research in automatic optimization of programs. In his keynote, he showed a number of examples how to get the best performance for a given algorithm out of a particular processor architecture. Today’s compilers are still not up to the task, and will probably never be up to it. Given a naïve implementation, hand-optimized C code can have 10x speedup when dependencies are made explicit, and the compiler knows that no aliasing can happen. He was then discussing how that can be approached in an automated way, and was also thinking about what programming languages could do.

Award Papers

Afterwards, I attended the session with the awarded OOPSLA papers. The Hybrid Partial Evaluation talk presented an approach to avoid the typical cost of use of reflection or ‘interpretation’. The presentation of SugarJ: Library-based Syntactic Languages felt like a déjà vu. I did not get where it is different from Helvetica other than that it is for Java. The third paper on Reactive Imperative Programming with Dataflow Constraints was interesting in that it used also memory protection tricks to realize a reactive model in C++. The last presentation: Two for the Price of One: A Model for Parallel and Incremental Computation was very interesting. I have not used incremental computations as far as I am aware of anywhere other than for course work, but bringing it together with parallel programming in a single programming model, gives plenty of opportunities for super-linear speedups.

Parallel and Concurrent Programming

The second session of the day was on parallel and concurrent programming. Kismet: Parallel Speedup Estimates for Sequential Programs tackled the problem to get an idea of what opportunities for parallelism are available in a given program without having to change the used algorithms and approaches to much. For that, it uses data dependency analysis to characterize the critical path on a data-flow level. Since that usually does not give realistic results because of overestimation of parallelizability, they use in addition a hierarchical model of loops and the knowledge of the available hardware parallelism to better predict possible speedups.

The second and the third talk where almost identical in terms of problem and goal. Essentially, they provide the necessary infrastructure to run different variants of sequential implementations in parallel and then chose either the winner in terms of runtime or precision. These approaches are especially interesting if the available algorithms have very different properties for different input or input sizes. For instance, some mathematical algorithms just do not converge to a solution for certain inputs while they are very fast for others.

The last talk of the session discussed Scalable Join Patterns. Join patterns are an old approach to describe synchronization mechanism flexibly and declaratively. The presented work provided a scalable implementation approach that seems to work quite well and when they would use a compilation based approach for the patters, I guess it could be a very feasible and flexible replacement for standard synchronization mechanism provided as libraries.

Panel

Instead of attending the third paper session of the day, I attended the panel on Multicore, Manycore, and Cloud Computing: Is a new Programming Language Paradigm required?. Well, it was entertaining 🙂 Nothing really new, no surprising arguments as far as I recall, but certainly interesting to watch. I think, they also recorded it. So it might be floating around the web soon.