Cross-Language Compiler Benchmarking: Are We Fast Yet?

Research on programming languages is often more fun when we can use our own languages. However, for research on performance optimizations, that can be a trap. In the end, we need to argue that what we did is comparable to state-of-the-art language implementations. Ideally, we can show that our own little language is not just a research toy, but that it is, at least performance-wise, competitive with, for instance, Java or JavaScript VMs.

Over the last couple of years, it was always a challenge for me to argue that SOM or SOMns are competitive. There were those 2-3 paragraphs in every paper that never felt quite as strong as they should have been. The main reason was that we didn't really have good benchmarks to compare performance across languages.

I hope that with our Are We Fast Yet? project, we finally have reasonable benchmarks for exactly that purpose. To track the performance of the benchmarks, we also set up a Codespeed site, which shows the various results. The preprint has been online for a little while already, but next week, we are finally going to present the work at the Dynamic Languages Symposium in Amsterdam.

Please find abstract and details below:

Abstract

Comparing the performance of programming languages is difficult because they differ in many aspects including preferred programming abstractions, available frameworks, and their runtime systems. Nonetheless, the question about relative performance comes up repeatedly in the research community, industry, and wider audience of enthusiasts.

This paper presents 14 benchmarks and a novel methodology to assess the compiler effectiveness across language implementations. Using a set of common language abstractions, the benchmarks are implemented in Java, JavaScript, Ruby, Crystal, Newspeak, and Smalltalk. We show that the benchmarks exhibit a wide range of characteristics using language-agnostic metrics. Using four different languages on top of the same compiler, we show that the benchmarks perform similarly and therefore allow for a comparison of compiler effectiveness across languages. Based on anecdotes, we argue that these benchmarks help language implementers to identify performance bugs and optimization potential by comparing to other language implementations.

  • Cross-Language Compiler Benchmarking: Are We Fast Yet? Stefan Marr, Benoit Daloze, Hanspeter Mössenböck; In Proceedings of the 12th Symposium on Dynamic Languages (DLS ’16), ACM, 2016.
  • Paper: HTML, PDF, DOI
  • BibTex: BibSonomy

Language Research with Truffle at the SPLASH’16 Conference

Next weekend, one of the major conferences of the programming languages research community starts. The conference hosts many events, including our Meta’16 workshop on Metaprogramming, SPLASH-I with research and industry talks, the Dynamic Languages Symposium, and the OOPSLA research track.

This year, the overall program includes 9 talks on Truffle and Graal-related topics. They cover a wide range of subjects, including optimizing high-level metaprogramming, executing low-level machine code, benchmarking, and parallel programming. I posted a full list including abstracts here: Truffle and Graal Presentations @SPLASH’16. Below is an overview with links to the talks:

Sunday, Oct. 30th

AST Specialisation and Partial Evaluation for Easy High-Performance Metaprogramming (PDF)
Chris Seaton, Oracle Labs
Meta’16 workshop 11:30-12:00

Towards Advanced Debugging Support for Actor Languages: Studying Concurrency Bugs in Actor-based Programs (PDF)
Carmen Torres Lopez, Stefan Marr, Hanspeter Moessenboeck, Elisa Gonzalez Boix
Agere’16 workshop 14:10-14:30

Monday, Oct. 31st

Bringing Low-Level Languages to the JVM: Efficient Execution of LLVM IR on Truffle (PDF)
Manuel Rigger, Matthias Grimmer, Christian Wimmer, Thomas Würthinger, Hanspeter Mössenböck
VMIL’16 workshop 15:40-16:05

Tuesday, Nov. 1st

Building Efficient and Highly Run-time Adaptable Virtual Machines (PDF)
Guido Chari, Diego Garbervetsky, Stefan Marr
DLS 13:55-14:20

Optimizing R Language Execution via Aggressive Speculation
Lukas Stadler, Adam Welc, Christian Humer, Mick Jordan
DLS 14:45-15:10

Cross-Language Compiler Benchmarking—Are We Fast Yet? (PDF)
Stefan Marr, Benoit Daloze, Hanspeter Mössenböck
DLS 16:30-16:55

Thursday, Nov. 3rd

GEMs: Shared-memory Parallel Programming for Node.js (DOI)
Daniele Bonetta, Luca Salucci, Stefan Marr, Walter Binder
OOPSLA conference 11:20-11:45

Efficient and Thread-Safe Objects for Dynamically-Typed Languages (PDF)
Benoit Daloze, Stefan Marr, Daniele Bonetta, Hanspeter Mössenböck
OOPSLA conference 13:30-13:55

Truffle and Graal: Fast Programming Languages With Modest Effort
Chris Seaton, Oracle Labs
SPLASH-I 14:20-15:10

Can we get the IDE for free, too?

What do we need for full IDE Integration for Truffle Languages?

With the Truffle language implementation framework, we got a powerful foundation for implementing languages as simple interpreters. In combination with the Graal compiler, Truffle interpreters execute their programs as very efficient native code.

Now that we got just-in-time compilation essentially “for free”, can we get IDE integration for our Truffle languages as well?

In case you wonder, this is inspired by the language server protocol project of Microsoft, RedHat, Eclipse Che, and others. Their goal is to develop a language-agnostic protocol that connects so-called language servers to IDEs. That made me wonder whether we could provide the infrastructure needed for such a language server as part of Truffle.

In the remainder of this post, I'll briefly discuss which IDE features would be desirable as a start, which of them Truffle could currently support, and how far we could get with a language-agnostic API as part of Truffle.

1. Which IDE features would be desirable?

Generally, I am thinking about features available in IDEs such as Eclipse, Visual Studio, NetBeans, or Pharo. On the one hand, there are tools that help to understand the execution of a program. Typically, this includes debugging support and inspecting values, but it can also be about profiling to identify performance issues. Such execution-related aspects are covered by Truffle already today. The framework comes with support for a debugger and a profiler. The debugger can be used across Truffle languages, for instance in NetBeans or in a web-based experiment of mine.

Features that are not strictly related to execution are, however, not supported. In the research community, this is an area where Language Workbench projects [1] excel. They often come with their own domain-specific languages to define language grammars, and use transformation or compilation approaches to generate a wide range of tools from such specifications.

The tools I find most essential for an IDE include:

  • support for highlighting (syntactic and semantic)
  • code browsing, structural representation
  • code completion (incl. signature information, and API documentation)
  • reporting of parsing and compilation errors
  • reporting of potential bugs and code quality issues
  • quick fix functionality
  • refactoring support

Figure: Code Completion for SOMns in VS Code

For code completion, as in the figure above, one of course needs language-specific support, ideally taking the current context into account and adjusting the set of proposals to what is possible at that lexical location.

Figure: Go to Definition for SOMns in VS Code

Similarly, for the go-to-definition example above, it is most useful if the language semantics are taken into account. My prototype currently does not take into account that the receiver of the println in this case is a literal, which would allow it to reduce the set of possible definitions to the one in the String class.

Figure: Parsing Error in SOMns in VS Code

Other features are simpler mappings of already available functionality. The error message above is shown on parsing errors and helps to understand why the parser complains. Typical language implementations report this type of error on the command line as well, to enable basic programming.

Figure: Navigating a file based on its program entities

The other features are clearly more language-specific, as is the last example above: browsing a file based on the entities it defines. However, most of these elements can be mapped onto a common set of concepts that can be handled in a language-agnostic way in an IDE.

While the DynSem project might bring all these generation-based features to Truffle languages, I wonder whether we can do more in a bottom-up approach based on the interpreters we already got. Ensō, a self-describing DSL workbench, seems to go the route of interpretation over transformation as well.

2. What does Truffle currently support?

As mentioned above, Truffle currently focuses on providing a framework geared towards tooling for language execution. This means mainly the implementation framework for the languages themselves, but it also includes support for language instrumentation that can be used to implement debuggers, profilers, tools for collecting dynamic metrics, coverage tracking, dynamic code reloading, etc.

The framework is based on the idea that AST nodes, i.e., the basic executable elements to which a parser transforms an input program, can be tagged. An instrumentation tool can then act based on these tags, for instance to add extra behavior to a program or to track its execution. AST nodes are correlated to their lexical representation in a program by so-called SourceSections. A SourceSection encodes the coordinates of a node in the program file.

Unfortunately, this is where the support from the Truffle framework ends. Tooling that revolves around the lexical aspects of programs is currently not supported.

3. Can we provide a language-agnostic API to implement tooling focused on lexical aspects?

The most basic aspect that is currently missing in Truffle for any kind of lexical support is mapping a location in a source to a corresponding semantic entity. Two underlying features are currently missing for that. First, we would need to actually retain information about code that is not part of a method, i.e., that is not part of the ASTs built for Truffle. Second, we would need a simple lookup data structure from a location in the source to the corresponding element. To implement, for instance, code completion, we need to identify the code entity located at the current position in the source, as well as its context, so that we can propose sensible completion options.
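
The following is a minimal sketch of such a lookup structure, in plain Java and independent of Truffle's actual APIs (all names are hypothetical): the parser registers entities together with their source spans, and a query returns the innermost entity covering a given position.

```java
import java.util.ArrayList;
import java.util.List;

final class SourceIndex {
  static final class Entity {
    final String name;
    final int start, end; // character offsets in the source

    Entity(String name, int start, int end) {
      this.name = name; this.start = start; this.end = end;
    }
  }

  private final List<Entity> entities = new ArrayList<>();

  // Called by the parser, also for declarations such as class or
  // method headers that never become part of an executable AST.
  void register(String name, int start, int end) {
    entities.add(new Entity(name, start, end));
  }

  // Returns the smallest (innermost) entity covering the offset, or null.
  Entity lookup(int offset) {
    Entity best = null;
    for (Entity e : entities) {
      boolean covers = e.start <= offset && offset < e.end;
      if (covers && (best == null || e.end - e.start < best.end - best.start)) {
        best = e;
      }
    }
    return best;
  }
}
```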

Let's go over the list of things I wanted.

3.1 Support for Highlighting (Syntactic and Semantic)

This needs two things. First, a set of common tags to identify concepts such as keywords, literals, or method definitions. Second, an API to communicate that information to Truffle during parsing, so that it can be stored along with the Source information, for instance, and used for highlighting.

To support semantic instead of purely syntactic highlighting, Truffle would furthermore need support for dynamic tags. Currently, Truffle assumes that tags are not going to change after AST creation. For many dynamic languages, this is however too early. Only executing the code will determine whether an operation is for instance a field read or an access to a global.
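
Coming back to the first point, such a common tag set and a per-Source recording API could look like the following strawman; all names are hypothetical and not part of the existing Truffle API:

```java
import java.util.ArrayList;
import java.util.List;

// A common, language-agnostic set of highlighting tags.
enum SyntaxTag { KEYWORD, LITERAL, COMMENT, METHOD_DEFINITION, ARGUMENT, LOCAL_VARIABLE }

final class HighlightInfo {
  static final class Highlight {
    final int charIndex, length;
    final SyntaxTag tag;

    Highlight(int charIndex, int length, SyntaxTag tag) {
      this.charIndex = charIndex; this.length = length; this.tag = tag;
    }
  }

  private final List<Highlight> highlights = new ArrayList<>();

  // The parser reports a tagged region; stored per Source, an editor can
  // later query it to color the corresponding text range. Semantic
  // (dynamic) tags would additionally require updates at run time.
  void add(int charIndex, int length, SyntaxTag tag) {
    highlights.add(new Highlight(charIndex, length, tag));
  }

  List<Highlight> all() { return highlights; }
}
```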

3.2 Code Browsing, Structural Representation

Communicating structural information might be a bit more challenging in a language-agnostic way. One could go with a superset of concepts common to various languages and then provide an API that records this information on a per-Source basis, which could then be queried from an IDE.

One challenge here, especially from the perspective of editing, would be to choose data structures that are easily and efficiently updatable while a source file is being edited. Assuming that the editor provides information about which parts of a source have changed, it should be possible to leverage that.

Note that this also requires a parser that is flexible enough to parse such chunks. This, however, is something that would be language-specific, especially since Truffle leaves the parsing aspect completely to the languages. A sketch of the structural superset follows below.
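
To make this concrete, the superset of structural concepts and the per-Source record mentioned above might look like this sketch (hypothetical names, not an existing API); an IDE outline view could be populated directly from such a tree:

```java
import java.util.ArrayList;
import java.util.List;

// A language-agnostic superset of structural concepts.
enum StructureKind { MODULE, CLASS, METHOD, FIELD, VARIABLE }

final class StructureNode {
  final String name;
  final StructureKind kind;
  final int start, end; // source span, used for navigation
  final List<StructureNode> children = new ArrayList<>();

  StructureNode(String name, StructureKind kind, int start, int end) {
    this.name = name; this.kind = kind; this.start = start; this.end = end;
  }
}
```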

3.3 Code Completion (incl. signature information, and API documentation)

For code completion, one needs a mapping from source locations to the 'lexically dominating' entities. By that I mean not necessarily the structure of an AST but, as with highlighting, a mapping from the source location to the most relevant element from a user's perspective. Assuming we got that for highlighting already, we would need language-specific lookup routines to determine the relevant elements for code completion. Those should then probably also return all the language-specific information about signatures (names and properties of arguments, e.g., types) as well as API documentation.
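
A sketch of how such a language-specific lookup routine could be exposed to generic infrastructure; the Proposal fields mirror what an IDE typically displays, and all names are hypothetical:

```java
import java.util.List;

interface CompletionProvider {
  final class Proposal {
    final String name, signature, documentation;

    Proposal(String name, String signature, String documentation) {
      this.name = name; this.signature = signature; this.documentation = documentation;
    }
  }

  // Returns proposals for the element lexically dominating the given
  // offset, e.g., methods whose names match a partially typed message send.
  List<Proposal> complete(String sourceText, int offset);
}
```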

3.4 Reporting of Parsing and Compilation Errors

Depending on how far one wants to take that, this could be as simple as an API to report one or more parsing or compilation exceptions.

I see two relevant aspects that should be considered here. The first is that the current PolyglotEngine design of Truffle does not actually expose a parse() operation; it is kept sealed off under the hood. Second, depending on the language and the degree of convenience one wants to provide to users, the parser might want to continue parsing after the first error and report multiple issues in one go. This can make the parser much more complex, but for compilation issues, i.e., structural or typing problems unrelated to the syntax, one might want to report all issues instead of aborting after the first one. Such features might require very different engineering decisions compared to implementations that abort on the first error, but they would improve the programming experience dramatically.
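
A sketch of what such an error-collecting API could look like; the resynchronization strategy mentioned in the comment would be the language's own concern, and all names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

final class DiagnosticCollector {
  static final class Diagnostic {
    final int line, column;
    final String message;
    final boolean isError; // false for warnings and quality issues

    Diagnostic(int line, int column, String message, boolean isError) {
      this.line = line; this.column = column;
      this.message = message; this.isError = isError;
    }
  }

  private final List<Diagnostic> diagnostics = new ArrayList<>();

  // Instead of throwing on the first problem, the parser reports an
  // issue here and then tries to resynchronize, e.g., by skipping
  // tokens until the next statement boundary.
  void report(int line, int column, String message, boolean isError) {
    diagnostics.add(new Diagnostic(line, column, message, isError));
  }

  boolean hasErrors() {
    for (Diagnostic d : diagnostics) {
      if (d.isError) { return true; }
    }
    return false;
  }

  List<Diagnostic> all() { return diagnostics; }
}
```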

3.5 Reporting of Potential Bugs and Code Quality Issues

This doesn't seem to be fundamentally different from the previous issue. The question is whether an API for querying such information belongs in the PolyglotEngine, or whether there should be another entry point for such tooling altogether. Since I have a strong dynamic languages background, I'd argue the PolyglotEngine is the right place. I want to execute code to learn more about the behavior of a program. I want to run unit tests to get the best feedback and information (including types and semantic highlighting) about my code. So, I'd say it all belongs in there.

3.6 Quick-Fix Functionality and Refactoring Support

I haven't really experimented with these aspects, but there seems to be a language-specific and a language-agnostic component to them. The language-specific component would be to identify the entities that need to be changed by a quick fix or refactoring, as well as to determine the replacement. The actual operation, however, seems fairly language-independent and could be a service provided by a common infrastructure to change Source objects/files.

4. Conclusion

To me, it seems there is huge potential for Truffle to provide more language-agnostic infrastructure that realizes standard, and perhaps non-standard, IDE features via additional APIs to be implemented by the languages. Getting the basic features working seems reasonably straightforward and would help anyone using a language that doesn't traditionally come with IDE support.

However, there are also a couple of challenges that might be at odds with Truffle as a framework for languages that are mostly tuned for peak performance. In my own experience, adapting the SOMns parser to provide all the necessary information for highlighting, code completion, go to definition, etc., requires quite a few design decisions that depart from the straightforward parsing I did before to directly construct the AST. On the one hand, I need to retain much more information than before. My Truffle ASTs are very basic and contain only elements relevant for execution. For editing in an IDE, however, we want all the declarative elements as well. On the other hand, one probably wants a parser that is incremental and perhaps works on the chunk that the editor identified as changed. If the parser wasn't designed from the start to work like that, this seems to require quite pervasive changes. Similarly, one would need a different approach to parsing to be able to continue after a parse error. On top of that comes the aspect of storing the desired information in an efficient data structure. Perhaps this is something where persistent data structures would come in handy.

While there are challenges and tradeoffs, for language implementers like me, this would be a great thing. I'd love to experiment with my language and still get the benefits of an IDE. Perhaps not exactly for free, but with a reasonable effort. While Language Workbenches provide such features, I personally prefer the bottom-up approach. Instead of specifying my language, I'd rather express the one truth of its semantics as an interpreter. In the end, I want to execute programs. So, let's start there.

PS

I'd like to thank Michael L. Van De Vanter for discussions on the topic. The code for my experiments with the language server protocol and SOMns is available on GitHub as part of the SOMns-vscode project. So far, I haven't tried to integrate it with Truffle, however.

And I have to admit, I am currently mostly focusing on the MetaConc project, where we aim a little higher and try to go beyond what people have come to expect when debugging concurrent applications.

References

  1. Erdweg, S.; van der Storm, T.; Völter, M.; Boersma, M.; Bosman, R.; Cook, W. R.; Gerritsen, A.; Hulshout, A.; Kelly, S.; Loh, A.; Konat, G. D. P.; Molina, P. J.; Palatnik, M.; Pohjonen, R.; Schindler, E.; Schindler, K.; Solmi, R.; Vergu, V. A.; Visser, E.; van der Vlist, K.; Wachsmuth, G. H. & van der Woning, J. (2013), The State of the Art in Language Workbenches, Springer, pp. 197-217.

Writing Papers with Completely Automated Data Processing

One of the first things I found problematic about paper writing was the manual processing and updating of numbers based on experiments. Ever since my master's thesis, this has felt like an unnecessary and error-prone step.

For a couple of years now, I have been using R to do the necessary statistics and data processing on, for instance, benchmark results. R provides very convenient libraries for producing high-quality and beautiful plots.

Over the years, my setup evolved to automate more and more things based on a LaTeX/ReBench/KnitR toolchain. Now, I have finally put together a stripped-down template project that combines some of the scripts and documents the setup. The result takes a ReBench data file and generates a PDF that looks like this:

Figure: Example PDF

The code is available on GitHub: https://github.com/smarr/perf-latex-paper

Adding Debugging Support to a Truffle Language

Besides the great performance after just-in-time compilation, the Truffle language implementation framework provides a few other highly interesting features to language implementers. One of them is the instrumentation framework, which includes a REPL, a profiler, and a debugger.

To support debugging, an interpreter needs to provide only a few tags on its AST nodes, instruct the framework to make them instrumentable, and have the debugger tool jar on the classpath. In this post, I'll only point at the basics. Of course, for a perfectly working debugger, a language needs to carefully decide which elements should be considered for breakpoints, how values are displayed, and how to evaluate arbitrary code during debugging.

The most relevant aspect is tagging. To hide implementation details of the AST from users of a tool, only relevant Truffle nodes are tagged for use by a specific tool. For debugging support, we need to override the isTaggedWith(cls) method so that all nodes that correspond to a statement at which a debugger could stop return true for the StatementTag. Based on this information, the debugger can then instrument AST nodes to realize breakpoints, stepping execution, etc. In addition, we also need to make sure that a node has a proper source section attached, but that's something a parser should already be taking care of.

To instrument nodes, they are essentially wrapped by the framework. To allow Truffle to wrap a node, its class, or one of its super classes, needs to be marked as @Instrumentable. This annotation is used to generate the necessary wrapper classes that allow wrapping of such nodes. In most cases, this should work directly without requiring any custom code.

Before the tool is going to be usable, a language needs to explicitly declare that it supports the necessary tags by listing them on its subclass of TruffleLanguage using the @ProvidedTags annotation.
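
Condensed into code, the three steps could look roughly like the following sketch. It assumes the Truffle instrumentation API as of 2016; ExprNode, MyLanguage, and MyContext are made-up names, ExprNodeWrapper is the wrapper class generated by Truffle's annotation processor, and the TruffleLanguage subclass omits its required methods for brevity:

```java
import com.oracle.truffle.api.TruffleLanguage;
import com.oracle.truffle.api.instrumentation.Instrumentable;
import com.oracle.truffle.api.instrumentation.ProvidedTags;
import com.oracle.truffle.api.instrumentation.StandardTags;
import com.oracle.truffle.api.nodes.Node;

// Steps 1 and 2: tag statement nodes and make them instrumentable.
// ExprNodeWrapper is generated from the @Instrumentable annotation.
@Instrumentable(factory = ExprNodeWrapper.class)
abstract class ExprNode extends Node {

  // Language-specific decision: does this node correspond to a
  // statement at which a debugger could stop?
  protected abstract boolean isStatement();

  @Override
  protected boolean isTaggedWith(Class<?> tag) {
    if (tag == StandardTags.StatementTag.class) {
      return isStatement();
    }
    return super.isTaggedWith(tag);
  }
}

// Step 3: declare the supported tags on the language itself.
@ProvidedTags({StandardTags.StatementTag.class, StandardTags.RootTag.class})
final class MyLanguage extends TruffleLanguage<MyContext> {
  // ... createContext(), parse(), etc. omitted for brevity
}
```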

And lastly, we need to have the truffle-debug.jar on the classpath. With this, for languages that use the mx build system, it should be possible to access the REPL after an mx build, which then also allows debugging.

With a little bit more work, one could also adapt my prototype of a web-based debugger interface from SOMns. This prototype is language-independent, but requires a few more tags to support syntax highlighting. Perhaps more on that in another post.

Below, you can see a brief demo of the debugger for SOMns: