Tag Archives: Virtual Machines

10 Years of Language Implementations

First Stop: VMs, Compilers, and Modularity

In April 2007, I embarked on a long journey, one on which I have already met many interesting people, learned many fascinating things, and had a lot of fun implementing programming languages. It all started on this day in 2007, at least if I can trust the date of my first commit of CSOM to my SVN server back then. A lot has happened in the last 10 years, and, perhaps mostly for myself, I wanted to recount some of the projects I was involved in.

It all started with me wanting to know more about the low-level things that I had kind of avoided during my bachelor studies. I had been programming for a long time, but never really knew how it all actually worked. So, I enrolled in the excellent Virtual Machines (VMs) course, which was taught by Michael Haupt at the time. I also took a course on Software Design, in which I studied traits.

Why do I mention traits? Well, I had been using PHP since 2000 or so. It was my language of choice. And to understand traits better, I decided the best way would be to implement them for PHP. So, more work on language implementations. I have to admit, the main reason I didn’t just study them in Squeak Smalltalk was that Squeak looked silly and I didn’t like it. I guess I was just stubborn. And that stubbornness caused me to inflict traits on PHP as part of my first venture into programming language design.

As a result, my traits for PHP were released about 5 years later with PHP 5.4. So, it took a lot of stubbornness… Fun fact: Wikipedia explains traits with a PHP example, perhaps because PHP is one of the few curly-brace languages that is relatively close to the Smalltalk traits design.

Meanwhile, in the VM course, we started to look in detail into a little Smalltalk called SOM (Simple Object Machine). Specifically, we worked with CSOM, the C implementation of SOM. Together with a fellow student, I chose a rather ambitious topic: building a just-in-time (JIT) compiler for SOM. Well, in the end, I think he did most of the work. And I learned more than I could have imagined. In our final presentation, we reported performance gains of 20% to 55%. The JIT compiler itself was a baseline compiler that translated bytecodes one by one to x86 machine code. The only fancy thing it did was to support hybrid stack frames, i.e., it essentially used the C stack, but still provided a full object representation of the stack as Smalltalk context objects.
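To make "translated bytecodes one by one" a bit more concrete, here is a minimal sketch of template-based baseline compilation in Python. It is not the CSOM JIT: the bytecode names, the pseudo-x86 templates, and the `som_send` runtime helper are all hypothetical, and the output is assembly text rather than machine code. What it does show is the defining property of such a compiler: every bytecode is expanded into a fixed snippet, without any analysis across bytecodes.

```python
# Hypothetical bytecode names and pseudo-x86 templates; this only illustrates
# template-based (baseline) compilation, not CSOM's actual JIT.

# Each template maps a bytecode's operand to the assembly lines for that one
# bytecode. No information flows between neighbouring bytecodes.
TEMPLATES = {
    "push_local": lambda i: [f"mov eax, [ebp - {4 * (i + 1)}]", "push eax"],
    "push_constant": lambda i: [f"mov eax, [literals + {4 * i}]", "push eax"],
    "pop": lambda _: ["add esp, 4"],
    # som_send is a hypothetical runtime helper doing the message lookup.
    "send": lambda sel: [f"push {sel}", "call som_send", "add esp, 4", "push eax"],
    "return_local": lambda _: ["pop eax", "leave", "ret"],
}


def compile_method(bytecodes):
    """Translate (opcode, operand) pairs one by one into assembly lines."""
    code = []
    for pc, (opcode, operand) in enumerate(bytecodes):
        code.append(f"bc_{pc}:")  # one label per bytecode, e.g. as jump targets
        code.extend("    " + line for line in TEMPLATES[opcode](operand))
    return code


if __name__ == "__main__":
    method = [("push_local", 0), ("push_constant", 1), ("send", 2), ("return_local", None)]
    print("\n".join(compile_method(method)))
```

The simplicity is the point: such a compiler is cheap and comparatively easy to get right, but it misses any optimization that needs to look at more than one bytecode at a time, such as keeping values in registers instead of on the stack.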

This JIT compiler project was a lot of fun, but also a lot of headache… Perhaps not something I’d generally recommend as a first project. However, after the VM course and the work on traits, I was really interested in continuing to learn more about VMs and modularity, and perhaps also in combining that with the hyped aspect-oriented, feature-oriented, and context-oriented programming ideas, which I hadn’t taken the time to study yet.

Under the guidance of Michael and Robert Hirschfeld, I started the work on my master’s thesis, which resulted in a Virtual Machine Architecture Definition Language (VMADL). VMADL combined ideas of feature-oriented and aspect-oriented programming to allow us to build a VM product line: CSOM/PL. It used CSOM from the VM course and combined the results of the various student projects. So, one could build a CSOM, for instance, with native or green threads, with a reference counting GC or a traditional mark/sweep GC, and so on. It was all based on a common code base of service modules, which were linked together with combiners that used aspects to weave in the necessary functionality at points explicitly exposed by the service modules. Since that is all very brief and abstract, the CSOM/PL paper is probably a better place to read up on it.
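The following Python sketch illustrates the general idea of service modules with explicitly exposed join points and combiners that weave in functionality. It is only an analogy: VMADL and CSOM/PL worked on C code, and the `ObjectMemory` module, the join point names, and the reference-counting combiner below are hypothetical. Building a VM variant then boils down to choosing which combiners to apply.

```python
class ServiceModule:
    """A module that explicitly exposes named join points."""

    def __init__(self):
        self._advice = {}

    def weave(self, join_point, advice):
        """Attach a piece of advice to one of the exposed join points."""
        self._advice.setdefault(join_point, []).append(advice)

    def _join_point(self, name, *args):
        # Run all advice woven in at this point, in registration order.
        for advice in self._advice.get(name, []):
            advice(*args)


class ObjectMemory(ServiceModule):
    """Hypothetical service module: allocation and field writes only."""

    def allocate(self, size):
        obj = {"fields": [None] * size}
        self._join_point("after_allocate", obj)
        return obj

    def write_field(self, obj, index, value):
        old = obj["fields"][index]
        obj["fields"][index] = value
        self._join_point("after_field_write", old, value)


def reference_counting_combiner(memory):
    """Hypothetical combiner that weaves reference counting into the memory."""

    def on_allocate(obj):
        obj["rc"] = 0

    def on_field_write(old, new):
        if isinstance(new, dict):
            new["rc"] += 1
        if isinstance(old, dict):
            old["rc"] -= 1

    memory.weave("after_allocate", on_allocate)
    memory.weave("after_field_write", on_field_write)


def build_vm(combiners):
    """Assemble one product-line variant from the chosen combiners."""
    memory = ObjectMemory()
    for combine in combiners:
        combine(memory)
    return memory
```

A variant built without the combiner, e.g. one that relies on a mark/sweep GC, runs the same `ObjectMemory` code but never pays for reference-count updates.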

I guess that’s enough for today. Since this only covers the first few steps, up until summer 2008, there is more to come on:

  • supporting all kinds of concurrency models on a simple VM
  • performance, performance, and metaprogramming
  • and safe combination of concurrency models

Cross-Language Compiler Benchmarking: Are We Fast Yet?

Research on programming languages is often more fun when we can use our own languages. However, for research on performance optimizations, that can be a trap. In the end, we need to argue that what we did is comparable to state-of-the-art language implementations. Ideally, we are able to show that our own little language is not just a research toy, but that it is, at least performance-wise, competitive with, for instance, Java or JavaScript VMs.

Over the last couple of years, it was always a challenge for me to argue that SOM or SOMns are competitive. There were those two or three paragraphs in every paper that never felt quite as strong as they should have been. And the main reason was that we didn’t really have good benchmarks to compare across languages.

I hope we finally have reasonable benchmarks for exactly that purpose with our Are We Fast Yet? project. To track the performance of the benchmarks, we also set up a Codespeed site, which shows the various results. The preprint has already been online for a bit, but next week, we are finally going to present the work at the Dynamic Languages Symposium in Amsterdam.

Please find abstract and details below:

Abstract

Comparing the performance of programming languages is difficult because they differ in many aspects including preferred programming abstractions, available frameworks, and their runtime systems. Nonetheless, the question about relative performance comes up repeatedly in the research community, industry, and wider audience of enthusiasts.

This paper presents 14 benchmarks and a novel methodology to assess the compiler effectiveness across language implementations. Using a set of common language abstractions, the benchmarks are implemented in Java, JavaScript, Ruby, Crystal, Newspeak, and Smalltalk. We show that the benchmarks exhibit a wide range of characteristics using language-agnostic metrics. Using four different languages on top of the same compiler, we show that the benchmarks perform similarly and therefore allow for a comparison of compiler effectiveness across languages. Based on anecdotes, we argue that these benchmarks help language implementers to identify performance bugs and optimization potential by comparing to other language implementations.

  • Cross-Language Compiler Benchmarking: Are We Fast Yet? Stefan Marr, Benoit Daloze, Hanspeter Mössenböck; In Proceedings of the 12th Symposium on Dynamic Languages (DLS ’16), ACM, 2016.
  • Paper: HTML, PDF, DOI
  • BibTex: BibSonomy
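As an illustration of what comparing across implementations can look like in practice, one common way to summarize such results is to normalize each benchmark's runtime to a baseline implementation and average the ratios with the geometric mean. The geometric mean is the usual choice for ratios because, unlike the arithmetic mean, the ranking it produces does not depend on which implementation serves as the baseline. This is a generic sketch, not necessarily the exact methodology of the paper; the function names are hypothetical.

```python
from math import prod  # requires Python 3.8+


def geometric_mean(values):
    return prod(values) ** (1 / len(values))


def summarize(results, baseline):
    """results maps implementation -> {benchmark: runtime}.

    Returns, per implementation, the geometric mean of its runtime ratios
    relative to the baseline, over the benchmarks the baseline ran.
    """
    base = results[baseline]
    return {
        impl: geometric_mean([runtimes[b] / base[b] for b in base])
        for impl, runtimes in results.items()
    }
```

For example, with made-up numbers, a candidate that is 1.5x slower than the baseline on one benchmark and 2x slower on another ends up with a geometric mean of about 1.73, while the baseline itself is at 1.0.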

Are We There Yet? Simple Language-Implementation Techniques for the 21st Century

The first results of my experiments with self-optimizing interpreters were finally published in IEEE Software. It is a brief and very high-level comparison of the Truffle approach with a classic bytecode-based interpreter on top of RPython. If you aren’t familiar with either of these approaches, the article is hopefully a good starting point. The experiments described in it use SOM, a simple Smalltalk.

Since then, the work on the different SOM implementations has continued, resulting in better overall performance. This reminds me: thanks again to the communities around PyPy/RPython and Truffle/Graal for their continued support!

The preprint of the paper is available in PDF and HTML versions. For the experiments, we also prepared an online appendix with a few more details and made the experimental setup available on GitHub.
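The core trick of self-optimizing AST interpreters, as used with Truffle, is that nodes rewrite themselves based on the values they observe. The Python sketch below shows that idea for a single `+` node: it starts uninitialized, specializes to an integer-only version on its first execution, and replaces itself with a generic version if that assumption is ever violated. Truffle itself is a Java framework with its own API; all class names here are hypothetical, and the sketch leaves out the partial evaluation and compilation that make such interpreters fast.

```python
class Node:
    def __init__(self):
        self.parent = None

    def adopt(self, child):
        child.parent = self
        return child

    def replace(self, new_node):
        """Swap this node for new_node in its parent's child slots."""
        parent = self.parent
        for name, value in list(vars(parent).items()):
            if value is self:
                setattr(parent, name, new_node)
        new_node.parent = parent
        return new_node


class Arg(Node):
    def __init__(self, index):
        super().__init__()
        self.index = index

    def execute(self, frame):
        return frame[self.index]


class BinaryNode(Node):
    def __init__(self, left, right):
        super().__init__()
        self.left = self.adopt(left)
        self.right = self.adopt(right)


class UninitializedAdd(BinaryNode):
    def execute(self, frame):
        l, r = self.left.execute(frame), self.right.execute(frame)
        kind = IntAdd if type(l) is int and type(r) is int else GenericAdd
        self.replace(kind(self.left, self.right))  # specialize on first use
        return l + r


class IntAdd(BinaryNode):
    def execute(self, frame):
        l, r = self.left.execute(frame), self.right.execute(frame)
        if type(l) is int and type(r) is int:
            return l + r  # fast path under the assumption "both are ints"
        self.replace(GenericAdd(self.left, self.right))  # assumption failed
        return l + r


class GenericAdd(BinaryNode):
    def execute(self, frame):
        return self.left.execute(frame) + self.right.execute(frame)


class Root(Node):
    def __init__(self, body):
        super().__init__()
        self.body = self.adopt(body)

    def execute(self, frame):
        return self.body.execute(frame)
```

Executing `Root(UninitializedAdd(Arg(0), Arg(1)))` with `[1, 2]` leaves an `IntAdd` node in the tree; a later call with `[1.5, 2]` turns it into a `GenericAdd`, which from then on also handles, for example, string concatenation.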

Abstract

With the rise of domain-specific languages (DSLs), research in language implementation techniques regains importance. While DSLs can help to manage the domain’s complexity, it is rarely affordable to build highly optimizing compilers or virtual machines, and thus, performance remains an issue. Ideally, one would implement a simple interpreter and still reach acceptable performance levels. RPython and Truffle are two approaches that promise to facilitate language implementation based on simple interpreters, while reaching performance of the same order of magnitude as highly optimizing virtual machines. In this case study, we compare the two approaches to identify commonalities, weaknesses, and areas for further research to improve their utility for language implementations.

  • Are We There Yet? Simple Language Implementation Techniques for the 21st Century, Stefan Marr, Tobias Pape, Wolfgang De Meuter; IEEE Software 31, no. 5, pp. 60-67, 2014.
  • Paper: PDF, HTML, online appendix
  • DOI: 10.1109/MS.2014.98
  • BibTex: BibSonomy

Supporting Concurrency Abstractions in High-level Language Virtual Machines

Last Friday, I defended my PhD dissertation. Finally, after 4 years and a bit, I am done. Finally. I am very grateful to all the people supporting me along the way and of course to my colleagues for their help.

My work focused on how to build VMs with support for all kinds of different concurrent programming abstractions. Since you don’t want to add them to a VM one by one, I was looking for a unifying substrate that’s up to the task. Below, you’ll find the abstract as well as the slides.

In addition to the thesis text itself, the implementations and tools are available. Please see the project page for more details.

Abstract

During the past decade, software developers widely adopted JVM and CLI as multi-language virtual machines (VMs). At the same time, the multicore revolution burdened developers with increasing complexity. Language implementers devised a wide range of concurrent and parallel programming concepts to address this complexity but struggle to build these concepts on top of common multi-language VMs. Missing support in these VMs leads to tradeoffs between implementation simplicity, correctly implemented language semantics, and performance guarantees.

Departing from the traditional distinction between concurrency and parallelism, this dissertation finds that parallel programming concepts benefit from performance-related VM support, while concurrent programming concepts benefit from VM support that guarantees correct semantics in the presence of reflection, mutable state, and interaction with other languages and libraries.

Focusing on these concurrent programming concepts, this dissertation finds that a VM needs to provide mechanisms for managed state, managed execution, ownership, and controlled enforcement. Based on these requirements, this dissertation proposes an ownership-based metaobject protocol (OMOP) to build novel multi-language VMs with proper concurrent programming support.

This dissertation demonstrates the OMOP’s benefits by building concurrent programming concepts such as agents, software transactional memory, actors, active objects, and communicating sequential processes on top of the OMOP. The performance evaluation shows that OMOP-based implementations of concurrent programming concepts can reach performance on par with that of their conventionally implemented counterparts if the OMOP is supported by the VM.

To conclude, the OMOP proposed in this dissertation provides a unifying and minimal substrate to support concurrent programming on top of multi-language VMs. The OMOP enables language implementers to correctly implement language semantics, while simultaneously enabling VMs to provide efficient implementations.

  • Supporting Concurrency Abstractions in High-level Language Virtual Machines, Stefan Marr. Software Languages Lab, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium, PhD Dissertation, January 2013. ISBN 978-90-5718-256-3.
  • Download: PDF.
  • BibTex: BibSonomy
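To give a rough feel for what managed state, managed execution, ownership, and controlled enforcement mean, here is a small Python sketch. It is loosely inspired by the ideas behind the OMOP, but the class and method names are hypothetical and do not correspond to the actual Smalltalk implementation; a simple immutability policy stands in for richer concurrency models such as actors or STM.

```python
class Domain:
    """Hypothetical metaobject: owns objects and defines access semantics."""

    def adopt(self, obj):
        obj.owner = self  # ownership: the object now belongs to this domain

    # Managed state: intercept field reads and writes.
    def read_field(self, obj, name):
        return obj.fields[name]

    def write_field(self, obj, name, value):
        obj.fields[name] = value

    # Managed execution: intercept method invocations.
    def request_execution(self, obj, method, *args):
        return method(obj, *args)


class ImmutableDomain(Domain):
    """Example policy: objects owned by this domain reject writes."""

    def write_field(self, obj, name, value):
        raise PermissionError(f"immutable object: cannot write {name!r}")


class Obj:
    def __init__(self, owner, **fields):
        self.fields = dict(fields)
        owner.adopt(self)


# Controlled enforcement: only code running "enforced" goes through the
# owner's metaobject; unenforced code (e.g. the domain's own implementation)
# accesses the object directly.
def write(obj, name, value, enforced=True):
    if enforced:
        obj.owner.write_field(obj, name, value)
    else:
        obj.fields[name] = value


def read(obj, name, enforced=True):
    return obj.owner.read_field(obj, name) if enforced else obj.fields[name]


def send(obj, method, *args, enforced=True):
    if enforced:
        return obj.owner.request_execution(obj, method, *args)
    return method(obj, *args)
```

Since the policy lives in the owning domain, moving an object to another domain with `adopt` changes its semantics without touching the code that uses it, while unenforced code still gets direct access.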

Slides

Identifying A Unifying Mechanism for the Implementation of Concurrency Abstractions on Multi-Language Virtual Machines

My paper on how to support various concurrency models with an ownership-based meta-object protocol (MOP) was accepted at TOOLS’12. Below, you will find a preprint version of the paper. A later post will provide details on how to use it and how to experiment with the MOP in Pharo 1.3.

Abstract

Supporting all known abstractions for concurrent and parallel programming in a virtual machine (VM) is a futile undertaking, but it is required to give programmers appropriate tools and performance. Instead of supporting all abstractions directly, VMs need a unifying mechanism similar to INVOKEDYNAMIC for JVMs.

Our survey of parallel and concurrent programming concepts identifies concurrency abstractions as the ones benefiting most from support in a VM. Currently, their semantics is often weakened, reducing their engineering benefits. They require a mechanism to define flexible language guarantees.

Based on this survey, we define an ownership-based meta-object protocol as candidate for VM support. We demonstrate its expressiveness by implementing actor semantics, software transactional memory, agents, CSP, and active objects. While the performance of our prototype confirms the need for VM support, it also shows that the chosen mechanism is appropriate to express a wide range of concurrency abstractions in a unified way.

  • Identifying A Unifying Mechanism for the Implementation of Concurrency Abstractions on Multi-Language Virtual Machines, Stefan Marr, Theo D’Hondt; Objects, Models, Components, Patterns, 50th International Conference, TOOLS 2012, Prague, Czech Republic, May 28 – June 1, 2012. Proceedings.
  • Paper: PDF
    ©Springer, 2012. The original publication will be made available at www.springerlink.com.
  • BibTex: BibSonomy

Slides