Another Decade of SOM Language Implementation

SOM, the Simple Object Machine, is a little dynamic language designed for teaching object-oriented virtual machine design. It originates in Aarhus, Denmark, and according to Lars Bak, it was implemented in the course of two days by Kasper Lund. They used it back in 2001 for a course at the University of Aarhus.

Last Wednesday, I gave a presentation at ‹Programming›’19 trying to make a case for SOM as a platform for teaching and research. Currently, it is not widely used for teaching, but I think it has a lot of potential. Though, I will admit that I am biased: SOM was the virtual machine Michael Haupt used in his course, and it is where I got interested in language implementation about a decade ago.

The anecdote above is actually something Lars said during my presentation, and, for what it is worth, the current SOM can be ported to another implementation language in about 3 days of work. At least when one knows SOM. So a smaller version of it is certainly a plausible result for a two-day project.

If virtual machines are something you are interested in, I can only recommend looking into SOM. A starting point could be the slides below. They contain links to various resources.

For me, the most important aspect of the SOM project is its incredible range. We have very small and simple implementations that can get students started, but at the same time, we have implementations that can compete with state-of-the-art virtual machines. This allows us to teach the basics as well as advanced optimizations. It also makes SOM an excellent foundation for publishable research.

Presentation Abstract

The Simple Object Machine (SOM) can be traced back to Aarhus in 2001. Since then it has been used in various places to teach students about language implementation techniques and to do research on optimizations, tooling, and language features.

The Simple in the name was key to enabling much of this work. It allowed experimentation with various implementation languages as well as language implementation frameworks. It also enabled experiments with optimizations such as zero-overhead run-time metaprogramming, powerful metaobject protocols, efficient implementations of various concurrency models, and debuggers that can support all these features seamlessly. Even industry got curious about the Simple, and used SOM to showcase a framework for garbage collection and just-in-time compilation.

This talk is therefore a shameless plug for SOM, how it is used for research, and its potential for teaching. I’ll try to convince you that SOM, while being a small object-oriented language, is fully equipped to do cutting edge research. At the same time, I’ll argue that it is a great platform to teach language implementation techniques, good software engineering, and perhaps especially relevant for ‹Programming›: good programming practices.

Slides

Efficient Deterministic Replay for Actors

Debugging concurrent systems is pretty hard, and we have been working for a while on making things a bit better. However, a big remaining problem is that bugs are not easily reproduced.

Actor systems allow for nice designs where all actors are loosely coupled and simply communicate via asynchronous messages. Unfortunately, this means there is a lot of non-determinism in such systems. Some issues may only show up for some rare actor schedules. This is a huge hurdle to understanding what is going on and to successfully debugging such a race condition.
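To make this concrete, here is a minimal Java sketch (not SOMns code; all names are made up for illustration) in which two senders race to deliver a message to a shared mailbox. Which message arrives first depends solely on thread scheduling:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal illustration of scheduling non-determinism: two "actors"
// asynchronously send a message to a third one; the arrival order
// in the mailbox is decided by the thread scheduler.
public class MessageRace {
  public static String run() throws InterruptedException {
    BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();

    Thread senderA = new Thread(() -> mailbox.add("from A"));
    Thread senderB = new Thread(() -> mailbox.add("from B"));

    senderA.start();
    senderB.start();
    senderA.join();
    senderB.join();

    // Receive both messages in arrival order.
    return mailbox.take() + ", " + mailbox.take();
  }

  public static void main(String[] args) throws InterruptedException {
    // Prints either "from A, from B" or "from B, from A";
    // the schedule decides, and rare orders may hide bugs.
    System.out.println(run());
  }
}
```

Any bug that only manifests for one of the two orders may take many runs to reproduce, which is exactly the problem replay debugging addresses.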

In our latest paper, Dominik showed that the tracing of actor programs can be optimized sufficiently to enable deterministic replay. Using Truffle, we manage to reduce the overhead of tracing to ca. 10% (min. 0%, max. 20%) on microbenchmarks. On the more representative AcmeAir application, the latency of HTTP requests increases only by 1% on average.
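The intuition behind record & replay can be sketched as follows (a drastic simplification for illustration, not the actual SOMns implementation): during recording, we log the order in which an actor processes incoming messages; during replay, we force pending messages to be handed out in exactly the logged order, which removes the non-determinism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy sketch of record & replay for message ordering.
// The trace remembers which sender's message was processed at each
// step; replay then enforces exactly that order.
public class ReplaySketch {
  // Record phase: the observed (possibly racy) processing order
  // simply becomes the trace.
  static List<String> record(List<String> observedOrder) {
    return new ArrayList<>(observedOrder);
  }

  // Replay phase: deliver the pending messages in the traced order,
  // ignoring whatever order they happen to arrive in this run.
  static List<String> replay(Set<String> pending, List<String> trace) {
    List<String> processed = new ArrayList<>();
    for (String sender : trace) {
      if (pending.remove(sender)) {
        processed.add(sender);
      }
    }
    return processed;
  }

  public static void main(String[] args) {
    // One recorded execution processed B's message first, then A's, then C's.
    List<String> trace = record(Arrays.asList("B", "A", "C"));

    // A later run has the same three messages pending; replay restores
    // the recorded order deterministically.
    Set<String> pending = new HashSet<>(Arrays.asList("A", "B", "C"));
    System.out.println(replay(pending, trace)); // [B, A, C]
  }
}
```

The hard part, which the paper addresses, is doing this logging with low enough overhead to be usable on realistic applications.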

We think these results show that recording actor applications for deterministic replay can be practical. Of course, more work is still necessary. For instance, for the time being, we need to run the application from the start to be able to replay it. So, stay tuned for future work, and attend Dominik’s presentation at ManLang’18.

Abstract

With the ubiquity of parallel commodity hardware, developers turn to high-level concurrency models such as the actor model to lower the complexity of concurrent software. However, debugging concurrent software is hard, especially for concurrency models with a limited set of supporting tools. Such tools often deal only with the underlying threads and locks, which obscures the view on e.g. actors and messages and thereby introduces additional complexity. To improve on this situation, we present a low-overhead record & replay approach for actor languages. It allows one to debug concurrency issues deterministically based on a previously recorded trace. Our evaluation shows that the average run-time overhead for tracing on benchmarks from the Savina suite is 10% (min. 0%, max. 20%). For Acme-Air, a modern web application, we see a maximum increase of 1% in latency for HTTP requests and about 1.4 MB/s of trace data. These results are a first step towards deterministic replay debugging of actor systems in production.

  • Efficient and Deterministic Record & Replay for Actor Languages
    D. Aumayr, S. Marr, C. Béra, E. Gonzalez Boix, H. Mössenböck; In Proceedings of the 15th International Conference on Managed Languages and Runtimes, ManLang'18, ACM, 2018.
  • Paper: PDF
  • DOI: 10.1145/3237009.3237015
  • BibTeX:
    @inproceedings{Aumayr:2018:RR,
      abstract = {With the ubiquity of parallel commodity hardware, developers turn to high-level concurrency models such as the actor model to lower the complexity of concurrent software. However, debugging concurrent software is hard, especially for concurrency models with a limited set of supporting tools. Such tools often deal only with the underlying threads and locks, which obscures the view on e.g. actors and messages and thereby introduces additional complexity.
      To improve on this situation, we present a low-overhead record & replay approach for actor languages. It allows one to debug concurrency issues deterministically based on a previously recorded trace. Our evaluation shows that the average run-time overhead for tracing on benchmarks from the Savina suite is 10% (min. 0%, max. 20%). For Acme-Air, a modern web application, we see a maximum increase of 1% in latency for HTTP requests and about 1.4 MB/s of trace data. These results are a  first step towards deterministic replay debugging of actor systems in production.},
      acceptancerate = {0.72},
      author = {Aumayr, Dominik and Marr, Stefan and Béra, Clément and Gonzalez Boix, Elisa and Mössenböck, Hanspeter},
      blog = {http://stefan-marr.de/2018/08/deterministic-replay-for-actors/},
      booktitle = {Proceedings of the 15th International Conference on Managed Languages and Runtimes},
      day = {12--13},
      doi = {10.1145/3237009.3237015},
      isbn = {978-1-4503-6424-9/18/09},
      keywords = {Actors Concurrency Debugging Determinism MeMyPublication Replay SOMns Tracing Truffle},
      month = sep,
      pdf = {http://stefan-marr.de/downloads/manlang18-aumayr-et-al-efficient-and-deterministic-record-and-replay-for-actor-languages.pdf},
      publisher = {ACM},
      series = {ManLang'18},
      title = {{Efficient and Deterministic Record \& Replay for Actor Languages}},
year = {2018}
    }
    

A Typical Truffle Specialization Pitfall

This post is co-authored with Fabio Niephaus.

The Truffle framework allows us to write language interpreters in an easy way. In combination with the Graal compiler and its partial evaluator, such Truffle interpreters are able to be as fast as custom VMs. A crucial part of the framework to achieve performance are so-called specializations, which are used to define highly optimized and speculative optimizations for the very basic operations of a language.

Writing such specializations is generally pretty straightforward, but there is at least one common pitfall. When designing specializations, we need to remind ourselves that the parameter types of specializations are technically guards. This means that the activation semantics of specializations depend not only on explicit guards, but also on the semantics of Java’s type system.

Pitfall for Specializations

Let’s have a look at the following code example. It sketches a Truffle node that can be used to check whether an object is some kind of number.

import com.oracle.truffle.api.dsl.Specialization;
import com.oracle.truffle.api.nodes.Node;

public abstract class IsNumberNode extends Node {

  public abstract int executeEvaluated(Object o);

  @Specialization
  protected final int doInteger(final int o) {
    return 1;
  }

  @Specialization
  protected final int doFloat(final float o) {
    return 2;
  }

  @Specialization
  protected final int doObject(final Object o) {
    return 0;
  }
}

Truffle generates a concrete implementation for this abstract class. To use it, the executeEvaluated(Object) method can be called, which will automatically select one of the three specializations for int, float, and Object based on the given argument.

Next, let’s see this node in action:

IsNumberNode n = IsNumberNodeGen.create();

n.executeEvaluated(42);            // --> 1
n.executeEvaluated(44.3f);         // --> 2
n.executeEvaluated(new Object());  // --> 0

n.executeEvaluated(22.7f);         // --> 2

Great, so the node works as expected, right? Let’s double check:

IsNumberNode n = IsNumberNodeGen.create();

n.executeEvaluated(new Object());  // --> 0
n.executeEvaluated(44.3f);         // --> 0
n.executeEvaluated(42);            // --> 0

This time, the node seems to always return 0. But why?

The first time the node is invoked, it sees an Object and returns the correct result. Additionally, and this is the important side effect, this invocation also activates the doObject(Object) specialization inside the node. When the node is invoked again, it first checks whether any of the previously activated specializations matches the given argument. In our example, the float and int values are Java Objects, and therefore the node always returns 0. This also explains the behavior of the node in the previous series of invocations: first, the node was called with an int, a float, and then an Object. Therefore, all specializations were activated, and the node returned the expected result for all invocations.
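To illustrate this activation behavior, here is a hand-written Java sketch of how the generated node behaves conceptually (the code Truffle actually generates is more sophisticated; the class and field names here are invented for illustration):

```java
// Hand-written sketch of the activation semantics described above;
// NOT the code Truffle generates, just its conceptual behavior.
public class IsNumberSketch {
  // One activation flag per specialization.
  private boolean intActive;
  private boolean floatActive;
  private boolean objectActive;

  public int executeEvaluated(Object o) {
    // Fast path: previously activated specializations are tried first.
    // Because every value is an Object, an active doObject
    // specialization swallows ints and floats, too.
    if (intActive && o instanceof Integer) { return 1; }
    if (floatActive && o instanceof Float) { return 2; }
    if (objectActive)                      { return 0; }
    return specialize(o);
  }

  private int specialize(Object o) {
    // Slow path: activate the first specialization whose
    // parameter type matches the argument.
    if (o instanceof Integer) { intActive = true;   return 1; }
    if (o instanceof Float)   { floatActive = true; return 2; }
    objectActive = true;
    return 0;
  }

  public static void main(String[] args) {
    IsNumberSketch a = new IsNumberSketch();
    System.out.println(a.executeEvaluated(42));           // 1
    System.out.println(a.executeEvaluated(44.3f));        // 2
    System.out.println(a.executeEvaluated(new Object())); // 0
    System.out.println(a.executeEvaluated(22.7f));        // 2

    IsNumberSketch b = new IsNumberSketch();
    System.out.println(b.executeEvaluated(new Object())); // 0
    System.out.println(b.executeEvaluated(44.3f));        // 0
    System.out.println(b.executeEvaluated(42));           // 0
  }
}
```

Running the sketch reproduces both call sequences from above: once doObject is active, every argument takes its path, because no later check is ever reached.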

One reason for these specialization semantics is that we need to carefully balance the benefits of specializations and the cost of falling back to a more general version of an operation. This falling back, or, more technically, deoptimizing, can have a high run-time overhead, because it might require recompilation of methods by the just-in-time compiler. Thus, once we saw the need for a more general specialization, we continue to use it, and only activate another specialization when none of the previously used ones is sufficient. Another important reason for this approach is to minimize the number of guards that need to be checked at run time. The general assumption here is that we have a high chance of matching a previously activated specialization.

In case we do not actually want these Java semantics, as in our example, the doObject(Object) specialization needs to be guarded. This means we need to ensure that it cannot be called with, and activated by, ints and floats. Here is what this could look like in our example:

public abstract class IsNumberNode extends Node {
  // ...

  protected final boolean isInteger(final Object o) {
    return o instanceof Integer;
  }

  protected final boolean isFloat(final Object o) {
    return o instanceof Float;
  }

  @Specialization(guards = {"!isInteger(o)", "!isFloat(o)"})
  protected final int doObject(final Object o) {
    return 0;
  }
}

These guards are parameters of the @Specialization annotation, and one can use helper functions that perform instanceof checks to guard the specialization accordingly.

For nodes with many specializations, this can become very tedious, because we need to repeat all implicit and explicit guards for such specializations. To avoid this in cases where there is only one such fallback specialization, the Truffle framework provides the @Fallback annotation as a shortcut. It implicitly uses the guards of all other specializations and negates them. Thus, we can write the following for our example:

public abstract class IsNumberNode extends Node {
  // ...

  @Fallback
  protected final int doObject(final Object o) {
    return 0;
  }
}

How to Avoid Specialization Pitfalls?

As the example demonstrates, the described problem can occur when there are specializations for types that are in the same class hierarchy, especially when there is a specialization for the most general type, Object.

At the moment, Truffle users can only manually check if they have nodes with such specializations to avoid this issue. But perhaps we can do a little better.

A testing tool that ensures coverage of all specializations as well as of all possible combinations would be very useful. It would allow us to find erroneous or undesired generalization relationships between specializations, and could also ensure that a node provides all required specializations. Especially for beginners, it would also be nice to have a visual tool to inspect specializations and their activation behavior. Perhaps it could be made part of IGV.

Depending on how commonly one actually wants such generalization or subsumption semantics of specializations, one could consider using Truffle’s annotation processors to perform extra checks. They already perform various checks and trigger errors, for example for syntax errors in guard definitions. Perhaps they could also generate a warning or an info message when they detect specializations for types that are part of the same class hierarchy, to make users aware of this issue. Thus, if generalization/subsumption is less common, one might simply indicate it explicitly, perhaps in addition to the existing replaces parameter of the @Specialization annotation.
