Efficient Deterministic Replay for Actors

Debugging concurrent systems is hard, and we have been working for a while to improve the situation. However, a major remaining problem is that bugs are not easily reproduced.

Actor systems allow for nice designs in which all actors are loosely coupled and communicate solely via asynchronous messages. Unfortunately, this means there is a lot of non-determinism in such systems: some issues may only show up for rare actor schedules. This makes it very hard to understand what is going on and to successfully debug such a race condition.

In our latest paper, Dominik showed that tracing of actor programs can be optimized sufficiently to enable deterministic replay. Using Truffle, we manage to reduce the overhead of tracing to ca. 10% (min. 0%, max. 20%) on microbenchmarks. On the more representative Acme-Air application, the latency of HTTP requests increases by only 1% on average.

We think these results show that recording actor applications for deterministic replay can be practical. Of course, more work is still necessary. For instance, for the time being, we need to run the application from the start to be able to replay it. So, stay tuned for future work, and attend Dominik’s presentation at ManLang’18.

Abstract

With the ubiquity of parallel commodity hardware, developers turn to high-level concurrency models such as the actor model to lower the complexity of concurrent software. However, debugging concurrent software is hard, especially for concurrency models with a limited set of supporting tools. Such tools often deal only with the underlying threads and locks, which obscures the view on e.g. actors and messages and thereby introduces additional complexity. To improve on this situation, we present a low-overhead record & replay approach for actor languages. It allows one to debug concurrency issues deterministically based on a previously recorded trace. Our evaluation shows that the average run-time overhead for tracing on benchmarks from the Savina suite is 10% (min. 0%, max. 20%). For Acme-Air, a modern web application, we see a maximum increase of 1% in latency for HTTP requests and about 1.4 MB/s of trace data. These results are a first step towards deterministic replay debugging of actor systems in production.

  • Efficient and Deterministic Record & Replay for Actor Languages
    D. Aumayr, S. Marr, C. Béra, E. Gonzalez Boix, H. Mössenböck; In Proceedings of the 15th International Conference on Managed Languages and Runtimes, ManLang'18, ACM, 2018.
  • Paper: PDF
  • DOI: 10.1145/3237009.3237015
  • BibTex: bibtex
    @inproceedings{Aumayr:2018:RR,
      abstract = {With the ubiquity of parallel commodity hardware, developers turn to high-level concurrency models such as the actor model to lower the complexity of concurrent software. However, debugging concurrent software is hard, especially for concurrency models with a limited set of supporting tools. Such tools often deal only with the underlying threads and locks, which obscures the view on e.g. actors and messages and thereby introduces additional complexity.
      To improve on this situation, we present a low-overhead record & replay approach for actor languages. It allows one to debug concurrency issues deterministically based on a previously recorded trace. Our evaluation shows that the average run-time overhead for tracing on benchmarks from the Savina suite is 10% (min. 0%, max. 20%). For Acme-Air, a modern web application, we see a maximum increase of 1% in latency for HTTP requests and about 1.4 MB/s of trace data. These results are a first step towards deterministic replay debugging of actor systems in production.},
      acceptancerate = {0.72},
      author = {Aumayr, Dominik and Marr, Stefan and Béra, Clément and Gonzalez Boix, Elisa and Mössenböck, Hanspeter},
      blog = {http://stefan-marr.de/2018/08/deterministic-replay-for-actors/},
      booktitle = {Proceedings of the 15th International Conference on Managed Languages and Runtimes},
      day = {12--13},
      doi = {10.1145/3237009.3237015},
      isbn = {978-1-4503-6424-9/18/09},
      keywords = {Actors Concurrency Debugging Determinism MeMyPublication Replay SOMns Tracing Truffle},
      month = sep,
      pdf = {http://stefan-marr.de/downloads/manlang18-aumayr-et-al-efficient-and-deterministic-record-and-replay-for-actor-languages.pdf},
      publisher = {ACM},
      series = {ManLang'18},
      title = {{Efficient and Deterministic Record \& Replay for Actor Languages}},
      year = {2018},
      month_numeric = {9}
    }
    

A Typical Truffle Specialization Pitfall

This post is co-authored with Fabio Niephaus.

The Truffle framework allows us to write language interpreters in an easy way. In combination with the Graal compiler and its partial evaluator, such Truffle interpreters can be as fast as custom VMs. A crucial part of the framework for achieving this performance are so-called specializations, which are used to define highly optimized, speculative versions of a language’s most basic operations.

Writing such specializations is generally straightforward, but there is at least one common pitfall. When designing specializations, we need to remind ourselves that the parameter types of specializations are technically guards. This means that the activation semantics of specializations depend not only on explicit guards, but also on the semantics of Java’s type system.
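To see why parameter types act as guards, recall Java’s subtyping for boxed primitives: a boxed int is also an Object, so a check against Object subsumes the checks for Integer and Float. The following standalone sketch (our own illustration, independent of Truffle) mimics the implicit type guards with plain instanceof checks:

```java
// Truffle-independent sketch: the implicit type guards of specializations
// behave like these instanceof checks. Note that the Object "guard"
// subsumes the others, because every boxed value is an Object.
public class TypeGuardDemo {
  static boolean guardInt(Object o)    { return o instanceof Integer; }
  static boolean guardFloat(Object o)  { return o instanceof Float; }
  static boolean guardObject(Object o) { return o instanceof Object; } // true for any non-null value

  public static void main(String[] args) {
    Object boxed = 42; // autoboxed to Integer
    System.out.println(guardInt(boxed));    // true
    System.out.println(guardObject(boxed)); // true: an int is also an Object
  }
}
```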

Pitfall for Specializations

Let’s have a look at the following code example. It sketches a Truffle node that can be used to check whether an object is some kind of number.

public abstract class IsNumberNode extends Node {

  public abstract int executeEvaluated(Object o);

  @Specialization
  protected final int doInteger(final int o) {
    return 1;
  }

  @Specialization
  protected final int doFloat(final float o) {
    return 2;
  }

  @Specialization
  protected final int doObject(final Object o) {
    return 0;
  }
}

Truffle generates a concrete implementation for this abstract class. To use it, the executeEvaluated(Object) method can be called, which will automatically select one of the three specializations for int, float, and Object based on the given argument.

Next, let’s see this node in action:

IsNumberNode n = IsNumberNodeGen.create();

n.executeEvaluated(42);            // --> 1
n.executeEvaluated(44.3);          // --> 2
n.executeEvaluated(new Object());  // --> 0

n.executeEvaluated(22.7);          // --> 2

Great, so the node works as expected, right? Let’s double check:

IsNumberNode n = IsNumberNodeGen.create();

n.executeEvaluated(new Object());  // --> 0
n.executeEvaluated(44.3);          // --> 0
n.executeEvaluated(42);            // --> 0

This time, the node seems to always return 0. But why?

The first time the node is invoked, it sees an Object and returns the correct result. Additionally, and this is the important side effect, this invocation also activates the doObject(Object) specialization inside the node. When the node is invoked again, it first checks whether any of the previously activated specializations matches the given argument. In our example, the float and int values are Java Objects, and therefore the node always returns 0. This also explains the behavior of the node in the previous series of invocations: first, the node was called with an int, a float, and then an Object. Therefore, all specializations were activated, and the node returned the expected result for all invocations.
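This activation behavior can be simulated without Truffle. The sketch below is our own simplified model, not the code Truffle actually generates: it keeps a list of activated specializations and consults them before activating a new one, which reproduces both invocation sequences above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Simplified model of specialization activation order; not actual Truffle code.
public class IsNumberModel {
  private record Spec(Predicate<Object> guard, int result) {}

  // Declared specializations, in order: int, float, Object.
  // (Java literals like 44.3 box to Double, so the float guard accepts both.)
  private final List<Spec> declared = List.of(
      new Spec(o -> o instanceof Integer, 1),
      new Spec(o -> o instanceof Float || o instanceof Double, 2),
      new Spec(o -> true, 0)); // implicit guard of doObject: matches anything

  private final List<Spec> activated = new ArrayList<>();

  public int execute(Object o) {
    // Previously activated specializations are tried first.
    for (Spec s : activated) {
      if (s.guard().test(o)) { return s.result(); }
    }
    // Otherwise, activate the first declared specialization that matches.
    for (Spec s : declared) {
      if (s.guard().test(o)) {
        activated.add(s);
        return s.result();
      }
    }
    throw new IllegalStateException("no specialization matches");
  }
}
```

In this model, calling execute(new Object()) first activates the always-matching Object specialization, so all later calls return 0; calling execute(42) first activates the int specialization, and subsequent calls with other types activate their own specializations and return the expected results.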

One reason for these specialization semantics is that we need to carefully balance the benefits of specializations against the cost of falling back to a more general version of an operation. This falling back, or more technically deoptimizing, can have a high run-time overhead, because it might require recompilation of methods by the just-in-time compiler. Thus, once we saw the need for a more general specialization, we try to continue using it, and only activate another specialization when none of the previously used ones is sufficient. Another important reason for this approach is to minimize the number of guards that need to be checked at run time; the general assumption is that an argument has a high chance of matching a previously activated specialization.

In case we do not actually want the Java semantics, as in our example, the doObject(Object) specialization needs to be guarded. This means we need to ensure that it cannot be called with, and activated by, ints and floats. Here is how this could look in our example:

public abstract class IsNumberNode extends Node {
  // ...

  protected final boolean isInteger(final Object o) {
    return o instanceof Integer;
  }

  protected final boolean isFloat(final Object o) {
    return o instanceof Float;
  }

  @Specialization(guards = {"!isInteger(o)", "!isFloat(o)"})
  protected final int doObject(final Object o) {
    return 0;
  }
}

These guards are parameters of the @Specialization annotation, and one can use helper functions that perform instanceof checks to guard the specialization accordingly.

For nodes with many specializations, this can become very tedious, because we need to repeat all implicit and explicit guards of the other specializations. To avoid this in cases where there is only one such fallback specialization, the Truffle framework provides the @Fallback annotation as a shortcut. It implicitly uses the guards of all other specializations and negates them. Thus, we can write the following for our example:

public abstract class IsNumberNode extends Node {
  // ...

  @Fallback
  protected final int doObject(final Object o) {
    return 0;
  }
}

How to Avoid Specialization Pitfalls?

As the example demonstrates, the described problem can occur when there are specializations for types in the same class hierarchy, especially in the case of a specialization for the most general type, Object.

At the moment, Truffle users can only manually check if they have nodes with such specializations to avoid this issue. But perhaps we can do a little better.

A testing tool that ensures coverage of all specializations, as well as all possible combinations, would be very useful. It would allow us to find erroneous or undesired generalization relationships between specializations, and could also ensure that a node provides all required specializations. Especially for beginners, it would also be nice to have a visual tool to inspect specializations and their activation behavior; perhaps it could be part of IGV.

Depending on how commonly one actually wants such generalization or subsumption semantics for specializations, one could consider using Truffle’s annotation processors to perform extra checks. They already perform various checks and trigger errors, for example, for syntax errors in guard definitions. Perhaps they could also generate a warning or an info message when they detect specializations for types that are part of the same class hierarchy, to make users aware of this issue. Thus, if generalization/subsumption is less common, one might simply indicate it explicitly, perhaps in addition to the existing replaces parameter of the @Specialization annotation.
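The hierarchy detection such a check would need is straightforward. The sketch below is our own illustration using plain reflection; an actual annotation processor would operate on javax.lang.model type mirrors rather than java.lang.Class objects, but the idea is the same:

```java
// Illustration only: a real annotation processor would inspect
// javax.lang.model type mirrors instead of java.lang.Class objects.
public class HierarchyCheck {
  // True if one type subsumes the other, i.e., both specializations
  // are in the same class hierarchy and activation order matters.
  static boolean inSameHierarchy(Class<?> a, Class<?> b) {
    return a.isAssignableFrom(b) || b.isAssignableFrom(a);
  }

  public static void main(String[] args) {
    // Object subsumes Integer: a candidate for a warning.
    System.out.println(inSameHierarchy(Object.class, Integer.class)); // true
    // Integer and Float are siblings under Number: no subsumption issue.
    System.out.println(inSameHierarchy(Integer.class, Float.class));  // false
  }
}
```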

How to Design Collection Libraries?

Programming languages naturally come with a library of containers or collection types. They allow us to easily work with arbitrary numbers of elements, which is something all major languages care about. Unfortunately, there seems to be little writing on how to design such libraries. Even asking a few people who have worked on collection libraries for a long time did not yield much of a structured approach to such a central element of our languages. The one major piece of writing we found is the Scala team’s account of their experience with bit rot and how they redesigned their collection implementation to avoid it.

However, there do not seem to be any concrete answers as to what one should include in a collection library. So, which types of collections are useful? Which ones are must-haves, and which ones would merely be nice to have? What should the overall shape of the library be, and how should collections be designed to work well together?

Trying to find answers to at least some of these questions, we started to look into the libraries of various languages to get a feeling for common design aspects and design dimensions. And the possible design space seems huge. Some languages, for instance C and Lua, have only a minimal set of standard collections. Languages such as Java and Scala, however, have large numbers of collections.

To weigh the various options against each other, we chose to discuss the design dimensions we found in the context of exploratory programming. The discussion is part of the write-up below.

We will present this work in about a month at the PX/18 workshop in Nice, France.

Abstract

While an integral part of all programming languages, the design of collection libraries is rarely studied. This work briefly reviews the collection libraries of 14 languages to identify possible design dimensions. Some languages have surprisingly few but versatile collections, while others have large libraries with many specialized collections. Based on the identified design dimensions, we argue that a small collection library with only a sequence, a map, and a set type are a suitable choice to facilitate exploratory programming. Such a design minimizes the number of decisions programmers have to make when dealing with collections, and it improves discoverability of collection operations. We further discuss techniques that make their implementation practical from a performance perspective. Based on these arguments, we conclude that languages which aim to support exploratory programming should strive for small and versatile collection libraries.

  • Few Versatile vs. Many Specialized Collections: How to design a collection library for exploratory programming?
    S. Marr, B. Daloze; In Proceedings of Programming Experience Workshop, PX/18, p. 135–143, ACM, 2018.
  • Paper: HTML, PDF
  • DOI: 10.1145/3191697.3214334
  • BibTex: bibtex
    @inproceedings{Marr:2018:COLL-PX,
      abstract = {While an integral part of all programming languages, the design of collection libraries is rarely studied. This work briefly reviews the collection libraries of 14 languages to identify possible design dimensions. Some languages have surprisingly few but versatile collections, while others have large libraries with many specialized collections. Based on the identified design dimensions, we argue that a small collection library with only a sequence, a map, and a set type are a suitable choice to facilitate exploratory programming. Such a design minimizes the number of decisions programmers have to make when dealing with collections, and it improves discoverability of collection operations. We further discuss techniques that make their implementation practical from a performance perspective. Based on these arguments, we conclude that languages which aim to support exploratory programming should strive for small and versatile collection libraries.},
      acceptancerate = {1.00},
      author = {Marr, Stefan and Daloze, Benoit},
      blog = {http://stefan-marr.de/2018/03/how-to-design-collection-libraries/},
      booktitle = {Proceedings of Programming Experience Workshop},
      day = {10},
      doi = {10.1145/3191697.3214334},
      html = {http://stefan-marr.de/papers/px-marr-daloze-few-versatile-vs-many-specialized-collections/},
      isbn = {978-1-4503-5513-1},
      keywords = {Analysis Collections Design Framework Implementation Library MeMyPublication Opinion Survey myown},
      month = apr,
      numpages = {9},
      pages = {135--143},
      pdf = {http://stefan-marr.de/downloads/px18-marr-daloze-few-versatile-vs-many-specialized-collections.pdf},
      publisher = {ACM},
      series = {PX/18},
      title = {{Few Versatile vs. Many Specialized Collections: How to design a collection library for exploratory programming?}},
      year = {2018},
      month_numeric = {4}
    }
    
