A Typical Truffle Specialization Pitfall

This post is co-authored with Fabio Niephaus.

The Truffle framework makes it easy to write language interpreters. In combination with the Graal compiler and its partial evaluator, such Truffle interpreters can be as fast as custom VMs. A crucial part of the framework for achieving this performance are so-called specializations, which are used to define highly optimized and speculative implementations of a language's basic operations.

Writing such specializations is generally pretty straightforward, but there is at least one common pitfall. When designing specializations, we need to remind ourselves that the parameter types of specializations are technically guards. This means that the activation semantics of specializations depend not only on explicit guards, but also on the semantics of Java's type system.

Pitfall for Specializations

Let’s have a look at the following code example. It sketches a Truffle node that can be used to check whether an object is some kind of number.

public abstract class IsNumberNode extends Node {

  public abstract int executeEvaluated(Object o);

  @Specialization
  protected final int doInteger(final int o) {
    return 1;
  }

  @Specialization
  protected final int doFloat(final float o) {
    return 2;
  }

  @Specialization
  protected final int doObject(final Object o) {
    return 0;
  }
}

Truffle generates a concrete implementation for this abstract class. To use it, the executeEvaluated(Object) method can be called, which will automatically select one of the three specializations for int, float, and Object based on the given argument.

Next, let’s see this node in action:

IsNumberNode n = IsNumberNodeGen.create();

n.executeEvaluated(42);            // --> 1
n.executeEvaluated(44.3f);         // --> 2
n.executeEvaluated(new Object());  // --> 0

n.executeEvaluated(22.7f);         // --> 2

Great, so the node works as expected, right? Let’s double check:

IsNumberNode n = IsNumberNodeGen.create();

n.executeEvaluated(new Object());  // --> 0
n.executeEvaluated(44.3f);         // --> 0
n.executeEvaluated(42);            // --> 0

This time, the node seems to always return 0. But why?

The first time the node is invoked, it sees an Object and returns the correct result. Additionally, and this is the important side effect, this invocation also activates the doObject(Object) specialization inside the node. When the node is invoked again, it first checks whether any of the previously activated specializations matches the given argument. In our example, the boxed float and int values are Java Objects, and therefore the node always returns 0. This also explains the behavior of the node in the previous series of invocations. There, the node was called first with an int, then a float, and then an Object. Therefore, all three specializations were activated and the node returned the expected result for each invocation.
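
To make this behavior more tangible, here is a minimal, hand-written sketch of the dispatch logic (using a hypothetical class IsNumberNodeSketch). It is not the code Truffle actually generates, but it mimics the activation semantics described above:

public final class IsNumberNodeSketch {
  // Records which specializations have been activated so far.
  private boolean intActive, floatActive, objectActive;

  public int executeEvaluated(final Object o) {
    // Previously activated specializations are checked first.
    if (intActive && o instanceof Integer) { return 1; }
    if (floatActive && o instanceof Float) { return 2; }
    if (objectActive)                      { return 0; } // an Integer or Float is also an Object

    // Otherwise, activate the first specialization whose implicit type guard matches.
    if (o instanceof Integer) { intActive = true;   return 1; }
    if (o instanceof Float)   { floatActive = true; return 2; }
    objectActive = true;
    return 0;
  }
}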

One reason for these specialization semantics is that we need to carefully balance the benefits of specializations against the cost of falling back to a more general version of an operation. This falling back, or more technically, deoptimizing, can have a high run-time overhead, because it might require recompilation of methods by the just-in-time compiler. Thus, once we saw the need for a more general specialization, we continue to use it, and only activate another specialization when none of the previously used ones is sufficient. Another important reason for this approach is to minimize the number of guards that need to be checked at run time. The general assumption here is that an argument has a high chance of matching a previously activated specialization.

In case we do not actually want these Java semantics, as in our example, the doObject(Object) specialization needs to be guarded. This means we need to make sure that it cannot be activated by ints and floats. Here is how this could look in our example:

public abstract class IsNumberNode extends Node {
  // ...

  protected final boolean isInteger(final Object o) {
    return o instanceof Integer;
  }

  protected final boolean isFloat(final Object o) {
    return o instanceof Float;
  }

  @Specialization(guards = {"!isInteger(o)", "!isFloat(o)"})
  protected final int doObject(final Object o) {
    return 0;
  }
}

These guards are parameters of the @Specialization annotation, and one can use helper methods that perform instanceof checks to guard the specialization accordingly.

For nodes with many specializations, this can become very tedious, because we need to repeat all implicit and explicit guards of the other specializations. To avoid this in cases where there is only one such fallback specialization, the Truffle framework provides the @Fallback annotation as a shortcut. It implicitly uses the negation of all other specializations' guards. Thus, we can write the following for our example:

public abstract class IsNumberNode extends Node {
  // ...

  @Fallback
  protected final int doObject(final Object o) {
    return 0;
  }
}
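
If we rerun the call sequence that surprised us earlier against the guarded or @Fallback version, it should now produce the intended results:

IsNumberNode n = IsNumberNodeGen.create();

n.executeEvaluated(new Object());  // --> 0
n.executeEvaluated(44.3f);         // --> 2
n.executeEvaluated(42);            // --> 1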

How to Avoid Specialization Pitfalls?

As the example demonstrates, the described problem can occur when there are specializations for types that are in the same class hierarchy, especially in the case of a specialization for the most general type, Object.

At the moment, Truffle users can only manually check if they have nodes with such specializations to avoid this issue. But perhaps we can do a little better.

A testing tool that ensures coverage of all specializations as well as all possible combinations would be very useful. It would allow us to find erroneous or undesired generalization relationships between specializations, and could also ensure that a node provides all required specializations. Especially for beginners, it would also be nice to have a visual tool to inspect specializations and their activation behavior, perhaps as part of IGV.

Depending on how commonly one actually wants such generalization or subsumption semantics for specializations, one could consider using Truffle's annotation processors to perform extra checks. They already perform various checks and trigger errors, for example, for syntax errors in guard definitions. Perhaps they could also generate a warning or an info message when they detect specializations for types that are part of the same class hierarchy, to make users aware of this issue. Thus, if generalization/subsumption is the less common case, one might simply indicate it explicitly, perhaps in addition to the existing replaces parameter of the @Specialization annotation.
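
For illustration, here is a rough sketch of how the existing replaces parameter can already be used to state such a relationship explicitly. Note that replaces only declares that doObject subsumes the listed specializations and removes them once doObject is activated; it does not by itself prevent the surprising activation behavior discussed above:

public abstract class IsNumberNode extends Node {
  // ...

  // Sketch: explicitly declaring that doObject subsumes the more specific
  // specializations via the existing replaces parameter.
  @Specialization(replaces = {"doInteger", "doFloat"})
  protected final int doObject(final Object o) {
    return 0;
  }
}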

How to Design Collection Libraries?

Programming languages naturally come with a library of containers or collection types. They allow us to easily work with arbitrary numbers of elements, which is something all major languages care about. Unfortunately, there seems to be little writing on how to design such libraries. Even asking a few people who have worked on collection libraries for a long time did not yield much of a structured approach to such a central element of our languages. The one major piece of writing we found is the Scala team describing their experience with bit rot and how they redesigned their collection implementation to avoid it.

However, there do not seem to be any concrete answers on what one should include in a collection library. So, which types of collections are useful? Which ones are must-haves, and which ones would merely be nice to have? What should the overall shape of the library be, and how should collections be designed to work well together?

Trying to find answers to at least some of these questions, we started to look into the libraries of various languages to get a feeling for common design aspects and design dimensions. The possible design space seems huge. Some languages, for instance C and Lua, have only a minimal set of standard collections. Languages such as Java and Scala, however, have large numbers of collections.

To weigh the various options against each other, we chose to discuss the design dimensions we found in the context of exploratory programming. The discussion is part of the write-up below.

We will present this work in about a month at the PX/18 workshop in Nice, France.

Abstract

While an integral part of all programming languages, the design of collection libraries is rarely studied. This work briefly reviews the collection libraries of 14 languages to identify possible design dimensions. Some languages have surprisingly few but versatile collections, while others have large libraries with many specialized collections. Based on the identified design dimensions, we argue that a small collection library with only a sequence, a map, and a set type are a suitable choice to facilitate exploratory programming. Such a design minimizes the number of decisions programmers have to make when dealing with collections, and it improves discoverability of collection operations. We further discuss techniques that make their implementation practical from a performance perspective. Based on these arguments, we conclude that languages which aim to support exploratory programming should strive for small and versatile collection libraries.

  • Few Versatile vs. Many Specialized Collections: How to design a collection library for exploratory programming?
    S. Marr, B. Daloze; In Proceedings of Programming Experience Workshop, PX/18, p. 1–9, ACM, 2018.
  • Paper: HTML, PDF
  • DOI:
  • BibTeX:
    @inproceedings{Marr:2018:COLL-PX,
      abstract = {While an integral part of all programming languages, the design of collection libraries is rarely studied. This work briefly reviews the collection libraries of 14 languages to identify possible design dimensions. Some languages have surprisingly few but versatile collections, while others have large libraries with many specialized collections. Based on the identified design dimensions, we argue that a small collection library with only a sequence, a map, and a set type are a suitable choice to facilitate exploratory programming. Such a design minimizes the number of decisions programmers have to make when dealing with collections, and it improves discoverability of collection operations. We further discuss techniques that make their implementation practical from a performance perspective. Based on these arguments, we conclude that languages which aim to support exploratory programming should strive for small and versatile collection libraries.},
      acceptancerate = {1.00},
      added-at = {2018-03-06T15:36:25.000+0100},
      author = {Marr, Stefan and Daloze, Benoit},
      biburl = {https://www.bibsonomy.org/bibtex/2b688107ac38fec80b4a2f8928f7c7693/gron},
      blog = {http://stefan-marr.de/2018/03/how-to-design-collection-libraries/},
      booktitle = {Proceedings of Programming Experience Workshop},
      day = {10},
      html = {http://stefan-marr.de/papers/px-marr-daloze-few-versatile-vs-many-specialized-collections/},
      interhash = {0f02e13340bffa5027ba860ae1e1a4d0},
      intrahash = {b688107ac38fec80b4a2f8928f7c7693},
      keywords = {Analysis Collections Design Framework Implementation Library MeMyPublication Opinion Survey myown},
      month = apr,
      numpages = {9},
      pages = {1--9},
      pdf = {http://stefan-marr.de/downloads/px18-marr-daloze-few-versatile-vs-many-specialized-collections.pdf},
      publisher = {ACM},
      series = {PX/18},
      timestamp = {2018-03-13T20:40:10.000+0100},
      title = {{Few Versatile vs. Many Specialized Collections: How to design a collection library for exploratory programming?}},
      year = {2018},
      month_numeric = {4}
    }
    


Debugging Concurrency Is Hard, but We Can Do Something About It!

When we have to debug applications that use concurrency, perhaps written in Java, all we get from the debugger is a list of threads, perhaps some information about held locks, and the ability to step through each thread separately.

What we usually don’t get is any kind of high-level support for the concurrency abstractions that we used in our applications. We might have used a library for actors, channels from CSP, Java’s fork/join pool, or perhaps even some software transactional memory. This means that, in the debugger, we can no longer reason about the same concepts we used to build the application. Instead, we can only vaguely recognize them from how the threads interact. We might see that one thread puts an object, which is perhaps a message, into another object, which represents a mailbox. But we don’t get any support to actually follow that message as it is being processed. We also have no way of jumping from the point where we spawn a fork/join task to its activation to see the computation unfold recursively and in parallel. And wouldn’t it be nice if we could simply set a breakpoint on a promise and see where we continue executing once it is resolved?

I am pretty sure these things would make it easier to understand what’s going on in my applications. Unfortunately, debugger support for such high-level concepts is rare or non-existent.

One of the biggest problems is that there are too many different concurrency models, concepts, languages, and libraries. This means designing debugging support for all these variations is a huge undertaking. And using the same debugger for Akka actors, Erlang, Pony, or your favorite actor library doesn’t seem feasible. Not to mention all the other systems: what if you want to debug goroutines and their use of channels, Clojure’s core.async, or perhaps Java’s JCSP?

It turns out that building a debugger for all these things isn’t impossible. We can design the debugger so that it remains completely oblivious to the various concurrency concepts. This means it can be used for a wide range of concurrent systems without having to be adapted specifically to them. To make this work, we propose the Kómpos protocol, which communicates the necessary details from a language runtime to the debugger in a concurrency-agnostic way.

For a small demo, see the video below. The abstract of our paper, which describes the details of the approach, as well as links to it, follow afterwards.

For questions or comments, please find me on Twitter @smarr.

Abstract

Today’s complex software systems combine high-level concurrency models. Each model is used to solve a specific set of problems. Unfortunately, debuggers support only the low-level notions of threads and shared memory, forcing developers to reason about these notions instead of the high-level concurrency models they chose.

This paper proposes a concurrency-agnostic debugger protocol that decouples the debugger from the concurrency models employed by the target application. As a result, the underlying language runtime can define custom breakpoints, stepping operations, and execution events for each concurrency model it supports, and a debugger can expose them without having to be specifically adapted.

We evaluated the generality of the protocol by applying it to SOMns, a Newspeak implementation, which supports a diversity of concurrency models including communicating sequential processes, communicating event loops, threads and locks, fork/join parallelism, and software transactional memory. We implemented 21 breakpoints and 20 stepping operations for these concurrency models. For none of these, the debugger needed to be changed. Furthermore, we visualize all concurrent interactions independently of a specific concurrency model. To show that tooling for a specific concurrency model is possible, we visualize actor turns and message sends separately.

  • A Concurrency-Agnostic Protocol for Multi-Paradigm Concurrent Debugging Tools
    S. Marr, C. Torres Lopez, D. Aumayr, E. Gonzalez Boix, H. Mössenböck; In Proceedings of the 13th ACM SIGPLAN International Symposium on Dynamic Languages, DLS'17, ACM, 2017.
  • Paper: HTML, PDF
  • DOI: 10.1145/3133841.3133842
  • BibTeX:
    @inproceedings{Marr:2017:CPCD,
      abstract = {Today's complex software systems combine high-level concurrency models. Each model is used to solve a specific set of problems. Unfortunately, debuggers support only the low-level notions of threads and shared memory, forcing developers to reason about these notions instead of the high-level concurrency models they chose.
      
      This paper proposes a concurrency-agnostic debugger protocol that decouples the debugger from the concurrency models employed by the target application. As a result, the underlying language runtime can define custom breakpoints, stepping operations, and execution events for each concurrency model it supports, and a debugger can expose them without having to be specifically adapted.
      
      We evaluated the generality of the protocol by applying it to SOMns, a Newspeak implementation, which supports a diversity of concurrency models including communicating sequential processes, communicating event loops, threads and locks, fork/join parallelism, and software transactional memory. We implemented 21 breakpoints and 20 stepping operations for these concurrency models. For none of these, the debugger needed to be changed. Furthermore, we visualize all concurrent interactions independently of a specific concurrency model. To show that tooling for a specific concurrency model is possible, we visualize actor turns and message sends separately.},
      acceptancerate = {0.64},
      added-at = {2017-08-13T23:37:23.000+0200},
      author = {Marr, Stefan and Torres Lopez, Carmen and Aumayr, Dominik and Gonzalez Boix, Elisa and Mössenböck, Hanspeter},
      biburl = {https://www.bibsonomy.org/bibtex/29f119d2833c39979c7d679e49b067abe/gron},
      blog = {http://stefan-marr.de/2017/08/concurrency-agnostic-protocol-for-debugging/},
      booktitle = {Proceedings of the 13th ACM SIGPLAN International Symposium on Dynamic Languages},
      day = {24},
      doi = {10.1145/3133841.3133842},
      html = {http://stefan-marr.de/papers/dls-marr-et-al-concurrency-agnostic-protocol-for-debugging/},
      interhash = {fd0460a53c7ec4f41c14ccb10cc22a9f},
      intrahash = {9f119d2833c39979c7d679e49b067abe},
      isbn = {978-1-4503-5526-1/17/10},
      keywords = {Breakpoints Concurrency Debugging MeMyPublication Stepping Tooling Visualization myown},
      location = {Vancouver, Canada},
      month = oct,
      note = {(acceptance rate 64%)},
      numpages = {12},
      pdf = {http://stefan-marr.de/downloads/dls17-marr-et-al-concurrency-agnostic-protocol-for-debugging.pdf},
      publisher = {ACM},
      series = {DLS'17},
      timestamp = {2017-08-22T16:46:25.000+0200},
      title = {{A Concurrency-Agnostic Protocol for Multi-Paradigm Concurrent Debugging Tools}},
      year = {2017},
      month_numeric = {10}
    }
    
