Tag Archives: Optimization

Language Research with Truffle at the SPLASH’16 Conference

Next weekend starts one of the major conferences of the programming languages research community. The conference hosts many events including our Meta’16 workshop on Metaprogramming, SPLASH-I with research and industry talks, the Dynamic Languages Symposium, and the OOPSLA research track.

This year, the overall program includes 9 talks on Truffle and Graal-related topics. This includes various topics including optimizing high-level metaprogramming, low-level machine code, benchmarking, parallel programming. I posted a full list including abstracts here: Truffle and Graal Presentations @SPLASH’16. Below is an overview and links to the talks:

Sunday, Oct. 30th

AST Specialisation and Partial Evaluation for Easy High-Performance Metaprogramming (PDF)
Chris Seaton, Oracle Labs
Meta’16 workshop 11:30-12:00

Towards Advanced Debugging Support for Actor Languages: Studying Concurrency Bugs in Actor-based Programs (PDF)
Carmen Torres Lopez, Stefan Marr, Hanspeter Moessenboeck, Elisa Gonzalez Boix
Agere’16 workshop 14:10-14:30

Monday, Oct. 31st

Bringing Low-Level Languages to the JVM: Efficient Execution of LLVM IR on Truffle (PDF)
Manuel Rigger, Matthias Grimmer, Christian Wimmer, Thomas Würthinger, Hanspeter Mössenböck
VMIL’16 workshop 15:40-16:05

Tuesday, Nov. 1st

Building Efficient and Highly Run-time Adaptable Virtual Machines (PDF)
Guido Chari, Diego Garbervetsky, Stefan Marr
DLS 13:55-14:20

Optimizing R Language Execution via Aggressive Speculation
Lukas Stadler, Adam Welc, Christian Humer, Mick Jordan
DLS 14:45-15:10

Cross-Language Compiler Benchmarking—Are We Fast Yet? (PDF)
Stefan Marr, Benoit Daloze, Hanspeter Mössenböck
DLS 16:30-16:55

Thursday, Nov. 3rd

GEMs: Shared-memory Parallel Programming for Node.js (DOI)
Daniele Bonetta, Luca Salucci, Stefan Marr, Walter Binder
OOPSLA conference 11:20-11:45

Efficient and Thread-Safe Objects for Dynamically-Typed Languages (PDF)
Benoit Daloze, Stefan Marr, Daniele Bonetta, Hanspeter Mössenböck
OOPSLA conference 13:30-13:55

Truffle and Graal: Fast Programming Languages With Modest Effort
Chris Seaton, Oracle Labs
SPLASH-I 14:20-15:10

Type Hierarchies and Guards in Truffle Languages

Continuing a little bit with writing notes on Truffle and Graal, this one is based on my observations in SOMns and changes to its message dispatch mechanism. Specifically, I refactored the main message dispatch chain in SOMns. As in Self and Newspeak, all interactions with objects are message sends. Thus, field access and method invocation is essentially the same. This means that message sending is a key to good performance.

In my previous design, I structured the dispatch chain in a way that, I thought, I'd reduce the necessary runtime checks. This design decision still came from TruffleSOM where the class hierarchy was much simpler and it still seems to work.

My naive design distinguished two different cases. One case is that the receiver is a standard Java objects, for instance boxed primitives such as longs and doubles, or other Java objects that is used directly. The second case is objects from my own hierarchy of Smalltalk objects under SAbstractObject.

The hierarchy is a little more involved, it includes the abstract class, a class for objects that have a Smalltalk class SObjectWithClass, a class for objects without fields, for objects with fields, and that one is then again subclassed by classes for mutable and immutable objects. There are still a few more details to it, but I think you get the idea.

So, with that, I thought, let's structure the dispatch chain like this, starting with a message send node as its root:

  -> JavaRcvr
  -> JavaRcvr
  -> CheckIsSOMObject
        \-> UninitializedJavaRcvr
  -> SOMRcvr
  -> SOMRcvr
  -> UninitializedSOMRcvr

This represents a dispatch chain for a message send site that has seen four different receivers, two primitive types, and two Smalltalk types. This could be the case for instance for the polymorphic + message.

The main idea was to split the chain in two parts so that I avoid checking for the SOM object more than once, and then can just cast the receiver to SObjectWithClass in the second part of the chain to be able to read the Smalltalk class from it.

Now it turns out, this is not the best idea. The main problem is that SObjectWithClass is not a leaf class in my SOMns hierarchy (this is the case in TruffleSOM though, where it originates). This means, at runtime, the check, i.e., the guard for SObjectWithClass can be expensive. When I looked at the compilation in IGV, I saw many instanceof checks that could not be removed and resulted in runtime traversal of the class hierarchy, to confirm that a specific concrete class was indeed a subclass of SObjectWithClass.

In order to avoid these expensive checks, I refactored the dispatch nodes to extract the guard into its own node that does only the minimal amount of work for each specific case. And it only ever checks for the specific leaf class of my hierarchy that is expected for a specific receiver.

This also means, the new dispatch chain is not separated in parts anymore as it was before. Instead, the nodes are simply added in the order in which the different receiver types are observed over time.

Overall the performance impact is rather large. I saw on the Richards benchmark a gain of 10% and on DeltaBlue about 20%. Unfortunately my refactoring also changed a few other details beside the changes related to instanceof and casts. It also made the guards for objects with fields depend on the object layout instead of the class, which avoids having multiple guards for essentially the same constraint further down the road.

So, the main take-away here is that the choice of guard types can have a major performance impact. I also had a couple of other @Specialization nodes that were using non-leaf classes. For instance like this: @Specialization public Object doSOMObject(SObjectWithClass rcvr) {...}

This looks inconspicuous at first, but fixing those and a few other things resulted in overall runtime reduction on multiple benchmarks between 20% and 30%.

A good way to find these issues is to see in IGV that instanceof or checked cast snippets are inlined and not completely removed. Often they are already visible in the list of phases when the snippets are resolved. Another way to identify them is the use of the Graal option -Dgraal.option.TraceTrufflePerformanceWarnings=true (I guess that would be -G:+TraceTrufflePerformanceWarnings when mx is used). The output names the specific non-leaf node checks that have been found in the graph. Not all of them are critical, because they can be removed by later phases. To check that, you can use the id of the node from the output and search for it in the corresponding IGV graph using for instance id=3235 in the search field.

JIT Data Structures, Fully Reflective VMs, and Meta-Circular Meta-Tracing

The year leading up to SPLASH has been pretty busy. Beside my own talks on Tracing vs. Partial Evaluation and Optimizing Communicating Event-Loop Languages with Truffle, there are going to be three other presentations on work I was involved in.

Just-In-Time Data Structures

On Thursday, Mattias De Wael is going to present his work on Just-in-Time Data Structures at Onward!. For me this work is exciting, because I hope that this could make programming with standard data structures quite a bit simpler in the long run. Today, ‘scripting’ languages like Ruby and Python provide a small set of very versatile data structures they call lists or arrays that can be used for many many different use cases. But instead of having a one-size-fits-all implementation, a runtime system can ideally identify a specific ‘strategy’ to be used dynamically, and even change it over the course of a program. So, in a nice language, we would merely declare a few fundamental properties such as being ordered, and how to compare items, and the runtime system will figure out whether to use a hash, tree, array, or other kind of implementation. Well, that’s at least my vision. Abstract below:

Today, software engineering practices focus on finding the single right data representation (i.e., data structure) for a program. The right data representation, however, might not exist: relying on a single representation of the data for the lifetime of the program can be suboptimal in terms of performance. We explore the idea of developing data structures for which changing the data representation is an intrinsic property. To this end we introduce Just-in-Time Data Structures, which enable representation changes at runtime, based on declarative input from a performance expert programmer. Just-in-Time Data Structures are an attempt to shift the focus from finding the “right’’ data structure to finding the right sequence of data representations. We present JitDS-Java, an extension to the Java language, to develop Just-in-Time Data Structures. Further, we show two example programs that benefit from changing the representation at runtime.

  • Just-In-Time Data Structures. Mattias De Wael, Stefan Marr, Joeri De Koster, Jennifer B. Sartor, and Wolfgang De Meuter. Proceedings of the 2015 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software, ACM, (October 2015)

Towards Fully Reflective Environments

On Friday, Guido Chari is going to present his work Towards Fully Reflective Environments also at Onward!. This is a first step in the direction of bringing all parts of a runtime system to life. While classic systems such as Smalltalk and CLOS go pretty far in reifying language elements and enabling us to interact with them on a meta-level, they typically exclude important aspects of the runtime system such as GC, object representation, interpreter semantics, and JIT compilation. Some of these aspects can be observed via debugging interfaces, but what we have in mind here is to have a complete meta-circular runtime system where each and every aspect is reified and can be adapted at runtime. What Guido is going to present is only a first step, especially since performance is not yet addressed. But, we are working on it.

Modern development environments promote live programming (LP) mechanisms because it enhances the development experience by providing instantaneous feedback and interaction with live objects. LP is typically supported with advanced reflective techniques within dynamic languages. These languages run on top of Virtual Machines (VMs) that are built in a static manner so that most of their components are bound at compile time. As a consequence, VM developers are forced to work using the traditional edit-compile-run cycle, even when they are designing LP-supporting environments. In this paper we explore the idea of bringing LP techniques to the VM domain for improving their observability, evolution and adaptability at run-time. We define the notion of fully reflective execution environments (EEs), systems that provide reflection not only at the application level but also at the level of the VM. We characterize such systems, propose a design, and present Mate v1, a prototypical implementation. Based on our prototype, we analyze the feasibility and applicability of incorporating reflective capabilities into different parts of EEs. Furthermore, the evaluation demonstrates the opportunities such reflective capabilities provide for unanticipated dynamic adaptation scenarios, benefiting thus, a wider range of users.

  • Towards Fully Reflective Environments. Guido Chari, Diego Garbervetsky, Stefan Marr, and Stéphane Ducasse. Proceedings of the 2015 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software, ACM, (October 2015)

A Formal Foundation for Trace-Based JIT Compilers

Last but not least, and already on Monday at the WODA workshop, Maarten Vandercammmen is going to present A Formal Foundation for Trace-Based JIT Compilers. The bit that I find most interesting in that work is the meta-circular meta-tracer for a little Scheme, which is hiding in there. The focus of the work changed over time, but initially he was working on identifying the essence of meta-tracing. And he now got a pretty small and manageable framework to experiment with meta-tracing, and even execute multiple levels of interpreters that are all traced through. I’d say that’s a pretty neat foundation for future research, more approachable than a huge practical system such as RPython, but of course, also not without drawbacks. For instance, a proper optimizer is missing so far. But let’s see where this gets us in the future!

Trace-based JIT compilers identify frequently executed program paths at run-time and subsequently record, compile and optimize their execution. In order to improve the performance of the generated machine instructions, JIT compilers heavily rely on dynamic analysis of the code. Existing work treats the components of a JIT compiler as a monolithic whole, tied to particular execution semantics. We propose a formal framework that facilitates the design and implementation of a tracing JIT compiler and its accompanying dynamic analyses by decoupling the tracing, optimization, and interpretation processes. This results in a framework that is more configurable and extensible than existing formal tracing models. We formalize the tracer and interpreter as two abstract state machines that communicate through a minimal, well-defined interface. Developing a tracing JIT compiler becomes possible for arbitrary interpreters that implement this interface. The abstract machines also provide the necessary hooks to plug in custom analyses and optimizations.

  • A Formal Foundation for Trace-Based JIT Compilers Maarten Vandercammen, Jens Nicolay, Stefan Marr, Joeri De Koster, Theo D’Hondt, and Coen De Roover. Proceedings of the 13th International Workshop on Dynamic Analysis. Pittsburgh, PA, USA, ACM, (October 2015)