Add Graal JIT Compilation to Your JVM Language in 5 Easy Steps, Step 4

Step 4: Complete Support for Mandelbrot

In the previous post of this series, we built up all the infrastructure to execute a simple Fibonacci function with a Truffle interpreter for Golo. This included an introduction to the basic aspects of Truffle, its support for specializations, the idiomatic ways for realizing sequences, control flow, basic operators, and literals. Together with the discussion of how function invocation can be implemented, we covered most aspects that are required also for our final goal, i.e., to support the execution of the Mandelbrot program from the very first post in the series.

Below, the main part of the Mandelbrot program implemented in Java:

while (y < size) {
  double ci = (2.0 * y / size) - 1.0;
  int x = 0;

  while (x < size) {
    // ...
    double cr = (2.0 * x / size) - 1.5;

    int z = 0;
    int escape = 1;

    while (z < 50) {
      zr = zrzr - zizi + cr;
      zi = 2.0 * zr * zi + ci;

      zrzr = zr*zr;
      zizi = zi*zi;

      if (zrzr + zizi > 4.0) {
        escape = 0;
        break;
      }
      z += 1;
    }

    byte_acc = (byte_acc << 1) | escape;
    bit_num += 1;
// remainder left out for brevity

Since much of what we needed to support Mandelbrot are extensions, or based on the same principles that we discussed before, this post will be much shorter and only sketch the changes to existing elements. However, we discuss in greater detail how to add local variables, loops, and how to call Java functions based on method handles.

Extensions for Longs, Doubles, Strings, and Basic Operations

In a first step, we extend the Types class and the ExpressionNode to support long and double primitives as well as Strings for specialization. The Types class looks now like this:

@TypeSystem({
  int.class,
  boolean.class,
  long.class,    // added for Mandelbrot
  double.class,  // added for Mandelbrot
  String.class,  // added for Mandelbrot
  Object[].class
})
class Types { }

In the ExpressionNode, we merely add the missing executeLong(.), executeDouble(.), and executeString(.) methods in the same way as the existing ones. This will then allow a node to speculate on the child nodes always evaluating, for instance, to a primitive long value.

As we can see in the Mandelbrot program above, we also need support for literals. Specifically, we are going to add nodes for literal long, double, and string values. Furthermore, we add nodes for the true, false, and null literals. With the IntegerLiteralNode from the last post, we already saw the general idea. For true, false, and null it is even simpler, we merely return the fixed value on execution. So, it looks like this:

class NullLiteralNode extends LiteralNode {

  public Object executeGeneric(VirtualFrame frame) {
    return null;
  }
}

Since Mandelbrot is mostly about computation, we also need a range of additions and extensions to support multiplication, division, addition, subtraction, less-than, equal as well as not-equal. Furthermore, we need to implement the Truffle versions of the xor, or, and left shift bit operators.

Below find a sketch of the addition as implemented by the PlusNode. The sketch shows the support for doubles and strings. Note that is also has a doStringAndDouble(.,.) method. This is because the child nodes have of course different semantics so that we need to specify the desired behavior for all combinations as well.

abstract class PlusNode extends BinaryNode {

  @Specialization
  public String doStrings(String left, String right) {
    return left + right;
  }

  @Specialization
  public double doDoubles(double left, double right) {
    return left + right;
  }

  @Specialization
  public String doStringAndDouble(String left, double right) {
    return left + right;
  }
  // ... and a few other combinations, incl. int and long
}

The bit operations are only defined for integers and longs, so the resulting BitXorNode looks like this:

abstract class BitXorNode extends BinaryNode {
  @Specialization
  public long doLongs(long left, long right) {
    return left ^ right;
  }

  @Specialization
  public int doIntegers(int left, int right) {
    return left ^ right;
  }
}

Unary Operations: Type Coercions and the not Operator

Next on our list are unary operators for value casting. Specifically, we need to implement intValue(.) and doubleValue(.). You might remember from the last post that we handled the println(.) method specially. It is a method defined on the Golo's Predefined class that takes only one parameter, too. Our solution the last time was to create a PrintlnNode during the lookup and directly replace the UninitializedInvocationNode AST node with this node. During execution, this has the major advantage that there is no overhead for function calls or even argument wrapping. To properly support Mandelbrot in Golo, we need explicit coercions for integer and double values. Of course, these are simple operations that should not have any overhead. Thus, we use the same approach.

The nodes are defined as follows:

@NodeChild(value = "expr", type = ExpressionNode.class)
abstract class UnaryNode extends ExpressionNode { }

abstract class IntValue extends UnaryNode implements PreEvaluated {

  abstract Object executeEvaluated(VirtualFrame frame, Object value);

  @Override
  public Object doEvaluated(VirtualFrame frame, Object[] args) {
    return executeEvaluated(frame, args[0]);
  }

  @Specialization
  public int doDouble(double value) {
    return (int) value;
  }
}

On the first look, this seems to be more complicated than necessary. The reason that we implement the PreEvaluated interface is because of the way node specialization should work. Normally, a specialize(.) method will replace the old node in the tree and return the new node so that it can be executed directly (see previous post). However, it needs to take the already evaluated arguments. For this purpose, we implement the interface and the corresponding doEvaluated(.) method. The implementation for the abstract executeEvaluated(.) will be generated by the TruffleDSL and we do not have to do that manually. The main functionality of the node is realized with the specialization doDouble(.), which merely casts the double value to an integer value. For the doubleValue(.) operation, the implementation is essentially the same.

The not operator is handled differently in Golo. Instead of being a function invocation, it is a keyword. We will skip over how to add the necessary support to the Golo IR to Truffle visitor. In the end, the executing AST node is simply this:

abstract class NotNode extends UnaryNode {
  @Specialization
  public boolean doBoolean(boolean value) {
    return !value;
  }
}

Local Variables

One issue we have kept putting off so far is to support the notion of local variables. For our Fibonacci function, we merely needed support for accessing arguments (see previous post, sec. 4.2). This was realized by accessing the array that is passed on function invocation and is stored in the frame object given to the execute*(.) methods. Now, we will use these frames to also store local variables.

As a little bit of background, frames are the activation records of functions. Thus, they keep the temporary state that is needed during the execution of a function. This includes access to the actual arguments, local variables, and possibly other execution state.

Truffle distinguishes two types of frames, virtual and materialized frames. The first type is called virtual to indicate that the optimizer will not allocate an object for the frame at runtime, instead, it will use the frame to figure out the data dependencies between operations within a compilation unit, i.e., a function and possibly additionally inlined functions. To make this reliable the usage of virtual frames is restricted. For instance, they are not supposed to be assigned to fields of other objects and cannot be passed to methods of objects where Graal cannot determine the concrete method at compilation time. Generally, virtual frames cannot escape the compilation unit, because this would mean that they need to be represented as a proper object. Materialized frames on the other hand can be used as normal objects. Graal does not impose the same restrictions on them, but instead, they come with a runtime cost. Materialized frames are useful to implement features like closures.

In addition to giving the compiler the means to determine data dependencies, frames are also meant to help it with determining the concrete type information. For both reasons, frames come with a FrameDescriptor. On the one hand, it maintains the structural information about the slots of a frame, and on the other hand, the frame slots can record type information. For now, we will focus on the structural elements, and will ignore the type information.

What does this mean concretely for our interpreter? For our Function objects it means, we need to give them proper FrameDescriptors so that the structure of the frame is known. We do that in our visitFunction(.) method:

public Function visitFunction(GoloFunction function) {
  FrameDescriptor frameDesc = new FrameDescriptor();
  context.frameDescriptors.push(frameDesc);

  ExpressionNode body = function.getBlock().accept(this);

  context.frameDescriptors.pop();
  return new Function(body, function, frameDesc);
}

While it is not really necessary to support Mandelbrot, we have a stack of frameDescriptors in the visitor, because Golo supports nested functions. And with this stack, we can model the lexical scoping of them correctly.

To access variables, we need to add support for it to the visitReferenceLookup(.) method, next to the support for reading arguments:

public ExpressionNode visitReferenceLookup(ReferenceLookup referenceLookup) {
  LocalReference reference = referenceLookup.
          resolveIn(context.referenceTableStack.peek());
  if (reference.isArgument()) {
    return new LocalArgumentReadNode(reference.getIndex());
  } else {
    FrameSlot slot = getFrameSlot(reference);
    return LocalVariableReadNodeGen.create(slot);
  }
}

FrameSlot getFrameSlot(LocalReference reference) {
  return context.frameDescriptors.peek().
          findOrAddFrameSlot(reference.getName());
}

When we see that a reference is not an argument, we create a LocalVariableReadNode. This read node gets a FrameSlot object that is created by the frame descriptor based on the name of the variable. Since we do not care about type information for the moment, a frame slot is merely Truffle's handle to represent the variable.

Assignment statements are transformed similarly:

public ExpressionNode visitAssignmentStatement(
    AssignmentStatement assignment) {
  LocalReference reference = assignment.getLocalReference();
  FrameSlot slot = getFrameSlot(reference);
  return LocalVariableWriteNodeGen.create(
      slot, (ExpressionNode) assignment.
                          getExpressionStatement().accept(this));
}

For them, we create a LocalVariableWriteNode. But of course, a write node still needs the expression that computes the value it needs to write. So, beside the slot, it also gets the transformed subexpression of the assignment statement.

For the implementation of the read node, we got the following code:

abstract class LocalVariableReadNode extends ExpressionNode {

  protected final FrameSlot slot;

  public LocalVariableReadNode(FrameSlot slot) {
    this.slot = slot;
  }

  @Specialization
  public Object doObject(VirtualFrame frame) {
    return frame.getValue(slot);
  }
}

The node is structured as usual. It takes the slot object as argument and stores it in a final field so that the compiler knows it can rely on it as a constant. The specialization does merely take the slot object to read a value from the frame. There could be more specialization to read different primitive types from the frame, but for the moment this is not necessary.

The write node looks more or less the same:

@NodeChild(value = "expr", type = ExpressionNode.class)
abstract class LocalVariableWriteNode extends ExpressionNode {

  protected final FrameSlot slot;

  public LocalVariableWriteNode(FrameSlot slot) {
    this.slot = slot;
  }

  @Specialization
  public Object writeGeneric(VirtualFrame frame, Object exprValue) {
    slot.setKind(FrameSlotKind.Object);
    frame.setObject(slot, exprValue);
    return exprValue;
  }
}

Here we need to have the expression for the value that is to be written. So, it is declared with an annotation. In the specialization, we get it as the expValue argument. Note, when executing writeGeneric(.), we first ensure that the slot kind has been set to Object. The optimizer is able to remove this, because it only reaches frame slots that have been initialized. Finally, we set the expValue on the frame.

And that's all. With these nodes we support read and write operations for local variables.

Loops

The final element we need for the Mandelbrot function is iteration, or loops. Golo got multiple types of loops including for and while loops. Fortunately for us, Golo's IR already takes care of desugaring these constructs to a single loop construct with explicit initialization, condition, body, and post-condition.

The Main Loop Node

We translate this construct to the following node implementation:

class ForLoopNode extends ExpressionNode {

  @Child protected ExpressionNode init;
  @Child protected LoopNode loopNode;

  public ForLoopNode(ExpressionNode init,
      ExpressionNode condition,
      ExpressionNode body,
      ExpressionNode post) {
    this.init = init;
    loopNode = Truffle.getRuntime().createLoopNode(
        new RepeatNode(condition, body, post));
  }

  @Override
  public Object executeGeneric(VirtualFrame frame) {
    if (init != null) {
      init.executeGeneric(frame);
    }
    try {
      loopNode.executeLoop(frame);
    } catch (BreakLoopException e) { /* just left the loop */ }
    return null;
  }
}

Our ForLoopNode got two @Child nodes. The first one is the init expression. In a for loop like in Java or C, it typically initializes an induction variable, e.g., i, with an initial value. The second child node is Truffle's LoopNode, which realizes the iteration with support for on-stack-replacement. More on that in a bit.

The executeGeneric(.) first executes the init expression if available, and afterwards the loop. In case the loop contained a break keyword, we would throw a control-flow exception that is caught here.

Using Truffle's LoopNode

The actual iteration happens in the LoopNode. For that, it requires an implementation of the RepeatingNode interface. The interface has a single method which does a loop iteration and indicates whether it should continue. As mentioned before, this abstraction is ment to enable on-stack-replacement. This is a useful feature for long-running methods that contain loops. Normally, the execution would only reach a newly compiled native code version of a method when it is entered again. With on-stack-replacement however, the VM can detect methods with long-running loops and switch to the optimized native code even while executing the loop.

Our implementation of the RepeatingNode interface looks as follows:

class RepeatNode extends Node implements RepeatingNode {

  @Child protected ExpressionNode condition;
  @Child protected ExpressionNode body;
  @Child protected ExpressionNode post;

  public RepeatNode(ExpressionNode condition,
      ExpressionNode body, ExpressionNode post) {
    this.condition = condition;
    this.body = body;
    this.post = post;
  }

  @Override
  public boolean executeRepeating(VirtualFrame frame) {
    if (shouldExecute(frame)) {
      body.executeGeneric(frame);
      if (post != null) {
        post.executeGeneric(frame);
      }
      return true;
    } else {
      return false;
    }
  }

  private boolean shouldExecute(VirtualFrame frame) {
    try {
      return condition.executeBoolean(frame);
    } catch (UnexpectedResultException e) {
      CompilerDirectives.transferToInterpreter();
      throw new UnsupportedSpecializationException(
          this, new Node[]{condition}, e.getResult());
    }
  }
}

It got child nodes for the loop condition, the loop body, and a post expression that is evaluated after each iteration. The method shouldExecute evaluates the loop condition. But since Golo is a dynamically typed language, we can't be completely sure that the condition expression really returns a boolean. Here we handle this case simply be throwing an exception that indicates that this case is not yet supported.

The executeRepeating(.) method represents one iteration. Thus, it first checks whether the loop should continue executing by invoking shouldExecute(.), and if that is the case, it executes the loop's body. After executing the body, it also executes the post expression if it is present. The return value of executeRepeating(.) indicates whether the iteration should be continued. Beside the minor complications, implementing a loop construct is thus pretty simple. If on-stack-replacement is not desired for some reason, one could implement the loop also directly using any Java loop construct.

The break Keyword in Loops

As we have seen earlier, Golo also supports the break keyword to exit from loops. The implementation idea is very similar to handling the return keyword. The AST node for the break keyword is going to throw a control-flow exception. And as we have seen in the ForLoopNode, we handle it with a try {...} catch (BreakLoopException e) around the execution of the loop body. Thus, the break keyword is implemented with the following BreakLoopNode:

class BreakLoopNode extends ExpressionNode {

  @Override
  public Object executeGeneric(VirtualFrame frame) {
    throw new BreakLoopException();
  }
}

The node will simply raise the BreakLoopException which returns execution to the ForLoopNode. After catching the exception, execution simply continues after the loop.

Invoking Arbitrary Java Methods

Golo is designed to be a dynamic language for the JVM ecosystem. Thus, it embraces Java and its ecosystem and Golo programs use Java libraries throughout. So far, we added dedicated nodes for each functionality. Applying the same approach to call arbitrary Java methods would not work. So, we need a way to call Java methods in a more reflective fashion.

Since Golo's lookup mechanism returns for some cases already Java MethodHandles, we added a MethodHandleInvokeNode:

class MethodHandleInvokeNode extends FunctionInvocationNode {
  private final MethodHandle method;

  public MethodHandleInvokeNode(FunctionInvocationNode uninit,
      MethodHandle method) {
    super(uninit.name, uninit.module, uninit.argumentsNode);
    this.method = method;
  }

  @Override
  public Object executeEvaluated(VirtualFrame frame,
      Object[] args) {
    try {
      return method.invokeWithArguments(args);
    } catch (Throwable e) {
      throw new NotYetImplemented();
    }
  }
}

This node becomes part of the function invocation specialization discussed in the previous post. So, if we get a MethodHandle as return, the UninitializedFunctionInvocationNode rewrites itself to this MethodHandleInvokeNode. When it is executed, it simply calls invokeWithArguments(.).

For our benchmark, this is used to call the methods that read the System's time.

Summary

In this post, we discussed how the interpreter can be extended to support the Mandelbrot program. For this, we extended the TypeSystem, added arithmetic operations, cast operators, support for local variables, support for Golo's loop constructs, as well as the ability to call arbitrary Java methods.

The main goal for this series of posts was to improve the numerical performance of Golo by using the Graal just-in-time compiler. So, let's look at the results. When we execute Mandelbrot now on top of Graal using our Truffle interpreter, we get the following results:

Initial Performance of Golo+Truffle

On Java, Mandelbrot takes about 108ms to execute. With Golo's bytecode compilation backend, it takes about 512ms. And, as we can see on the plot above, Golo running with the Truffle-based interpreter on top of Truffle takes about 505ms. This means, our Golo+Graal is about the same speed as the bytecode-based version.

In the next post, we'll investigate how the compilation works, and optimize our interpreter. It should not be much slower than Java, which it unfortunately still is. But more on that next time.

Truffle Concepts, Conventions, and Other Important Bits

As at the end of the last post, here also a brief overview of relevant Truffle concepts:

  • VirtualFrame object (sec. 3) cannot escape a Graal compilation unit, they cannot be assigned to fields of objects, or passed to polymorphic functions.
  • MaterializedFrame objects (sec. 3) can be used as normal objects, but have a runtime overhead.
  • FrameDescriptor (sec. 3) objects define the set of local variables in a method activation in terms of FrameSlots, which can track their runtime types.
  • The LoopNode class (sec. 4.2) enables Graal to perform on-stack-replacement to be able to optimize long-running loops during their execution

Acknowledgements

I'd like to thank Julien Ponge for his comments on drafts of this tutorial and his help with the technical details of Golo.

Creative Commons License
“Add Graal JIT Compilation to Your JVM Language in 5 Easy Steps” by Stefan Marr is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Permissions beyond the scope of this license are available on request.

Tags: , , , , , ,

Comments are closed.