Today, I got a few more benchmarks running to get a better idea of where RTruffleSOM and TruffleSOM stand in terms of their absolute performance.

Measuring performance is always a tedious exercise, especially when trying to compare something to established systems.

So, what do we compare a simple Smalltalk like SOM to? Let’s go with Java and HotSpot, which is widely regarded as ‘the’ state-of-the-art system when it comes to dynamic compilation. Of course, this means cross-language benchmarking. To minimize the apples-and-oranges aspect of such an undertaking, I transcribed most of the SOM benchmarks we have to Java 8. This means I use lambdas wherever they seem a good and reasonably idiomatic choice. Counting loops are just plain for loops, of course, but iterations over collections can be expressed nicely with Java’s lambdas. The first interesting insight was that, compared to their old Java versions, two of the larger benchmarks, namely Richards and DeltaBlue, actually got a little faster. Even though the new versions use getters/setters and lambdas everywhere, performance did not drop. I have to admit, I kind of hoped that HotSpot would struggle just a little bit, to make SOM look a little better. But no, unfortunately, ‘good OO code’ seems to be something that HotSpot can appreciate.
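To illustrate the two loop styles, here is a minimal sketch in Java 8. The class, the `Item` type, and both methods are hypothetical and not taken from the actual benchmark port. Counting loops stay plain `for` loops, while iteration over a collection passes a lambda to `forEach`, much like sending `do:` with a block in Smalltalk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of the two styles; not code from the actual SOM benchmark port.
public class LoopStyles {

    // Hypothetical element type accessed through a getter, in the getter/setter style.
    static final class Item {
        private final int value;
        Item(int value) { this.value = value; }
        int getValue() { return value; }
    }

    // Counting loop: expressed as a plain Java for loop.
    static int sumUpTo(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // Iteration over a collection: expressed with a lambda passed to forEach.
    static int sumItems(List<Item> items) {
        int[] sum = {0}; // one-element array so the lambda can update the accumulator
        items.forEach(item -> sum[0] += item.getValue());
        return sum[0];
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            items.add(new Item(i));
        }
        System.out.println(sumUpTo(10));
        System.out.println(sumItems(items));
    }
}
```

The one-element array is a common workaround for the rule that lambdas may only capture effectively final local variables.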

Enough introduction, let’s look at the numbers:

Performance results for SOM, normalized to the results for Java 8 with HotSpot in server mode. Lower is better.

RTruffleSOM is the SOM version based on RPython with a meta-tracing just-in-time compiler. On average, RTruffleSOM is 4.7x slower than Java 8 (min. 1.8x, max. 10.7x), so there is still quite a bit of room for improvement. TruffleSOM is SOM implemented using Oracle Labs’ Truffle framework, running on top of a JVM with the Graal JIT compiler. It is about 2.7x slower than Java 8 (min. 3% slower, max. 4.7x). The Mandelbrot benchmark reaches Java’s performance, while some of the others are still quite a bit slower. Overall, however, for a ‘simple’ interpreter, I think the 3x-slower-than-Java range is pretty good.
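The post does not say how the average slowdown is computed. As a minimal sketch, assuming each benchmark’s run time is divided by the Java 8 run time and the resulting ratios are summarized with a geometric mean (a common choice for ratios, but an assumption here), the summary could be produced like this. The class name and the sample run times are hypothetical and are not the measured data.

```java
import java.util.Arrays;

// Sketch: summarize per-benchmark slowdowns relative to a baseline such as Java 8.
// Assumption: "average" is a geometric mean; the original post does not specify this.
public class Slowdown {

    // Ratio of SOM run time to baseline run time for each benchmark (lower is better).
    static double[] normalize(double[] somTimes, double[] baselineTimes) {
        if (somTimes.length != baselineTimes.length) {
            throw new IllegalArgumentException("need one baseline time per benchmark");
        }
        double[] ratios = new double[somTimes.length];
        for (int i = 0; i < somTimes.length; i++) {
            ratios[i] = somTimes[i] / baselineTimes[i];
        }
        return ratios;
    }

    // Geometric mean: the n-th root of the product, computed via logarithms to avoid overflow.
    static double geometricMean(double[] ratios) {
        double logSum = 0.0;
        for (double r : ratios) {
            logSum += Math.log(r);
        }
        return Math.exp(logSum / ratios.length);
    }

    public static void main(String[] args) {
        // Hypothetical run times in milliseconds, for illustration only.
        double[] som = {200.0, 800.0};
        double[] java = {100.0, 100.0};
        double[] ratios = normalize(som, java);
        System.out.printf("mean %.1fx (min. %.1fx, max. %.1fx)%n",
                geometricMean(ratios),
                Arrays.stream(ratios).min().getAsDouble(),
                Arrays.stream(ratios).max().getAsDouble());
    }
}
```

Unlike an arithmetic mean, the geometric mean of normalized ratios does not depend on which system is chosen as the baseline, which is why it is often preferred for this kind of summary.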

Of course, this is the performance after warmup and over multiple iterations. The plot is a box plot, and as you can see, the results are reasonably stable after warmup. Unfortunately, there are applications out there that might not run hot code all the time. So one more question is: how good are the interpreters?
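The measurement setup behind the box plot is not spelled out here. As a minimal sketch of the general idea, assuming a fixed number of warmup iterations that are simply discarded, a steady-state harness in Java could look like this. The class, the workload, and the iteration counts (50 warmup, 100 measured) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

// Hypothetical steady-state harness: run a benchmark repeatedly, discard the
// warmup iterations, and keep the remaining times (e.g., as input for a box plot).
public class SteadyState {

    // Writing results to a volatile field makes it harder for the JIT to treat the work as dead code.
    static volatile int blackhole;

    // Runs the benchmark warmup + measured times; returns only the post-warmup times in nanoseconds.
    static List<Long> measure(IntSupplier benchmark, int warmup, int measured) {
        List<Long> times = new ArrayList<>();
        for (int i = 0; i < warmup + measured; i++) {
            long start = System.nanoTime();
            blackhole = benchmark.getAsInt();
            long elapsed = System.nanoTime() - start;
            if (i >= warmup) {
                times.add(elapsed);
            }
        }
        return times;
    }

    // Hypothetical workload: a simple counting loop.
    static int workload() {
        int sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Long> times = measure(SteadyState::workload, 50, 100);
        times.forEach(t -> System.out.println(t / 1_000 + " us"));
    }
}
```

Where the warmup phase ends is itself a judgment call; a fixed cutoff is the simplest option, and inspecting the per-iteration times is a reasonable way to check that it is long enough.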

So, let’s compare to the Java interpreter, which is selected with Java’s -Xint command-line option:

Performance results for SOM, normalized to the results for Java 8 with HotSpot in interpreter mode. Lower is better.

The results here are not comparable to the previous results. For the moment, the benchmarks use different parameters to avoid excessively long run times.

For RTruffleSOM, we see a 6x slowdown compared to the Java 8 interpreter (min. 1.7x, max. 15.8x). The TruffleSOM interpreter is slightly faster, showing a slowdown of only 5.6x (min. 1.9x, max. 13.5x). However, we run TruffleSOM on top of a JVM, so it still benefits from HotSpot’s just-in-time compiler. I also need to point out that both SOMs are run without their respective JIT-compilation frameworks built in. For RTruffleSOM, this means we use a binary without meta-tracing support, and TruffleSOM runs on top of HotSpot without Graal. This means these are best-case interpreter numbers. TruffleSOM in particular is much slower in interpreter mode on top of Graal, since it records extensive profiling information to enable good JIT compilation, which leads to a slowdown of several times in the worst case.

Overall, I am pretty happy with the performance of the little SOM interpreters. RTruffleSOM is roughly 5k lines of code and TruffleSOM about 10k, and still both reach the same order of magnitude of performance as Java.