The main question for our paper *Are we there yet? Simple Language-Implementation Techniques for the 21st Century* was whether simple AST (TruffleSOM) and bytecode (RPySOM) interpreters built on Truffle and RPython can reach performance within the same order of magnitude as, for instance, Java on top of a highly optimizing JVM.
We kept the benchmarking methodology as simple as possible. For each benchmark, we roughly determined when the VM reaches a stable state and then executed the benchmark 100 times to account for non-deterministic influences such as caches and garbage collection. Each benchmark runs with a predefined problem size on each VM, and the results are compared directly with each other.
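The methodology above can be sketched as follows. This is a hypothetical harness for illustration only, not the actual experimental setup; the function and parameter names are our own, and in practice the warmup point was determined per VM rather than by a fixed iteration count.

```python
import time

def benchmark(run, warmup_iterations=100, measured_iterations=100):
    """Warm up a benchmark, then time repeated executions.

    `run` is a zero-argument callable executing one benchmark iteration
    with its predefined problem size. Returns per-iteration runtimes in ms.
    """
    # Execute until the VM has (roughly) reached a stable state,
    # so JIT compilation no longer dominates the measurements.
    for _ in range(warmup_iterations):
        run()

    # Measure 100 iterations to account for non-deterministic
    # influences such as caches and garbage collection.
    timings = []
    for _ in range(measured_iterations):
        start = time.perf_counter()
        run()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings
```

The per-iteration timings can then be summarized (e.g., as means with confidence intervals) and compared directly across VMs.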
The benchmark machine has two quad-core Intel Xeon E5520 processors at 2.26 GHz, 8 GB of RAM, and runs Ubuntu Linux with kernel 3.11. A few more details on the machine are given in the specification. The experimental setup and how to recreate it are briefly described in the general README.
RPySOM and TruffleSOM largely reach the goal, i.e., the same order of magnitude of performance as the Java Virtual Machine. RPySOM is a factor of 1.7 to 10.6 slower than Java, while TruffleSOM, which still has remaining optimization potential, is a factor of 1.4 to 16 slower, yet faster than RPySOM on two of the three benchmarks. This performance is reached without custom VMs and hundreds of person-years of engineering. Thus, we conclude that both RPython and Truffle live up to the expectations.
A look at the details of the three benchmarks shows that performance varies widely from benchmark to benchmark, which indicates further optimization potential for specific code characteristics.
As a further reference point, we also measured the SOM++ interpreter, a SOM implemented in C++. It uses bytecodes and applies optimizations such as inline caching, threaded interpretation, and a generational garbage collector. Nonetheless, it is 73x to 710x slower than Java. Since it is significantly slower, we did not include it in the charts, because it would distort the impression; it is, however, included in the table below.
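To illustrate one of the optimizations mentioned above, the following is a minimal sketch of a monomorphic inline cache at a message-send site. It is written in Python purely for illustration; the class and method names are hypothetical and do not reflect SOM++'s actual C++ implementation.

```python
class CachedSend:
    """A send site that caches the last receiver class and looked-up method.

    On a cache hit, the expensive method lookup is skipped; on a miss,
    a full lookup is performed and the cache is refilled (monomorphic cache).
    """

    def __init__(self, selector):
        self.selector = selector
        self.cached_class = None
        self.cached_method = None

    def send(self, receiver, *args):
        cls = type(receiver)
        if cls is not self.cached_class:
            # Cache miss: do the full lookup and remember the result.
            self.cached_class = cls
            self.cached_method = getattr(cls, self.selector)
        # Cache hit (or freshly filled cache): invoke directly.
        return self.cached_method(receiver, *args)
```

A real VM typically extends this to polymorphic caches with several entries and invalidates them when classes change.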
|Runtime in ms|
In addition to DeltaBlue, Richards, and Mandelbrot, which can be considered macro or kernel benchmarks, we also use a number of more focused microbenchmarks to assess the performance of the SOM implementations. Since these are specific to SOM, we do not have numbers for the other languages.
|Runtime in ms|