Appendix: Performance Evaluation of RPySOM and TruffleSOM

The main question for our paper Are we there yet? Simple Language-Implementation Techniques for the 21st Century was whether these simple AST (TruffleSOM) or bytecode (RPySOM) interpreter implementations based on Truffle and RPython could reach performance of the same order of magnitude as for instance Java on top of a highly optimizing JVM.

In order to answer this question, we chose DeltaBlue and Richards as object-oriented benchmarks and Mandelbrot as a numerical one. All three are classic benchmarks that have been used to tune for instance JVMs, JavaScript VMs, PyPy, as well as Smalltalk VMs. Still, they are rather specific benchmarks that can given an indicate of what the expected performance of a VM is, but they are not predictors for the performance of concrete applications.

Methodology

The benchmarking methodology is kept as simple as possible. For each benchmark, we roughly determined when the VM reaches stable state and then executed it 100 times to account for non-deterministic influences such as caches and garbage collection. Each benchmark executes with a predefined problem size on each VM and the results are compared directly with each other.

The used benchmark machine has two quad-core Intel Xeons E5520, 2.26 GHz with 8 GB of memory and runs Ubuntu Linux with kernel 3.11. For a few more details of the machine, see the specification. The experimental setup and how to recreate it is briefly described in the general README.

Overall Performance

RPySOM and TruffleSOM reach more or less the goal, i.e., the same order of magnitude of performance as the Java Virtual Machine. RPySOM reaches a runtime of a factor 1.7 to 10.6, while TruffleSOM has more remaining optimization potential with being a factor 1.4 to 16 slower, while being faster than RPySOM on two of the three benchmarks. This performance is reached without requiring custom VMs and hundreds of person years of engineering. Thus, we conclude, that RPython as well as Truffle live up to the expectations.

plot of chunk unnamed-chunk-3

A look at the details for the three benchmarks shows that the performance varies widely depending on the benchmark, which indicates further optimization potential for specific code characteristics.

plot of chunk unnamed-chunk-4

As a further reference point, we did measurements also for the SOM++ interpreter, a SOM implemented in C++. It uses bytecodes and applies optimizations such as inline caching, threaded interpretation, and a generational garbage collector. However, it is 73 to 710x slower than Java. Since it is significantly slower, we did not include it in the charts because it would distort the impression. It is however included in the table below.

	Runtime in ms
	Java		PyPy		RPySOM		TruffleSOM		SOM++
Benchmark	mean	sd	mean	sd	mean	sd	mean	sd	mean	sd
DeltaBlue	118.49	29.01	605.73	9.89	200.67	0.45	170.73	45.20	8655.70	12.80
Mandelbrot	146.44	0.40	217.20	0.78	786.20	1.07	343.53	2.97	104033.22	184.23
Richards	159.74	10.86	647.71	2.63	1688.53	2.92	2556.76	72.72	87435.83	170.23

Microbenchmarks

In addition to DeltaBlue, Richards, and Mandelbrot, which can be considered macro or kernel benchmarks, we also use a number of more focused microbenchmarks to assess the performance of the SOM implementations. Since these are specific to SOM, we do not have numbers for the other languages.

plot of chunk unnamed-chunk-6

	Runtime in ms
	RPySOM		TruffleSOM
Benchmark	mean	sd	mean	sd
Bounce	51.65	0.21	44.50	9.10
BubbleSort	56.27	0.25	40.27	1.75
Dispatch	32.59	0.19	35.36	2.59
Fannkuch	65.03	0.37	16.06	2.40
Fibonacci	328.13	2.18	95.59	3.00
FieldLoop	190.06	1.15	102.12	3.09
IntegerLoop	137.92	0.57	31.58	1.34
List	154.76	6.13	82.53	6.10
Loop	110.44	0.42	30.37	2.60
Permute	430.10	0.75	83.95	1.89
Queens	110.88	0.34	74.75	29.64
QuickSort	267.03	0.47	61.82	2.04
Recurse	135.94	0.33	69.71	1.89
Sieve	251.01	0.48	48.56	2.76
Storage	368.92	1.04	133.48	2.74
Sum	90.27	0.33	29.92	1.49
Towers	239.46	0.61	78.72	5.74
TreeSort	14512.53	9.05	47.55	2.12
WhileLoop	265.95	0.53	123.18	2.95

Stefan-Marr.de

Stefan Marr

Full Professor (Universitätsprofessor)

Appendix: Performance Evaluation of RPySOM and TruffleSOM

Methodology

Overall Performance

Microbenchmarks