SSW@ECOOP'26: On Debugging, Benchmarking and (Meta-)Compilation

At this year’s ECOOP, the Institute for System Software will be attending with almost the entire team, and we’re going to present on a variety of topics. Come and talk to us at ICOOOLPS, MPLR, the Demo Track, DEBT, ECOOP Academy, and at the poster session!

Below is a brief overview of our talks and when they are scheduled.

Monday, June 29, 2pm: AOT Meta-Compilation of Dynamic Languages

Towards Ahead-of-Time Meta-Compilation of Dynamic Languages With an Extensible Type Analysis

Christoph A. is going to present initial ideas on how to approach ahead-of-time meta-compilation for dynamic languages. While some dynamic languages can already be compiled ahead of time fairly successfully, we would love to get the compiler for free, ideally from little more than an implementation of the interpreter.

Preprint of the ICOOOLPS position paper.

Full Abstract

Dynamically-typed languages rely on just-in-time (JIT) compilation for execution performance. Meta-compilation systems such as GraalVM's Truffle language implementation framework have reduced the effort needed to enable JIT compilation to implementing an interpreter. But dynamic languages are increasingly used in scenarios where ahead-of-time (AOT) compilation would be preferable, for instance, for faster startup or to avoid the memory cost of JIT compilation. Therefore, we plan to extend meta-compilation systems to also support AOT compilation.

For successful AOT compilation of dynamically-typed languages, we need an extensive and robust type analysis. In this position paper, we present first ideas for a framework with an extensible core analysis that will enable us to extract type flow semantics from an interpreter implemented in a meta-compilation system.

To achieve the precision needed for fast machine code, we will need to include heuristic analyses. For this, we envision a plugin system that allows us to integrate various heuristics into a single unified analysis. Combining analyses in this way can produce results that are better than the sum of their parts.

While this is a very ambitious goal given the complexity of compiling dynamic languages, we believe we can achieve better-than-interpreted performance for programs with reasonable behavior. Furthermore, to support the full language semantics, we keep a general interpreter as a fallback.
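To make the idea of combining analyses more concrete, here is a minimal Java sketch of pluggable type analyses whose answers are intersected and iterated to a fixpoint. Everything in it is hypothetical and not taken from the paper: the three-type lattice, the tiny `Add` IR, and the two plugins (`AddSemantics`, standing in for type-flow semantics extracted from an interpreter's `+` node, and `IntLiteralHeuristic`, an unsound guess that would rely on the interpreter fallback). It shows how neither plugin alone can conclude that `z` is an integer, while their combination can.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TypeAnalysisDemo {
    public static void main(String[] args) {
        // y = n + 1; z = y + y, where the parameter n has an unknown type.
        List<Add> program = List.of(new Add("y", "n", "1"), new Add("z", "y", "y"));
        List<String> vars = List.of("n", "y", "z");
        System.out.println(UnifiedAnalysis.run(vars, program, List.of(new AddSemantics())));
        // {n=[INT, DOUBLE, STRING], y=[INT, DOUBLE], z=[INT, DOUBLE]}
        System.out.println(UnifiedAnalysis.run(vars, program, List.of(new IntLiteralHeuristic())));
        // {n=[INT], y=[INT, DOUBLE, STRING], z=[INT, DOUBLE, STRING]}
        System.out.println(UnifiedAnalysis.run(vars, program,
                List.of(new AddSemantics(), new IntLiteralHeuristic())));
        // {n=[INT], y=[INT], z=[INT]}
    }
}

// Hypothetical abstract value types of a small dynamic language.
enum Type { INT, DOUBLE, STRING }

// Hypothetical single-assignment IR: target = left + right (operands: variables or int literals).
record Add(String target, String left, String right) {}

// A plugin reports which types it considers possible for `var`; all types means "no information".
interface TypePlugin {
    EnumSet<Type> infer(String var, Map<String, EnumSet<Type>> env, List<Add> program);
}

// Stands in for type-flow semantics extracted from an interpreter's `+` node. Assumed semantics:
// int + int -> int, any int/double mix -> double, string + string -> string, string + number -> error.
final class AddSemantics implements TypePlugin {
    public EnumSet<Type> infer(String var, Map<String, EnumSet<Type>> env, List<Add> program) {
        for (Add a : program) {
            if (!a.target().equals(var)) continue;
            EnumSet<Type> out = EnumSet.noneOf(Type.class);
            for (Type l : operand(a.left(), env)) {
                for (Type r : operand(a.right(), env)) {
                    boolean lStr = l == Type.STRING, rStr = r == Type.STRING;
                    if (lStr && rStr) out.add(Type.STRING);
                    else if (!lStr && !rStr) out.add(l == Type.DOUBLE || r == Type.DOUBLE ? Type.DOUBLE : Type.INT);
                    // Mixing strings and numbers raises an error and produces no value.
                }
            }
            return out;
        }
        return EnumSet.allOf(Type.class); // not defined by an addition, e.g. a parameter
    }

    private static EnumSet<Type> operand(String op, Map<String, EnumSet<Type>> env) {
        return op.matches("\\d+") ? EnumSet.of(Type.INT) : env.get(op);
    }
}

// Hypothetical and unsound heuristic: a variable added to an int literal is probably an int,
// e.g. a loop counter. Wrong guesses would have to fall back to the interpreter.
final class IntLiteralHeuristic implements TypePlugin {
    public EnumSet<Type> infer(String var, Map<String, EnumSet<Type>> env, List<Add> program) {
        for (Add a : program) {
            boolean withLiteral = (a.left().equals(var) && a.right().matches("\\d+"))
                    || (a.right().equals(var) && a.left().matches("\\d+"));
            if (withLiteral) return EnumSet.of(Type.INT);
        }
        return EnumSet.allOf(Type.class);
    }
}

// Intersects the plugins' answers and repeats until nothing changes, so facts one plugin
// establishes feed into the others. Sets only shrink, so the loop terminates.
final class UnifiedAnalysis {
    static Map<String, EnumSet<Type>> run(List<String> vars, List<Add> program, List<TypePlugin> plugins) {
        Map<String, EnumSet<Type>> env = new LinkedHashMap<>();
        for (String v : vars) env.put(v, EnumSet.allOf(Type.class));
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String v : vars) {
                EnumSet<Type> refined = EnumSet.copyOf(env.get(v));
                for (TypePlugin p : plugins) refined.retainAll(p.infer(v, env, program));
                if (!refined.equals(env.get(v))) {
                    env.put(v, refined);
                    changed = true;
                }
            }
        }
        return env;
    }
}
```

The intersection acts as the "meet" of the individual answers: each plugin may only rule types out, and the fixpoint loop lets a fact found by the heuristic (here, `n` is an int) sharpen what the operator semantics can derive for `y` and `z`.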

Monday, June 29, 3pm: Pedagogical Annotations in a Debugger

Towards Guided Omniscient Debugging in Education using Pedagogical Execution Traces

Markus is going to present work on teaching programming by using debugging techniques. Specifically, he will look at enriching program visualizations with explanations and interactive questions.

Preprint of the DEBT paper.

Full Abstract

Educators frequently use trace-based debuggers for live classroom demonstrations. Yet, if a student’s attention drops during class, they have to fall back to watching recordings (providing a passive, non-interactive experience) or replaying the debugging session at home (lacking the instructor’s pedagogical context and verbal explanations). We introduce Pedagogical Execution Traces (PETs), a concept that enriches execution traces with explanations, highlights and interactive questions. In this work-in-progress idea paper, we present the conceptual foundation of PETs as interactive learning artifacts, showing their applicability within JavaWiz, an educational trace-based graphical debugger. We explore PET authoring design goals and outline ongoing work regarding collaborative debugging scenarios and leveraging Large Language Models (LLMs) for trace annotation.

Tuesday, June 30, 4pm: Supporting Different GCs in AOT-compiled Binaries

A Unifying Approach to Supporting Multiple Garbage Collectors in AOT-compiled Binaries

Thomas will present a fairly simple but effective approach that enables a single AOT-compiled binary of a Java program to use different GCs. At the moment, it supports HotSpot's G1 and Native Image's more basic generational Serial GC.

Preprint of the MPLR paper.

Full Abstract

Some language implementations combine garbage collection with ahead-of-time compilation to produce self-contained executables for managed-language programs. In these systems, one can typically choose a garbage collector (GC) only at build time. To use another GC, e.g., for better performance, one needs to build another executable.

In this paper, we present an approach for supporting multiple GCs in the same self-contained executable using unified barriers, object layout, object header, and dynamic dispatch. This enables developers to select a GC at run time. Additionally, isolates, i.e., lightweight virtual machine instances with separately collected heaps but within the same process, can now use different GCs alongside each other.

We evaluate our approach in GraalVM Native Image, supporting the Garbage First (G1) and the Serial GC in the same executable. Our evaluation on the DaCapo Chopin and Renaissance benchmarks shows that G1 has no performance change on average (min. −9 %, max. 14 %). Serial GC shows a peak performance regression of 11 % (min. −10 %, max. 33 %). We believe that the simplicity of the approach and the ability to choose the GC at run time and per isolate make this overhead acceptable.
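The core idea of choosing a collector at run time can be illustrated with a toy Java sketch in which allocation code only reaches the collector through an interface, and each isolate picks its implementation when it starts. The names (`GarbageCollector`, `Isolate`, the option strings `"g1"` and `"serial"`) and both collection policies are hypothetical placeholders; the sketch does not model Native Image's actual unified barriers, object layout, or object header.

```java
import java.util.Map;
import java.util.function.Supplier;

final class GcSelectionDemo {
    public static void main(String[] args) {
        // Two isolates in the same process, each with its own heap and its own collector.
        Isolate a = Isolate.start("g1", 1024);
        Isolate b = Isolate.start("serial", 1024);
        for (int i = 0; i < 10; i++) {
            a.allocate(300);
            b.allocate(300);
        }
        System.out.println(a.gcName() + ": " + a.collections() + " collections");
        System.out.println(b.gcName() + ": " + b.collections() + " collections");
    }
}

// Hypothetical interface all supported collectors implement.
interface GarbageCollector {
    String name();
    /** Frees memory and returns the number of bytes still in use afterwards. */
    long collect(long usedBytes);
}

final class SerialGc implements GarbageCollector {
    public String name() { return "Serial"; }
    // Placeholder policy: pretend a full stop-the-world collection leaves 25% of the heap live.
    public long collect(long usedBytes) { return usedBytes / 4; }
}

final class G1Gc implements GarbageCollector {
    public String name() { return "G1"; }
    // Placeholder policy: pretend evacuating the garbage-first regions leaves 50% live.
    public long collect(long usedBytes) { return usedBytes / 2; }
}

final class Isolate {
    // Hypothetical run-time option values mapped to collector factories.
    private static final Map<String, Supplier<GarbageCollector>> COLLECTORS =
            Map.of("serial", SerialGc::new, "g1", G1Gc::new);

    private final GarbageCollector gc; // chosen per isolate, at run time
    private final long capacity;
    private long used;
    private int collectionCount;

    private Isolate(GarbageCollector gc, long capacity) {
        this.gc = gc;
        this.capacity = capacity;
    }

    static Isolate start(String gcOption, long capacity) {
        Supplier<GarbageCollector> factory = COLLECTORS.get(gcOption);
        if (factory == null) throw new IllegalArgumentException("unknown GC: " + gcOption);
        return new Isolate(factory.get(), capacity);
    }

    void allocate(long bytes) {
        if (used + bytes > capacity) { // heap full: dynamic dispatch to this isolate's collector
            used = gc.collect(used);
            collectionCount++;
        }
        used += bytes;
    }

    String gcName() { return gc.name(); }
    int collections() { return collectionCount; }
}
```

The price of this flexibility is an indirect call where a single-GC build could use a direct, inlinable one; in this toy, only the collection path is dispatched, which is one reason the overhead can stay small.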

Tuesday, June 30, 5pm: Reducing Binary Size with Static Heuristics

To Compile or Not To Compile: Evaluating Static Heuristics to Reduce Binary Size of Hybrid Execution Systems

Christoph P. will present his evaluation of how far one can get with basic static compiler heuristics when reducing the size of AOT-compiled Java binaries while minimizing the impact on performance.

Preprint of the MPLR paper.

Full Abstract

To compile, or not to compile, that is the question: When ’tis nobler to optimize for performance. Modern compilers have many different optimizations and optimization goals. A common one is to balance between peak performance and startup time. A new kind of ahead-of-time-compiled native executable that embeds a managed runtime tries to offer both, while solidifying the notion that everything should be compiled. However, the cost of a larger binary raises the question of whether it is beneficial to compile everything.

In this paper, we evaluate static heuristics from classical AOT compilers as well as other techniques based on our own observations. Our goal is to identify heuristics that work in a compilation-first environment and that allow us to reduce binary size while maintaining peak performance.

We compare the different policies in a closed-world hybrid execution system for Java, based on GraalVM Native Image, on a set of 5 DaCapo and 13 Renaissance benchmarks. We find that, with the best combination of heuristics, we can reduce binary size by 20% while slowing down average performance by only 4%, without needing any run-time feedback or complex machine-learning-based approaches. The most promising combination for production use combines heuristics based on early returns, estimated CPU cycles, number of parameters, and whether a method is a static initializer.
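A static compile-or-interpret policy of this kind can be expressed as a combination of per-method predicates. The Java sketch below is only an illustration: `MethodInfo`, its fields, and in particular the direction and threshold of each rule are our own placeholder assumptions, not the heuristics or values evaluated in the paper. It also estimates the fraction of machine code saved, ignoring whatever the interpreter still needs to keep in the binary.

```java
import java.util.List;
import java.util.function.Predicate;

final class CompilePolicyDemo {
    public static void main(String[] args) {
        List<MethodInfo> methods = List.of(
                new MethodInfo("<clinit>", true, 0, 100, 0, 400),
                new MethodInfo("parse", false, 4, 800, 1, 600),
                new MethodInfo("loop", false, 0, 500, 2, 1000),
                new MethodInfo("helper", false, 1, 200, 3, 2000));
        for (MethodInfo m : methods) {
            System.out.println(m.name() + (CompilePolicy.shouldCompile(m) ? ": compile" : ": interpret"));
        }
        System.out.printf("estimated machine-code reduction: %.0f%%%n", 100 * CompilePolicy.sizeReduction(methods));
    }
}

// Hypothetical per-method facts a build-time analysis might provide.
record MethodInfo(String name, boolean isStaticInitializer, int earlyReturns,
                  long estimatedCycles, int parameterCount, int codeSizeBytes) {}

final class CompilePolicy {
    // Each rule votes "leave this method to the interpreter".
    // Placeholders: the direction and threshold of every rule are assumptions.
    static final List<Predicate<MethodInfo>> INTERPRET_IF = List.of(
            MethodInfo::isStaticInitializer,
            m -> m.earlyReturns() > 2,
            m -> m.estimatedCycles() > 10_000,
            m -> m.parameterCount() > 6);

    // A method is AOT-compiled only if no rule votes for interpreting it.
    static boolean shouldCompile(MethodInfo m) {
        return INTERPRET_IF.stream().noneMatch(rule -> rule.test(m));
    }

    // Share of compiled code size avoided by interpreting the rejected methods.
    static double sizeReduction(List<MethodInfo> methods) {
        long total = 0, saved = 0;
        for (MethodInfo m : methods) {
            total += m.codeSizeBytes();
            if (!shouldCompile(m)) saved += m.codeSizeBytes();
        }
        return total == 0 ? 0.0 : (double) saved / total;
    }
}
```

Because every rule is a plain predicate, combinations can be evaluated by simply changing the list, which is the kind of exploration a study of heuristic combinations needs.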

Date TBC: A Debugger for Teaching Threads and Locks

JavaWiz ThreadViz - A Visual Debugger for Multi-threaded Programs Based on the Espresso Java VM

Melissa is going to present a visual debugger designed for teaching threads and locks in Java. Threads, locks, and their interactions can be hard to explain. However, with the right representation in a debugging tool, their dynamic interplay becomes easier to understand.

Preprint of the Demo paper.

Full Abstract

Programming novices often face difficulties understanding how multi-threading works. Visual debuggers such as JavaWiz can support beginners by providing dynamic visualizations of a program’s behavior; however, they usually only work for single-threaded programs. This paper presents ThreadViz, an extension of JavaWiz to support visualizing multi-threaded Java programs.

In ThreadViz, thread information is collected for visualization by using the Truffle Debug API. Instead of real concurrency, threads are executed stepwise, allowing the user to determine the order of execution and preventing any unpredictable behavior. In the user interface, a unique color is associated with each thread to illustrate the effects of different synchronization mechanisms such as locking and to indicate thread state changes. To conclude, application examples are presented to highlight the tool's capabilities.
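Stepwise execution can be pictured as a scheduler that advances exactly one simulated thread per user action. This Java sketch is a simplification under stated assumptions: it does not use the Truffle Debug API or Espresso, threads are modeled as hypothetical lists of `Step`s, and locks are non-reentrant. It shows how a thread that tries to take a held lock becomes `BLOCKED` and becomes `RUNNABLE` again once the owner releases the lock.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ThreadStepDemo {
    public static void main(String[] args) {
        StepScheduler s = new StepScheduler();
        s.add(new SimThread("T1", List.of(Step.acquire("m"), Step.work(), Step.release("m"))));
        s.add(new SimThread("T2", List.of(Step.acquire("m"), Step.release("m"))));
        // The user picks the interleaving; the same choices always produce the same behavior.
        for (String pick : List.of("T1", "T2", "T1", "T1", "T2", "T2")) {
            s.step(pick);
            System.out.println(pick + " stepped -> " + s.states());
        }
    }
}

enum ThreadState { RUNNABLE, BLOCKED, TERMINATED }

// Hypothetical step of a simulated thread: acquire a lock, release it, or do plain work.
record Step(String op, String lock) {
    static Step acquire(String lock) { return new Step("acquire", lock); }
    static Step release(String lock) { return new Step("release", lock); }
    static Step work() { return new Step("work", null); }
}

final class SimThread {
    final String name;
    final Deque<Step> remaining;
    ThreadState state;

    SimThread(String name, List<Step> steps) {
        this.name = name;
        this.remaining = new ArrayDeque<>(steps);
        this.state = steps.isEmpty() ? ThreadState.TERMINATED : ThreadState.RUNNABLE;
    }
}

final class StepScheduler {
    final Map<String, SimThread> threads = new LinkedHashMap<>();
    final Map<String, String> lockOwner = new HashMap<>(); // lock name -> owning thread

    void add(SimThread t) { threads.put(t.name, t); }

    // Executes exactly one step of the chosen thread; no other thread makes progress.
    void step(String threadName) {
        SimThread t = threads.get(threadName);
        if (t.state == ThreadState.TERMINATED) return;
        Step s = t.remaining.peek();
        switch (s.op()) {
            case "acquire" -> {
                if (lockOwner.containsKey(s.lock())) { // held, and locks are not reentrant here
                    t.state = ThreadState.BLOCKED;      // the step is retried when picked again
                    return;
                }
                lockOwner.put(s.lock(), t.name);
            }
            case "release" -> {
                lockOwner.remove(s.lock());
                for (SimThread other : threads.values()) { // wake threads waiting for this lock
                    if (other.state == ThreadState.BLOCKED && s.lock().equals(other.remaining.peek().lock())) {
                        other.state = ThreadState.RUNNABLE;
                    }
                }
            }
            default -> { } // "work" does not affect threads or locks
        }
        t.remaining.poll();
        t.state = t.remaining.isEmpty() ? ThreadState.TERMINATED : ThreadState.RUNNABLE;
    }

    Map<String, ThreadState> states() {
        Map<String, ThreadState> result = new LinkedHashMap<>();
        threads.values().forEach(t -> result.put(t.name, t.state));
        return result;
    }
}
```

Since only the picked thread moves, every state change is the direct result of a user action, which is what makes such an execution predictable enough to explain in class.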

Friday, July 3, 11am: A Lecture on Benchmarking

Benchmarking on Modern Hardware: Techniques for Performance Comparisons from Day-To-Day Experimenting to Paper Writing

Last but not least, I'll give a lecture on benchmarking. Modern hardware and software make this quite a bit more complicated than we would like, and I will show how we approach it in practice.

Full Abstract

Modern systems are great! In many ways, they adapt to our software, and optimize it, despite us not really knowing what we are doing, and to a degree that would have been considered magic just a few decades ago.

However, once we develop our own research ideas on top of these systems and want to make any argument about performance, all this “magic” makes it hard to understand what measurements mean. Worse yet, making sensible performance claims means we have to understand a good chunk of it. Is this benchmark 20% faster because of what I did, or did the CPU increase the clock frequency for the new but not for the old code? Did the JVM just trigger garbage collection? Did the just-in-time compiler slow down my code? What do you mean, “efficiency core”?

In this lecture, we will have a brief look at why benchmarking on modern systems is hard and what can go wrong. Then we will discuss a range of different research scenarios to get a better feeling for what we may need for our work. Since much of this work may involve gradually building up our own systems, we will also look at what it takes to build them based on reliable feedback.

In the second part, we will look at how we can turn the often chaotic scientific process, with all its trials and errors, into a “scientific engineering process” that enables us to try and try again. I’ll suggest a process that lets us use the same setup we use for developing our system not just to understand its performance, but also to run the experiments we may want for a scientific paper. I’ll demonstrate how to go from daily pull requests with continuous performance tracking to generating plots and statistics for direct inclusion in LaTeX.
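As a small illustration of turning measurements into statistics for a paper, the following Java sketch summarizes each benchmark by the median of its iterations after discarding warmup, aggregates the per-benchmark run-time ratios with a geometric mean, and formats the result as a LaTeX macro. The warmup count, the example timings, and the macro names are hypothetical, and this is not necessarily the process presented in the lecture.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class BenchStatsDemo {
    public static void main(String[] args) {
        // Made-up iteration times (ms) for a changed and a baseline version; iteration 1 is warmup.
        double benchA = BenchStats.steadyStateMedian(new double[] {110, 81, 80, 79, 80}, 1)
                / BenchStats.steadyStateMedian(new double[] {120, 101, 100, 99, 100}, 1);
        double benchB = BenchStats.steadyStateMedian(new double[] {70, 62, 63, 62, 63}, 1)
                / BenchStats.steadyStateMedian(new double[] {60, 50, 50, 50, 50}, 1);
        double overall = BenchStats.geometricMean(List.of(benchA, benchB));
        System.out.println(BenchStats.latexMacro("BenchARatio", benchA));
        System.out.println(BenchStats.latexMacro("OverallRatio", overall));
    }
}

final class BenchStats {
    // Median of the measurements after discarding the first `warmup` iterations.
    static double steadyStateMedian(double[] iterationTimes, int warmup) {
        double[] steady = Arrays.copyOfRange(iterationTimes, warmup, iterationTimes.length);
        Arrays.sort(steady);
        int n = steady.length;
        return n % 2 == 1 ? steady[n / 2] : (steady[n / 2 - 1] + steady[n / 2]) / 2.0;
    }

    // Geometric mean, a common way to aggregate normalized values such as run-time ratios.
    static double geometricMean(List<Double> ratios) {
        double logSum = 0;
        for (double r : ratios) logSum += Math.log(r);
        return Math.exp(logSum / ratios.size());
    }

    // Emits e.g. \newcommand{\OverallRatio}{1.00}; LaTeX macro names may only contain letters.
    static String latexMacro(String name, double value) {
        return String.format(Locale.ROOT, "\\newcommand{\\%s}{%.2f}", name, value);
    }
}
```

With these made-up timings, a 20% reduction in run time on one benchmark and a 25% increase on the other cancel out in the geometric mean, which yields 1.00.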

Stefan-Marr.de

Stefan Marr

Full Professor (Universitätsprofessor)
