Archive of posts with the tag paper

Oct 15, 2025: Can We Know Whether a Profiler is Accurate?

If you have been following the adventures of our hero over the last couple of years, you might remember that we can’t really trust sampling profilers for Java, and it’s even worse for Java’s instrumentation-based profilers.
Aug 27, 2025: How to Slow Down a Program? And Why it Can Be Useful.

Most research on programming language performance asks a variation of a single question: how can we make some specific program faster? Sometimes we may even investigate how we can use less memory. This means a lot of research focuses solely on reducing the amount of resources needed to achieve some computational goal.
Sep 17, 2024: Instrumentation-based Profiling on JVMs is Broken!

Last year, we looked at how well sampling profilers work on top of the JVM. Unfortunately, they suffer from issues such as safepoint bias and may not correctly attribute observed run time to the correct methods because of the complexities introduced by inlining and other compiler optimizations.
Jun 18, 2024: 5 Reasons Why Box Plots are the Better Default Choice for Visualizing Performance

Box Plots, Or Better!
Sep 20, 2023: Don't Blindly Trust Your Java Profiler!

How do we know on what to focus our attention when trying to optimize the performance of a program? I suspect at least some of us will reach for sampling profilers. They keep the direct impact on the program execution low, and collect stack traces every so often during the program execution. This gives us an approximate view of where a program spends its time. Though, this approximation as it turns out can be surprisingly unreliable.
Jun 6, 2023: Squeezing a Little More Performance Out of Bytecode Interpreters
Nov 8, 2022: How Effective are Classic Lookup Optimizations for Rails Apps?

We know that Ruby and especially Rails applications can be very dynamic and pretty large. Though, many of the optimizations interpreters and even just-in-time compilers use have been invented in the 1980s and 1990s before Ruby and Rails even existed. So, I was wondering: do these optimizations still have a chance of coping with the millions of lines of Ruby code that large Rails apps from Shopify, Stripe, or GitLab have? Unfortunately, we don’t have access to such applications. As the next best thing, we took the largest Ruby benchmarks we could get our hands on, and analyzed those.
Oct 30, 2022: Reducing Memory Footprint by Minimizing Hidden Class Graphs

Tomoharu noticed in his work on the eJSVM, a JavaScript virtual machine for embedded systems, that quite a bit of memory is needed for the data that helps us to represent JavaScript objects efficiently. So, we started to look into how the memory use could be reduced without sacrificing performance.
Oct 6, 2022: Effortless Language Servers

Ever since my blog post in 2016, I have wanted a good language server for SOM and Newspeak. Though, I didn’t really have the time to implement more than a few features.
Jul 12, 2019: What if we could see all concurrency bugs in the debugger?

Multiverse Debugging: Non-Deterministic Debugging for Non-Deterministic Programs
May 27, 2019: Generating an Artifact From a Benchmarking Setup as Part of CI

Disclaimer: The artifact, for which I put this automation together, was rejected. I take this as a reminder that the technical bits still require good documentation to be useful.
Aug 24, 2018: Efficient Deterministic Replay for Actors

Debugging concurrent systems is pretty hard, and we worked already for a while to make things a bit better. However, a big remaining problem is that bugs are not easily reproduced.
Mar 13, 2018: How to Design Collection Libraries?

Programming languages naturally come with a library of containers or collection types. They allow us to easily work with arbitrary number of elements, which is something all major languages care about. Unfortunately, it seems like there is not much writing on how to design such libraries. Even asking a few people that worked for a long time on collection libraries did not yield much of a structured approach to such a central element for our languages. The one major piece of writing we found is the Scala people describing their experience with bit rot and how they redesigned their collection implementation to avoid it.
Oct 15, 2017: Debugging Concurrency Is Hard, but We Can Do Something About It!

When we have to debug applications that use concurrency, perhaps written in Java, all we get from the debugger is a list of threads, perhaps some information about held locks, and the ability to step through each thread separately.
Oct 25, 2016: Cross-Language Compiler Benchmarking: Are We Fast Yet?

Research on programming languages is often more fun when we can use our own languages. However, for research on performance optimizations that can be a trap. In the end, we need to argue that what we did is comparable to state-of-the-art language implementations. Ideally, we are able to show that our own little language is not just a research toy, but that it is, at least performance-wise, competitive with for instance Java or JavaScript VMs.
Feb 11, 2016: Domains: Sharing State in the Communicating Event-Loop Actor Model

It has been a while since we started working on how to extended the Actor model with mechanisms to safely share state. Our workshop paper on Tanks was published in 2013. And now finally, an extended version of this work was accepted for publication. Below you can find the abstract with a few more details on the paper, and of course a preprint of the paper itself.
Jan 25, 2016: Towards Meta-Level Engineering and Tooling for Complex Concurrent Systems

Last December, we got a research project proposal accepted for a collaboration between the Software Languages Lab in Brussels and the Institute for System Software here in Linz. Together, we will be working on tooling for complex concurrent systems. And with that I mean systems that use multiple concurrency models in combination to solve different problems, each with the appropriate abstraction. I have been working on these issues already for a while. Some pointers are available here in an earlier post: Why Is Concurrent Programming Hard? And What Can We Do about It?
Oct 23, 2015: JIT Data Structures, Fully Reflective VMs, and Meta-Circular Meta-Tracing

The year leading up to SPLASH has been pretty busy. Beside my own talks on Tracing vs. Partial Evaluation and Optimizing Communicating Event-Loop Languages with Truffle, there are going to be three other presentations on work I was involved in.
Oct 21, 2015: Optimizing Communicating Event-Loop Languages with Truffle

The past few month, I have been busy implementing a fast actor language for the JVM. The language is essentially Newspeak with a smaller class library and without proving access to the underlying platform, which can lead to violations of the language’s guarantees.
Oct 19, 2015: Tracing vs. Partial Evaluation: Comparing Meta-Compilation Approaches for Self-Optimizing Interpreters

Back in 2013 when looking for a way to show that my ideas on how to support concurrency in VMs are practical, I started to look into meta-compilation techniques. Truffle and RPython are the two most promising systems to build fast language implementations without having to implement a compiler on my own. While these two approaches have many similarities, from a conceptual perspective, they take two different approaches that can be seen as the opposite ends of a spectrum. So, I thought, it might be worthwhile to investigate them a little closer.
Apr 28, 2015: Zero-Overhead Metaprogramming

Runtime metaprogramming and reflection are slow. That’s a common wisdom. Unfortunately. Using refection for instance with Java’s reflection API, its dynamic proxies, Ruby’s #send or #method_missing, PHP’s magic methods such as __call, Python’s __getattr__, C#’s DynamicObjects, or really any metaprogramming abstraction in modern languages unfortunately comes at a price. The fewest language implementations optimize these operations. For instance, on Java’s HotSpot VM, reflective method invocation and dynamic proxies have an overhead of 6-7x compared to direct operations.
Jan 27, 2015: Partitioned Global Address Space Languages

More than a decade ago, programmer productivity was identified as one of the main hurdles for future parallel systems. The so-called Partitioned Global Address Space (PGAS) languages try to improve productivity and explore a range of language design ideas. These PGAS languages are designed for large-scale high-performance parallel programming and provide the notion of a globally shared address space, while exposing the notion of explicit locality on the language level. Even so the main focus is high-performance computing, the language ideas are also relevant for the parallel and concurrent programming world in general.
Jan 27, 2015: Partitioned Global Address Space Languages

More than a decade ago, programmer productivity was identified as one of the main hurdles for future parallel systems. The so-called Partitioned Global Address Space (PGAS) languages try to improve productivity and explore a range of language design ideas. These PGAS languages are designed for large-scale high-performance parallel programming and provide the notion of a globally shared address space, while exposing the notion of explicit locality on the language level. Even so the main focus is high-performance computing, the language ideas are also relevant for the parallel and concurrent programming world in general.
Sep 22, 2014: Are We There Yet? Simple Language-Implementation Techniques for the 21st Century

The first results of my experiments with self-optimizing interpreters was finally published in IEEE Software. It is a brief and very high-level comparison of the Truffle approach with a classic bytecode-based interpreter on top of RPython. If you aren’t familiar with either of these approaches, the article is hopefully a good starting point. The experiments described in it use SOM, a simple Smalltalk.
Sep 17, 2014: Fork/Join Parallelism in the Wild: Documenting Patterns and Anti-Patterns in Java Programs using the Fork/Join Framework

Parallel programming is frequently claimed to be hard and all kind of approaches have been proposed to solve the complexity issues. The Fork/Join programming style introduced with Cilk enables the parallel decomposition of problems in a recursive divide-and-conquer style, and on the surface looks very simple with its minimalistic approach of having a fork and a join language construct. But is it actually simple to use? To find out, Mattias started to dig through the Java open source projects on GitHub and tried to identify common patterns. Next week, he will present our findings at PPPJ’14.
Mar 3, 2014: Domains: Safe Sharing Among Actors

The actor model is a pretty nice abstraction to reason about completely independent entities that interact purely by exchanging messages. However, for software development, some see the pure actor model as too fine-grained and too restrictive exposing many of the low-level issues such as data races again on a higher level again, and thereby forgoing some of its conceptual benefits.
Feb 17, 2014: Towards Composable Concurrency Abstractions

One of the big questions that came up during my PhD was: ok, now you got your fancy ownership-based metaobject protocol, and you can implement actors, agents, communicating sequential processes, software transactional memory, and many others, but now what? How are you going to use all of these in concert in one application? Finding a satisfying answer is unfortunately far from trivial.
Feb 13, 2014: Parallel Gesture Recognition with Soft Real-Time Guarantees

More than three years ago, Lode and I started thinking about parallel event processing for realtime systems. The main use case back then was gesture and motion detection based on cameras such as the Kinect. Thierry created the first fully functional prototype called PARTE, and in addition to his master thesis, we wrote a workshop paper about it. Now, we finally got also the revised and extended version of this paper accepted.
Oct 24, 2013: Tanks: Multiple Reader, Single Writer Actors

And another paper that’s going to be presented by Joeri is our work on Tanks, a variation of communicating event loops (à la E or AmbientTalk). Tanks add synchronous and consistent read access to the event loop model.
Oct 24, 2013: Cloud PARTE: Elastic Complex Event Processing based on Mobile Actors

On Sunday, I am going to present work on a distributed Rete engine I have been involved in over the last year. The presentation will be at the AGERE workshop co-located with SPLASH. Note that most of the work has been done by Janwillem and Thierry over the last two years. They did a great job in first implementing and parallelizing our Rete engine and now distributing it to scale up for “big data” scenarios.
Jan 9, 2013: Parallel Gesture Recognition with Soft Real-Time Guarantees

It has been a while since SPLASH’12, but I got finally around to put up a copy of our paper at the AGERE’12 workshop. It is based on Thierry’s master thesis and presents his work on parallelizing a Rete engine for gesture recognition. Lode and I were his advisors and are happily working with him on what we promised in the future work section.
Mar 9, 2012: Identifying A Unifying Mechanism for the Implementation of Concurrency Abstractions on Multi-Language Virtual Machines

My paper, on how to support various concurrency models, with an ownership-based meta-object protocol (MOP) was accepted at TOOLS’12. Below, you will find a preprint version of the paper. A later post will provide details on how to use it and how to experiment with the MOP in Pharo 1.3.
Jan 24, 2012: Modularity and Conventions for Maintainable Concurrent Language Implementations: A Review of Our Experiences and Practices

Modularity: AOSD’12 will be in Potsdam at the end of March, and I am looking forward especially to the MISS’12 workshop.
Dec 28, 2011: CSOM/PL: A Virtual Machine Product Line

Welcome to Academia. That is how I take this one…
Dec 24, 2011: Synchronization Views for Event-loop Actors

With Joeri we have been working already for a while on a paper to extend the standard actor model with more parallelism. This work is not completed yet, and there are still some theoretical issues with the approach he designed. But we are working on it!
Sep 27, 2011: Which Problems Does a Multi-Language Virtual Machine Need to Solve in the Multicore/Manycore Era?

As preparation for SPLASH’11, here my paper for the VMIL workshop. It is a position paper discussing in which direction virtual machines should evolve in the future with regard to the challenges manycore architectures and concurrent programming bring.
Sep 22, 2010: Highlights of HPCC 2010

The 12th IEEE Internal Conference on High Performance Computing and Communications was not the first conference I attended. However, it was the first one where I actually presented a paper in the main research track.
Jul 30, 2010: Doctoral Symposium at SPLASH 2010

In October, I will give a brief presentation on the state of affairs with my PhD research at the SPLASH 2010 Doctoral Symposium. The basic idea has not changed since my last presentation at the TiC’10 summer school. I haven’t been able to do a lot of real work for it, but the ideas are a bit clearer now. The following two-page proposal will be published as part of the conference proceedings.
Jul 6, 2010: Insertion Tree Phasers: Efficient and Scalable Barrier Synchronization for Fine-grained Parallelism

The last half year was an interesting departure from my actual PhD research. First, I though the idea of barriers and phasers might be interesting to incorporate into a virtual machine as part of my thesis, but as it turned out, they are much to high-level and are better off implemented in a library. The gain for direct support in a VM is just not proportional to the effort and restrictions which come with that step.
Feb 21, 2010: Towards an Actor-based Concurrent Machine Model

Already quite a while ago, I was involved in writing a workshop paper about an actor model for virtual machines. Actually, the main idea was to find a concurrency model for a VM which supports multi-dimensional separation of concerns. However, AOP is not that interesting for me at the moment, so I am focussing on the concurrency, especially the actor-based VM model.
Feb 21, 2010: Towards an Actor-based Concurrent Machine Model

Already quite a while ago, I was involved in writing a workshop paper about an actor model for virtual machines. Actually, the main idea was to find a concurrency model for a VM which supports multi-dimensional separation of concerns. However, AOP is not that interesting for me at the moment, so I am focussing on the concurrency, especially the actor-based VM model.
Feb 14, 2010: Intermediate Language Design of High-level Language Virtual Machines: Towards Comprehensive Concurrency Support

My second workshop paper got published at the ACM Digital Library. This is actually only an abstract, but nonetheless, it might be interesting for people looking into the design of virtual machines and especially bytecodes/intermediate languages.
Feb 7, 2010: Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models

Finally, my first workshop paper got published, which was a little odyssey with some misunderstandings, but anyway, now it is out. It is just a position paper, thus, do not expect to many insights. However, what it describes is my big plan, and hopefully the story of my PhD. Am working on it…

Stefan-Marr.de

Stefan Marr

Full Professor (Universitätsprofessor)

Archive of posts with the tag paper