Programming Language Implementation: In Theory, We Understand. In Practice, We Wish We Would.

It’s February! This means I have been at the JKU for four months. Four months of teaching Compiler Construction and System Software, lots of new responsibilities (most notably signing off on telephone bills and coffee orders…), many new colleagues, and plenty of new things to learn, not least because of the very motivated students and PhD students here. And when I say motivated, yes, I was genuinely surprised. While attendance at my 8:30am Compiler Construction lectures declined over the term, as expected, the students absolutely aced their exam. I suspect I will have to make it harder next year. Much harder… hmmm 🤔 Much of this success can likely be attributed to the very extensive exercise sessions my colleagues ran throughout the semester.

At this point, I have to send a big thank you to everyone from the Institute for System Software, past and present. It’s great to be part of such a team! You made my start very easy, and, well, it now gives me the time to think about my inaugural lecture.

What’s an inaugural lecture?

I have been in academia for almost two decades, but I have to admit, I don’t really remember being at an inaugural lecture. According to Wikipedia, in the Germanic tradition an inaugural lecture (Antrittsvorlesung) is these days something of a celebration. It’s a festive occasion for a new professor to present their field to a wider audience, possibly also presenting their research vision.

At the JKU, it indeed seems to be planned as a festive occasion, too.

On March 9th, 2026, starting at 4pm, Prof. Bernhard Aichernig and I will give our Antrittsvorlesungen, and you are cordially invited to attend.

Bernhard will give a talk titled Verification, Falsification, and Learning – a Triptych of Formal Methods for Trustworthy IT Systems.

My own talk is titled, as is this post: Programming Language Implementation: In Theory, We Understand. In Practice, We Wish We Would.

Bernhard will start out by looking at the formal side of things, making the connection between proving correctness, testing systems in the context of where they are used, and learning models from observable data. My talk will narrow in on language implementations, but also look at how formal correctness is helping us there. Unfortunately, provably-correct systems still elude us for many practical languages. Even worse, we are at a point where we rarely understand what’s going on in enough detail to improve performance or perhaps fix certain rare bugs.

If you would like to attend, please register here.

In Theory, We Understand. In Practice, We Wish We Would

Here’s the abstract of my talk:

Our world runs on software, but we understand it less and less. In practice, the complexity of modern systems drains your phone’s battery faster, increases the cost of hosting applications, and consumes unnecessary resources, for instance, in AI systems. All because we do not truly understand our systems any longer. Still, at a basic level, we can fully understand how computers work, from transistors to processors and machine language, all the way up to high-level programming languages.

The convenience of contemporary programming languages is, however, bought with complexity. Over the last two decades, I admit, I added to that complexity. In the next two decades, I hope we can learn to build programming languages in ways that can be proven correct, that enable us to generate their implementations automatically, and that let systems select optimizations such that we can still understand the implications for the software running on top of them.

You may now wonder where to go from here. And that’s a very good question. I have another month to figure that out, perhaps more… 😅

So, maybe see you in March?

Until then, suggestions, questions, and complaints are welcome, as usual, on Mastodon, BlueSky, and Twitter.

Python, Is It Being Killed by Incremental Improvements?

Over the past years, two major players have invested in the future of Python. Microsoft’s Faster CPython team has pushed ahead with impressive performance improvements for the CPython interpreter, which has gotten at least 2x faster since Python 3.9, and they have added a baseline JIT compiler for CPython, too. At the same time, Meta has worked hard on making free-threaded Python a reality, bringing classic shared-memory multithreading to Python without being limited by the still-standard Global Interpreter Lock (GIL), which prevents true parallelism.
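
As a side note, if you want to check which world your own Python is running in, here is a minimal sketch. It assumes a recent CPython (3.13 or later), where the Py_GIL_DISABLED build flag and sys._is_gil_enabled() exist; the getattr fallback keeps it from crashing on older versions:

import sys
import sysconfig

# Py_GIL_DISABLED is set to 1 on free-threaded builds of CPython 3.13+.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() reports whether the GIL is actually active at
# runtime; it only exists on CPython 3.13+, hence the fallback.
gil_check = getattr(sys, "_is_gil_enabled", None)
gil_active = gil_check() if gil_check is not None else True

print(f"free-threaded build: {free_threaded_build}, GIL active: {gil_active}")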

Both projects deliver major improvements to Python and the wider ecosystem. So, it’s all great, or is it?

In my talk on this topic at SPLASH, which is now online, I discuss some of the aspects that the Python core developers and the wider community do not seem to regard with the urgency I would hope for. Concurrency makes me scared, and I strongly believe the Python ecosystem should be scared, too, or look forward to the 2030s being “Python’s Decade of Concurrency Bugs”.

In the talk, I start out by reviewing some of the changes in observable language semantics between Python 3.9 and today and discuss their implications. I previously covered the changes around the global interpreter lock in my post on the changing “guarantees”. In the talk, I also use an example from a real bug report to illustrate the semantic changes:

request_id = self._next_id
self._next_id += 1

It looks simple, but reveals quite profound differences between Python versions.
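
To make the problem tangible, here is a minimal sketch that wraps the two lines from the bug report in a hypothetical IdAllocator class and hammers it from several threads. The class and the numbers are made up for illustration, and whether and how often duplicate ids actually show up depends on the interpreter version and build, which is exactly the point:

import threading

class IdAllocator:
    """Hypothetical wrapper around the two lines from the bug report."""
    def __init__(self):
        self._next_id = 0

    def allocate(self):
        request_id = self._next_id   # read the current id
        self._next_id += 1           # read-modify-write, not atomic
        return request_id

alloc = IdAllocator()
ids_per_thread = [[] for _ in range(4)]

def worker(out):
    for _ in range(100_000):
        out.append(alloc.allocate())

threads = [threading.Thread(target=worker, args=(out,))
           for out in ids_per_thread]
for t in threads:
    t.start()
for t in threads:
    t.join()

all_ids = [i for out in ids_per_thread for i in out]
print(f"{len(all_ids)} allocations, {len(set(all_ids))} unique ids")
# If the two numbers differ, some request id was handed out twice.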

Since I have some old ideas lying around, I also propose a way forward. In practice, though, this isn’t a small, well-defined engineering or research project. So, I hope I can inspire some of you to follow me down the rabbit hole of Python’s free-threaded future.

Incidentally, the latest release of TruffleRuby now uses many of the techniques that would be useful for Python. Benoit Daloze implemented them during his PhD and we originally published the ideas back in 2018.

Questions, pointers, and suggestions are always welcome, for instance, on Mastodon, BlueSky, or Twitter.

Screen grab of recording, showing title slide and myself at the podium.

Benchmarking Language Implementations: Am I doing it right? Get Early Feedback!

Modern CPUs, operating systems, and software in general perform lots of smart and hard-to-track optimizations, leading to warmup behavior, cache effects, profile pollution, and other unexpected interactions. For us engineers and scientists, whether in industry or academia, this unfortunately means that we may not fully understand the system on which we are trying to measure the performance impact of, for instance, an optimization, a new feature, a data structure, or even a bug fix.
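
A small, hand-rolled sketch of what this means for measurements: instead of reporting a single average, keep the time of every iteration, so that warmup and other effects become visible at all. The names measure and example_benchmark are made up here, and a real benchmarking harness does considerably more than this:

import time

def measure(benchmark, iterations=100):
    """Run the benchmark repeatedly and keep per-iteration times."""
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        benchmark()
        times.append(time.perf_counter() - start)
    return times

def example_benchmark():
    sum(i * i for i in range(100_000))

times = measure(example_benchmark)
# Printing (or plotting) the series makes warmup and outliers visible;
# the first iterations are often slower than the steady state.
print("first 5:", [f"{t:.4f}" for t in times[:5]])
print("last 5: ", [f"{t:.4f}" for t in times[-5:]])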

Many of us even treat the hardware and software we run on top of as black boxes, relying on the scientific method to give us a good degree of confidence in our understanding of the performance results we are seeing.

Unfortunately, with the complexity of today’s systems, we can easily miss important confounding variables. Did we correctly account for, e.g., CPU frequency scaling, garbage collection, JIT compilation, and network latency? If not, we may be led down a wrong, and possibly time-consuming, path: implementing experiments that do not yield the results we are hoping for, or experiments that are too specific to allow us to draw general conclusions.
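
Making individual confounders observable helps, even if it does not scale to all of them. As a hedged example for just one of them, the following sketch counts garbage-collection runs during a measurement using Python’s gc.get_stats(); comparing a run with and without gc.disable() then shows whether the collector actually matters for this particular benchmark. The names run_measurement and example_benchmark are again made up:

import gc
import time

def run_measurement(benchmark, iterations=50):
    """Time a batch of iterations and count GC collections during it."""
    collections_before = sum(s["collections"] for s in gc.get_stats())
    start = time.perf_counter()
    for _ in range(iterations):
        benchmark()
    elapsed = time.perf_counter() - start
    collections = sum(s["collections"] for s in gc.get_stats()) - collections_before
    return elapsed, collections

def example_benchmark():
    [str(i) for i in range(50_000)]

elapsed, collections = run_measurement(example_benchmark)
print(f"{elapsed:.3f}s elapsed, {collections} GC collections during measurement")

Each confounder needs its own check of this kind, and the list of potential confounders is long.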

So, what’s the solution? What could a PhD student or industrial researcher do when planning for the next large project?

How about getting early feedback?

Get Early Feedback at a Language Implementation Workshop!

At the MoreVMs and VMIL workshop series, we introduced a new category of submissions last year: Experimental Setups.

We solicited extended abstracts that focus on the experiments themselves, before an implementation is completed. This way, the experimental setup can receive feedback and guidance to improve the chances that the experiments lead to the desired outcomes. With early feedback, we can avoid common traps and pitfalls, share best practices, and build a deeper understanding of the systems we are using.

With the complexity of today’s systems, one person, or even one group, is not likely to think of all the issues that may be relevant. Instead of encountering these issues only in the review process after all experiments are done, we can share knowledge and ideas ahead of time, and hopefully improve the science!

So, if you think you may benefit from such feedback, please consider submitting an extended abstract describing your experimental goals and methodology. No results needed!

The next submission deadlines for the MoreVMs’26 workshop are:

  • December 17th, 2025
  • January 12th, 2026

For questions and suggestions, find me on Mastodon, BlueSky, or Twitter, or send me an email!
