Actors! And now?

An Implementer’s Perspective on High-level Concurrency Models, Debugging Tools, and the Future of Automatic Bug Mitigation

The actor model is a great tool for various use cases. Though, it’s not the only tool, and sometimes perhaps not even the best. Consequently, developers started mixing and matching high-level concurrency models based on the problem at hand, much like other programming abstractions. Though, this comes with various problems. For instance, we don’t usually have debugging tools that help us to make sense of the resulting system. If we even have a debugger, it may barely allow us to step through our programs instruction by instruction.

Let’s imagine a better world! One where we can follow asynchronous messages, jump to the next transaction commit, or break on the next fork/join task created. Though, race conditions remain notoriously difficult to reproduce. One solution is to record our program’s execution, ideally capturing the bug. Then we can replay it as often as needed to identify the cause of the bug.

The hard bit here is making record & replay practical. I will explain how our concurrency-model-agnostic approach allows us to record model interactions trivially for later replay, and how we minimized its run-time overhead. For actor applications, we can even make snapshotting fast enough to keep trace sizes bounded.
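
To make the core idea a bit more tangible, here is a minimal, self-contained sketch of ordering-based record & replay for actors. This is not our actual implementation, which works inside the language runtime; the actor, sender names, and message values are made up for illustration. During recording, we log only which sender’s message an actor processes next; during replay, that log forces the exact same processing order.

```python
# Illustrative sketch only (not our implementation): record & replay of the
# message processing order of a single actor. The trace stores one sender
# name per processed message; replay uses it to force the recorded order.
import random
from collections import deque, defaultdict

class Actor:
    def __init__(self, mode, log):
        self.inbox = defaultdict(deque)  # per-sender FIFO queues
        self.mode = mode                 # "record" or "replay"
        self.log = log                   # trace: one sender name per turn

    def deliver(self, sender, payload):
        self.inbox[sender].append(payload)

    def process_one(self):
        if self.mode == "record":
            # Nondeterministic arrival order: pick any sender with messages.
            sender = random.choice([s for s, q in self.inbox.items() if q])
            self.log.append(sender)      # the only thing we need to trace
        else:
            sender = self.log.pop(0)     # force the recorded order
        return sender, self.inbox[sender].popleft()

# Record a run, then replay it: both yield the same message order.
log = []
a = Actor("record", log)
a.deliver("b", 1); a.deliver("c", 2); a.deliver("b", 3)
recorded = [a.process_one() for _ in range(3)]

r = Actor("replay", list(log))
r.deliver("b", 1); r.deliver("c", 2); r.deliver("b", 3)
replayed = [r.process_one() for _ in range(3)]
assert recorded == replayed
```

The point of the sketch is that only the nondeterministic ordering decisions need to end up in the trace, not message contents, which is what keeps recording cheap.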

Having better debugging capabilities is a real productivity boost. Though, some bugs will always slip through the cracks. So, what if we could prevent those bugs from causing issues? Other researchers have shown how to do it, and I’ll conclude this talk with some ideas on how we can utilize the knowledge we have in our language implementations to make such mitigation approaches fast.

Acknowledgments

The talk is based on work done in collaboration with Dominik Aumayr, Carmen Torres Lopez, Elisa Gonzalez Boix, and Hanspeter Mössenböck. I’d also like to thank the AGERE!’21 organizers for inviting me. I enjoyed preparing the talk, in some sense, as a retrospective of the work we did in the MetaConc Project. For a more complete list of results and papers published in the wider context of the project, please head over to its website.

If you have questions, please feel free to reach out via Twitter.

Slides

How do we do Benchmarking?

Impressions from Conversations with the Community

This summer, I talked to a number of groups from the community about how they do benchmarking, for instance as part of their day-to-day engineering and for the evaluation of ideas for research papers. I talked with them about their general approach and the tools they use.

This showed a wide variety of approaches, opinions, preferences, and tools. Though, it also showed me that there remains a lot of work to be done to get best practices adopted, and perhaps even to build better tools, research better ways of benchmarking, and adopt reliable means for data processing and data analysis.

At the Virtual Machine Meetup (VMM’21), I reported a bit on my general impressions, recounted the issues people talked about, their solutions or desire for solutions, the good practices I observed, as well as ideas for improvements. There is a lot of diversity in the approaches used in the community. Some of the diversity comes from the wide range of research questions people are interested in, but a significant amount seems to be caused by the effort and expertise required for benchmarking, which is arguably very high.

Methodology

Before looking at the slides, please note that what we have here is based on a small set of interviews, using a semi-structured approach. This means the discussions were open-ended, and I did not ask everyone exactly the same questions. Furthermore, much of the data is based on inference from these discussions.

Best Practices

This being said, there are a couple of points that seemed to be good practices perhaps worth advocating for.

  1. Use Automated Testing/Continuous Integration

    Correctness comes before performance. While not every project may justify a huge investment into testing infrastructure, I know that everything I do not test is broken. Thus, at the very least, we need to make sure that our benchmarks compute the expected results (see the sketch after this list).

  2. Use the Same Setup for Day-to-Day Engineering as for the Benchmarks Used in Papers

    Too often, benchmarks are the last thing in the process, and the week before the deadline there’s a sense of panic. Benchmarking is not impossibly hard, but it is also far from trivial. A good and reliable setup takes time: start on day 1, use it while building your system or experiment, and then have it ready and tested when you need results for the paper.

  3. Most Continuous Integration Systems will Manage Artifacts

    Keeping track of benchmark results is hard. If you’re using CI (see point 1), you could probably use it to store benchmark results as well. This means you automatically keep track of at least some of the relevant bits of information needed to figure out what the data means when analysing it later.

  4. Automate Data Handling

    Copying data around manually makes it too easy to make mistakes. When using spreadsheets, for instance, don’t paste data in by hand; use the spreadsheet’s facilities for data import instead, eliminating one source of easy mistakes (the sketch after this list writes results straight to a CSV file for the same reason).

    Of course, having things automated likely means that rerunning experiments and analyzing results after a bug fix or a last-minute change becomes much easier.

  5. Define a Workflow that Works for Your Group

    Too often, the knowledge of how to do benchmarking and performance evaluation lives only in the head of a PhD student, who may leave after finishing. Instead of letting that knowledge leave with them, it’s worthwhile to actively start teaching how to run good benchmarks. It also makes the life of new team members much easier…
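
To make points 1 and 4 concrete, here is a minimal sketch of a benchmark runner that first checks a benchmark’s result against an expected value and then writes its timings straight to a CSV file instead of relying on manual copying. The benchmark function, expected value, and file name are made-up placeholders, not a recommendation of any particular harness.

```python
# Minimal sketch: verify correctness before timing (point 1) and write
# results to a machine-readable file instead of copying numbers by hand
# (point 4). All names here are hypothetical placeholders.
import csv
import time

def list_sum(n=100_000):
    return sum(range(n))

BENCHMARKS = {
    # name: (function, expected result) -- correctness comes first
    "list_sum": (list_sum, 4_999_950_000),
}

def run(iterations=10, outfile="results.csv"):
    with open(outfile, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["benchmark", "iteration", "runtime_ms"])
        for name, (bench, expected) in BENCHMARKS.items():
            # Fail loudly if the benchmark does not compute what we expect.
            assert bench() == expected, f"{name} produced a wrong result"
            for i in range(iterations):
                start = time.perf_counter()
                bench()
                elapsed_ms = (time.perf_counter() - start) * 1000
                writer.writerow([name, i, round(elapsed_ms, 3)])

if __name__ == "__main__":
    run()
```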

These are just some quick thoughts after giving my talk. If you have questions, feel free to reach out via Twitter.

Slides

Interpreters, Compilation, and Concurrency Tooling in PLAS at Kent

Here at Kent, we have a large group of researchers working on Programming Languages and Systems (PLAS), and within this group, we have a small team focusing on research on interpreters, compilation, and tooling to make programming easier.

It’s summer 2021, and I felt it’s time for a small inventory of the things we are up to. At this very moment, the team consists of Sophie, Octave, and myself. I’ll include Dominik and Carmen as well, for bragging rights. Though, they are either finishing their PhD or have just recently defended it.

If you find this work interesting, perhaps you’d like to join us for a postdoc?

Utilizing Program Phases for Better Performance

Sophie recently presented her early work at the ICOOOLPS workshop. Modern language virtual machines do a lot of work behind the scenes to understand what our programs are doing.

However, techniques such as inlining and lookup caches have limitations. While one could see inlining as a way to give extra top-down context to a compilation, it’s inherently limited because of the overhead of run-time compilation and excessive code generation.

To bring top-down context more explicitly into these systems, Sophie explores the notion of execution phases. Programs often do different things one after another, perhaps first loading data, then processing it, and finally generating output. Our goal is to utilize these phases to help compilers produce better code for each of them. To give just one example, here is a bit from one of Sophie’s ICOOOLPS slides:

[Figure: plot from one of Sophie’s ICOOOLPS slides, showing per-phase run time of a microbenchmark with and without phase-based splitting]

The green line there is the microbenchmark with phase-based splitting enabled, giving a nice speedup for the second and fourth phase, benefiting from monomorphization and from compiling only what’s important for the phase, rather than for the whole program.

Sophie has already shown that there is potential, and the discussions at ICOOOLPS led to a number of new ideas for experiments, but it’s still early days.
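
To illustrate why phases matter for a compiler, here is a tiny, made-up sketch, which has nothing to do with Sophie’s actual implementation: the same hot call site is polymorphic when considered over the whole program, but monomorphic within each individual phase, so compiling per phase allows cheaper, more specialized code.

```python
# Illustrative sketch only: a program with two distinct execution phases.
# Within each phase, the `item.process()` call site sees a single receiver
# type, so a compiler that specializes per phase could monomorphize it;
# compiled for the whole program, the same site is polymorphic.

class CsvRecord:
    def __init__(self, line):
        self.fields = line.split(",")

    def process(self):
        return len(self.fields)

class Summary:
    def __init__(self, value):
        self.value = value

    def process(self):
        return self.value * 2

def run_phase(items):
    # One hot loop shared by both phases; the receiver type of
    # `item.process()` depends entirely on the current phase.
    total = 0
    for item in items:
        total += item.process()
    return total

def main():
    # Phase 1: load/parse -- only CsvRecord flows through run_phase.
    records = [CsvRecord("a,b,c") for _ in range(100_000)]
    parsed = run_phase(records)

    # Phase 2: summarize -- only Summary flows through run_phase.
    summaries = [Summary(i) for i in range(100_000)]
    return parsed + run_phase(summaries)

if __name__ == "__main__":
    print(main())
```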

Generating Benchmarks to Avoid the Tiny-Benchmark Trap

In some earlier blog posts and a talk at MoreVMs’21 (recording), I argued that we need better benchmarks for our language implementations. It’s a well-known issue that the benchmarks academia uses for research are rarely a good representation of what real systems do, often simply because they are tiny. Thus, I want to be able to monitor a system in production and then generate benchmarks, from the behavior we observed, that can be freely shared with other researchers. Octave is currently working on such a system to generate benchmarks from abstract structural and behavioral data about an application.

There is a long way to go; Octave currently instruments Java applications, records what they do at run time, and generates benchmarks with similar behavior from that data. Given that Java isn’t the smallest language, there’s a lot to be done, but I hope we’ll have a first idea of whether this could work by the end of the summer 🤞🏻
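
As a toy illustration of the general idea, and emphatically not Octave’s actual tooling (which works on Java and records far richer structural data), one could record abstract behavioral data such as per-method call counts and then emit a synthetic program that reproduces roughly the same call profile without exposing the original code or data:

```python
# Toy sketch only: record abstract behavioral data about an application
# (here just per-method call counts and argument counts) and emit a
# synthetic benchmark with a similar call profile.
from collections import Counter

class Recorder:
    def __init__(self):
        self.calls = Counter()

    def record(self, method_name, num_args):
        # In a real system this would be driven by instrumentation.
        self.calls[(method_name, num_args)] += 1

def generate_benchmark(recorder):
    """Emit Python source for a benchmark with a similar call profile."""
    lines = ["def benchmark():"]
    for (name, num_args), count in sorted(recorder.calls.items()):
        params = ", ".join(f"a{i}" for i in range(num_args))
        lines.append(f"    def {name}({params}): return 0")
        args = ", ".join("1" for _ in range(num_args))
        lines.append(f"    for _ in range({count}): {name}({args})")
    return "\n".join(lines)

# Usage: pretend we observed an application making these calls.
rec = Recorder()
for _ in range(1000):
    rec.record("parse_row", 1)
for _ in range(10):
    rec.record("flush_cache", 0)

print(generate_benchmark(rec))
```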

Reproducing Nondeterminism in Multiparadigm Concurrent Programs

Dominik is currently writing up his dissertation on reproducing nondeterminism. His work is essentially all about tracing and record & replay of concurrent systems. We wrote a number of papers [1, 2, 3] which developed efficient ways of doing this, first just for actor programs and then for various other high-level concurrency models. The end result allows us to record & replay programs that combine various concurrency models, with very low overhead. This is the kind of technology that is needed to reliably debug concurrency issues, and perhaps, in the future, even to allow for automatic mitigation!

Advanced Debugging Techniques to Handle Concurrency Bugs in Actor-based Applications

Last month, Carmen successfully defended her PhD on debugging techniques for actor-based applications. This work focused on the user side of debugging. First, we built a debugger for all kinds of concurrency models, then looked at what kinds of bugs we should worry about, and finally she did a user study on a new debugger she built to address these issues. We also worked on enabling the exploration of all possible executions and concurrency bugs of a program. The Voyager Multiverse Debugger has a nice demo created by our collaborators, which shows that we can navigate all possible execution paths of non-deterministic programs.
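
As a toy illustration of the multiverse idea, and not the Voyager implementation itself: if we can enumerate every order in which pending messages could be processed, a debugger can let the user step into any of those execution paths and inspect the resulting states. The actor names, messages, and state update below are made up.

```python
# Toy sketch only: exhaustively explore every order in which an actor could
# process pending messages from different senders (FIFO per sender), so a
# debugger can show all reachable execution paths and end states.

def explore(pending, state=0, history=()):
    """pending: dict sender -> list of integer messages (FIFO per sender)."""
    if not any(pending.values()):
        yield history, state
        return
    for sender, msgs in pending.items():
        if msgs:
            head, rest = msgs[0], msgs[1:]
            next_pending = {**pending, sender: rest}
            # Order-sensitive state: the last processed message "wins".
            yield from explore(next_pending, head, history + ((sender, head),))

# Three interleavings are possible; they do not all end in the same state.
for history, final_state in explore({"a": [1, 2], "b": [10]}):
    print(history, "->", final_state)
```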

Preventing Concurrency Issues, Automatically, At Run Time

While all these projects are very dear to my heart, there’s one I’d really love to make more progress on as well: automatically preventing concurrency issues from causing harm.

We are looking for someone to join our team!

If you are interested in programming language implementation and concurrency, please reach out! We have a two-year postdoc position here at Kent in the PLAS group, and you would join Sophie, Octave, and me to work on interesting research. In the project, we’ll also continue to collaborate with Prof. Gonzalez Boix and her DisCo research group in Brussels (Belgium), Prof. Mössenböck in Linz (Austria), and the GraalVM team at Oracle Labs, which includes the opportunity for research visits.

Our team is also well connected, for instance with Shopify, which supports a project on improving the warmup and interpreter performance of GraalVM languages.

Again, please reach out via email or perhaps via Twitter.
