Efficient Deterministic Replay for Actors
Debugging concurrent systems is hard, and we have been working for a while on making it a bit easier. However, a big remaining problem is that bugs are not easily reproduced.
Actor systems allow for nice designs where all actors are loosely coupled and simply communicate via asynchronous messages. Unfortunately, this means there is a lot of non-determinism in these systems. Some issues may only show up for some rare actor schedules. This is a huge hurdle for understanding what is going on, and for successfully debugging such a race condition.
In our latest paper, Dominik showed that tracing of actor programs can be sufficiently optimized to enable deterministic replay. We managed to use Truffle to reduce the overhead of tracing to ca. 10% (min. 0%, max. 20%) on microbenchmarks. On the more representative AcmeAir application, the latency of HTTP requests is only increased by 1% on average.
We think these results show that recording actor applications for deterministic replay can be practical. Of course, more work is still necessary. For instance, for the time being, we need to run the application from the start to be able to replay it. So, stay tuned for future work, and attend Dominik’s presentation at ManLang’18.
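To make the idea of record & replay a bit more concrete, here is a minimal toy sketch in Python. It is not the implementation from the paper and does not use SOMns or Truffle. The `run` function, the `outboxes` dictionary, and the trace format (one sender name per processed message) are assumptions made purely for illustration. The sketch only shows the general principle: record the non-deterministic message interleaving once, then enforce it on later runs.

```python
import random
from collections import deque


def run(outboxes, rng=None, trace=None):
    """Deliver messages from several senders to one receiving actor.

    Per-sender order is preserved; the interleaving between senders is
    the non-deterministic part. Without a trace, the interleaving is
    chosen at random and recorded as a list of sender names. With a
    trace, the recorded interleaving is enforced instead.
    """
    pending = {sender: deque(msgs) for sender, msgs in outboxes.items()}
    received = []  # order in which the receiver processed the messages
    recorded = []  # one sender name per processed message
    total = sum(len(queue) for queue in pending.values())
    for step in range(total):
        if trace is not None:
            sender = trace[step]  # replay: follow the recorded schedule
        else:
            ready = sorted(s for s, q in pending.items() if q)
            sender = rng.choice(ready)  # record: pick a schedule at random
        recorded.append(sender)
        received.append(pending[sender].popleft())
    return received, recorded


# Hypothetical example data: two senders with two messages each.
outboxes = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
first_run, trace = run(outboxes, rng=random.Random())
replayed, _ = run(outboxes, trace=trace)
assert replayed == first_run  # the replay reproduces the recorded schedule
```

Note that the toy trace stores only who sent each message, not the message contents. Keeping the recorded data small in this way is one plausible route to low tracing overhead, though the actual trace format used in the paper may differ.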
Abstract
With the ubiquity of parallel commodity hardware, developers turn to high-level concurrency models such as the actor model to lower the complexity of concurrent software. However, debugging concurrent software is hard, especially for concurrency models with a limited set of supporting tools. Such tools often deal only with the underlying threads and locks, which obscures the view on e.g. actors and messages and thereby introduces additional complexity. To improve on this situation, we present a low-overhead record & replay approach for actor languages. It allows one to debug concurrency issues deterministically based on a previously recorded trace. Our evaluation shows that the average run-time overhead for tracing on benchmarks from the Savina suite is 10% (min. 0%, max. 20%). For Acme-Air, a modern web application, we see a maximum increase of 1% in latency for HTTP requests and about 1.4 MB/s of trace data. These results are a first step towards deterministic replay debugging of actor systems in production.
- Efficient and Deterministic Record & Replay for Actor Languages
  D. Aumayr, S. Marr, C. Béra, E. Gonzalez Boix, H. Mössenböck; In Proceedings of the 15th International Conference on Managed Languages and Runtimes, ManLang'18, ACM, 2018.
- Paper: PDF
- DOI: 10.1145/3237009.3237015
- BibTeX:
@inproceedings{Aumayr:2018:RR,
  abstract = {With the ubiquity of parallel commodity hardware, developers turn to high-level concurrency models such as the actor model to lower the complexity of concurrent software. However, debugging concurrent software is hard, especially for concurrency models with a limited set of supporting tools. Such tools often deal only with the underlying threads and locks, which obscures the view on e.g. actors and messages and thereby introduces additional complexity. To improve on this situation, we present a low-overhead record & replay approach for actor languages. It allows one to debug concurrency issues deterministically based on a previously recorded trace. Our evaluation shows that the average run-time overhead for tracing on benchmarks from the Savina suite is 10% (min. 0%, max. 20%). For Acme-Air, a modern web application, we see a maximum increase of 1% in latency for HTTP requests and about 1.4 MB/s of trace data. These results are a first step towards deterministic replay debugging of actor systems in production.},
  acceptancerate = {0.72},
  author = {Aumayr, Dominik and Marr, Stefan and Béra, Clément and Gonzalez Boix, Elisa and Mössenböck, Hanspeter},
  blog = {https://stefan-marr.de/2018/08/deterministic-replay-for-actors/},
  booktitle = {Proceedings of the 15th International Conference on Managed Languages and Runtimes},
  day = {12--13},
  doi = {10.1145/3237009.3237015},
  isbn = {978-1-4503-6424-9/18/09},
  keywords = {Actors Concurrency Debugging Determinism MeMyPublication Replay SOMns Tracing Truffle},
  month = sep,
  pdf = {https://stefan-marr.de/downloads/manlang18-aumayr-et-al-efficient-and-deterministic-record-and-replay-for-actor-languages.pdf},
  publisher = {ACM},
  series = {ManLang'18},
  title = {{Efficient and Deterministic Record \& Replay for Actor Languages}},
  year = {2018},
  month_numeric = {9}
}