Here at Kent, we have a large group of researchers working on Programming Languages and Systems (PLAS), and within this group, we have a small team focusing on interpreters, compilation, and tooling to make programming easier.
Motivated by a mention on Twitter of Tiger, a tool for generating interpreters, I had a brief look at vmgen, Tiger, eJSTK, Truffle DSL, and DynSem. What follows are my rather rough notes and pointers. So, this is by no means a careful literature study, and I welcome further pointers.
One of the hard problems in language implementation research is benchmarking. Some people argue that we should benchmark only applications that actually matter to people. However, this has various issues. Often, such applications are embedded in larger systems, and it’s hard to isolate the relevant parts. In many cases, these applications also cannot be made available to other researchers. And, of course, things change over time, which means maintaining projects like DaCapo, Renaissance, or JetStream is a huge effort.
Following up on my last blog post, I am going to look at how Ruby is used to get a bit of an impression of whether there are major differences between Ruby and Smalltalk in their usage.
Recently, I was wondering what large code bases look like when it comes to the basic properties a compiler might care about. And here I am not thinking about dynamic properties, but simply static properties such as the length of methods, the number of methods per class, the number of fields, and so on.
I started writing this post while being very, very annoyed by this blog post on the SIGPLAN blog. I could not understand how “THE SIGPLAN” blog could simply write 30 years of programming language research out of existence, only barely acknowledging Self and JavaScript. It felt like duty called…
Last year, I was asked to give a talk at the Meta’19 workshop. It’s a workshop on metaprogramming and reflection. The submission deadline for this year’s edition is less than a month away: check it out!
Last September, I had a lot of fun putting together a lecture on language implementation techniques. It is something I had wanted to do for a while, but I had not had a good excuse to actually do it before.
It has been a while since we put together a release for SOMns. And it has been even longer since I last wrote about it on this blog.
Disclaimer: The artifact, for which I put this automation together, was rejected. I take this as a reminder that the technical bits still require good documentation to be useful.
SOM, the Simple Object Machine, is a little dynamic language designed for teaching object-oriented virtual machine design. It originates in Aarhus, Denmark, and according to Lars Bak, it was implemented in the course of two days by Kasper Lund. They used it back in 2001 for a course at the University of Aarhus.
Debugging concurrent systems is pretty hard, and we have been working for a while already to make things a bit better. However, a big remaining problem is that bugs are not easily reproduced.
Programming languages naturally come with a library of containers or collection types. They allow us to easily work with arbitrary numbers of elements, which is something all major languages care about. Unfortunately, it seems there is not much writing on how to design such libraries. Even asking a few people who have worked on collection libraries for a long time did not yield much of a structured approach to such a central element of our languages. The one major piece of writing we found is the Scala team describing their experience with bit rot and how they redesigned their collection implementation to avoid it.
When we have to debug applications that use concurrency, perhaps written in Java, all we get from the debugger is a list of threads, perhaps some information about held locks, and the ability to step through each thread separately.
Note: This post is meant for people familiar with Truffle. For introductory material, please see for instance this list.
This post, the fourth in the series, is about my current work on concurrency and tooling. As mentioned before, I believe that there is not a single concurrency model that is suitable for all problems we might want to solve. Actually, I think this can be stated even more strongly: no single concurrency model is appropriate for even a majority of the problems we want to solve.
The third post of this series is about how I started using Truffle and Graal, pretty much 4 years ago. It might be a bit ranty in parts, but I started using it when it was at a very early stage. So, things are a lot better today.
Last week, I started a series of posts to go over some of the projects I was involved in during my first 10 years working on language implementations. Today’s post focuses on my time as PhD student.
Since SOMns is a pure research project, we haven’t really been doing releases for SOMns yet. However, we have added many different concurrency abstractions since December and have plans for bigger changes. So, it seems like a good time to wrap up another step and get it into a somewhat stable shape.
One possible way for modeling concurrent systems is Tony Hoare’s classic approach of having isolated processes communicate via channels, which is called Communicating Sequential Processes (CSP). Today, we see the approach used for instance in Go and Clojure.
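For readers who haven’t seen the model in practice, here is a minimal CSP-style sketch in Java (Go or Clojure’s core.async would be the more natural fit); it uses a SynchronousQueue as a stand-in for an unbuffered channel, and all names are made up for illustration. Note that the isolation of CSP processes is only by convention here, since Java threads share a heap.

```java
import java.util.concurrent.SynchronousQueue;

public class CspSketch {
    public static void main(String[] args) throws InterruptedException {
        // A SynchronousQueue has no capacity: every put blocks until a
        // matching take, mimicking the rendezvous of an unbuffered channel.
        SynchronousQueue<Integer> channel = new SynchronousQueue<>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) { channel.put(i); }
                channel.put(-1); // sentinel signaling the end of the stream
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int v = channel.take(); v != -1; v = channel.take()) {
                    System.out.println("received " + v);
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}
```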
Research on programming languages is often more fun when we can use our own languages. However, for research on performance optimizations that can be a trap. In the end, we need to argue that what we did is comparable to state-of-the-art language implementations. Ideally, we are able to show that our own little language is not just a research toy, but that it is, at least performance-wise, competitive with for instance Java or JavaScript VMs.
Next weekend starts one of the major conferences of the programming languages research community. The conference hosts many events including our Meta’16 workshop on Metaprogramming, SPLASH-I with research and industry talks, the Dynamic Languages Symposium, and the OOPSLA research track.
With the Truffle language implementation framework, we got a powerful foundation for implementing languages as simple interpreters. In combination with the Graal compiler, Truffle interpreters execute their programs as very efficient native code.
Now that we got just-in-time compilation essentially “for free”, can we get IDE integration for our Truffle languages as well?
One of the first things that I found problematic about paper writing was the manual processing and updating of numbers based on experiments. Ever since my master’s thesis, this felt like an unnecessary and error-prone step.
Besides the great performance after just-in-time compilation, the Truffle language implementation framework provides a few other highly interesting features to language implementers. One of them is the instrumentation framework, which includes a REPL, profiler, and debugger.
It has been a while since we started working on how to extend the Actor model with mechanisms to safely share state. Our workshop paper on Tanks was published in 2013. And now, finally, an extended version of this work has been accepted for publication. Below you can find the abstract with a few more details on the paper, and of course a preprint of the paper itself.
We, or more specifically our colleagues from the Software Languages Lab in Brussels, are looking for a postdoctoral researcher to work on a collaborative research project with us.
Last December, we got a research project proposal accepted for a collaboration between the Software Languages Lab in Brussels and the Institute for System Software here in Linz. Together, we will be working on tooling for complex concurrent systems. By that, I mean systems that use multiple concurrency models in combination to solve different problems, each with the appropriate abstraction. I have been working on these issues for a while already. Some pointers are available in an earlier post: Why Is Concurrent Programming Hard? And What Can We Do about It?
Continuing a little bit with writing notes on Truffle and Graal, this one is based on my observations in SOMns and changes to its message dispatch mechanism. Specifically, I refactored the main message dispatch chain in SOMns. As in Self and Newspeak, all interactions with objects are message sends. Thus, field access and method invocation are essentially the same. This means that message sending is key to good performance.
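To give an idea of what such a dispatch chain looks like, here is a plain-Java sketch of the general pattern, a polymorphic inline cache realized as a chain of nodes. This is not the actual SOMns code; all types and names are made up for illustration.

```java
// Minimal object model, made up for the sketch.
interface Method { Object invoke(Object receiver, Object[] args); }
interface ObjectClass { Method lookup(String selector); }
interface Obj { ObjectClass getObjectClass(); }

// A send site holds a chain of these nodes.
abstract class DispatchNode {
    abstract Object dispatch(Obj receiver, String selector, Object[] args);
}

// Cache entry: guard on the receiver's class, then invoke the
// previously looked-up method directly, skipping the full lookup.
final class CachedDispatch extends DispatchNode {
    private final ObjectClass expected;
    private final Method cached;
    private final DispatchNode next;

    CachedDispatch(ObjectClass expected, Method cached, DispatchNode next) {
        this.expected = expected; this.cached = cached; this.next = next;
    }

    @Override
    Object dispatch(Obj receiver, String selector, Object[] args) {
        if (receiver.getObjectClass() == expected) {      // the guard
            return cached.invoke(receiver, args);
        }
        return next.dispatch(receiver, selector, args);   // miss: try next entry
    }
}

// End of the chain: always do a full lookup (the megamorphic case).
final class GenericDispatch extends DispatchNode {
    @Override
    Object dispatch(Obj receiver, String selector, Object[] args) {
        return receiver.getObjectClass().lookup(selector).invoke(receiver, args);
    }
}
```

On a cache miss, the send site would grow the chain by one CachedDispatch entry; keeping chains short is what allows a compiler to turn the guards into a few cheap comparisons.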
Over the course of the next four weeks, I plan to publish a new post every Tuesday to give a detailed introduction on how to use the Graal compiler and the Truffle framework to build fast languages. And this is the very first post, to set up the series. The next posts are going to provide a bit of background on Golo, the language we are experimenting with, then build up the basic interpreter for executing a simple Fibonacci and later a Mandelbrot computation. To round off the series, we will also discuss how to use one of the tools that come with Graal to optimize the performance of an interpreter. But for today, let’s start with the basics.
The year leading up to SPLASH has been pretty busy. Beside my own talks on Tracing vs. Partial Evaluation and Optimizing Communicating Event-Loop Languages with Truffle, there are going to be three other presentations on work I was involved in.
The past few months, I have been busy implementing a fast actor language for the JVM. The language is essentially Newspeak with a smaller class library and without providing access to the underlying platform, which can lead to violations of the language’s guarantees.
Back in 2013 when looking for a way to show that my ideas on how to support concurrency in VMs are practical, I started to look into meta-compilation techniques. Truffle and RPython are the two most promising systems to build fast language implementations without having to implement a compiler on my own. While the two have many similarities, from a conceptual perspective they take different approaches that can be seen as opposite ends of a spectrum. So, I thought, it might be worthwhile to investigate them a little closer.
Yesterday at the Virtual Machine Meetup, I was giving a talk about why I think concurrent programming is hard, and what we can do about it.
Runtime metaprogramming and reflection are slow. That’s common wisdom, unfortunately. Using reflection, for instance with Java’s reflection API, its dynamic proxies, Ruby’s #send or #method_missing, PHP’s magic methods such as __call, Python’s __getattr__, C#’s DynamicObject, or really any metaprogramming abstraction in modern languages, comes at a price. Few language implementations optimize these operations. For instance, on Java’s HotSpot VM, reflective method invocation and dynamic proxies have an overhead of 6-7x compared to direct operations.
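To make the comparison concrete, the following sketch contrasts a direct call with invocation through java.lang.reflect. It is a naive timing loop, not a rigorous benchmark (a harness like JMH would be needed for reliable numbers), and the class is made up for illustration.

```java
import java.lang.reflect.Method;

public class ReflectionCost {
    static int counter;

    public void inc() { counter++; }

    public static void main(String[] args) throws Exception {
        ReflectionCost o = new ReflectionCost();

        // Direct call: the JIT can inline this.
        long start = System.nanoTime();
        for (int i = 0; i < 100_000_000; i++) { o.inc(); }
        long direct = System.nanoTime() - start;

        // Reflective call: lookup is hoisted out, but each invoke
        // still pays for access checks and argument handling.
        Method m = ReflectionCost.class.getMethod("inc");
        start = System.nanoTime();
        for (int i = 0; i < 100_000_000; i++) { m.invoke(o); }
        long reflective = System.nanoTime() - start;

        System.out.printf("direct: %dms, reflective: %dms%n",
            direct / 1_000_000, reflective / 1_000_000);
    }
}
```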
Today, I gave a talk on implementing languages based on the ideas behind RPython and Truffle at FOSDEM on the main track. Please find abstract and slides below.
More than a decade ago, programmer productivity was identified as one of the main hurdles for future parallel systems. The so-called Partitioned Global Address Space (PGAS) languages try to improve productivity and explore a range of language design ideas. These PGAS languages are designed for large-scale high-performance parallel programming and provide the notion of a globally shared address space, while exposing the notion of explicit locality at the language level. Even though the main focus is high-performance computing, the language ideas are also relevant for the parallel and concurrent programming world in general.
Today, I got a few more benchmarks running to get a better idea of where RTruffleSOM and TruffleSOM stand in terms of their absolute performance.
The first results of my experiments with self-optimizing interpreters were finally published in IEEE Software. It is a brief and very high-level comparison of the Truffle approach with a classic bytecode-based interpreter on top of RPython. If you aren’t familiar with either of these approaches, the article is hopefully a good starting point. The experiments described in it use SOM, a simple Smalltalk.
Parallel programming is frequently claimed to be hard, and all kinds of approaches have been proposed to solve the complexity issues. The Fork/Join programming style introduced with Cilk enables the parallel decomposition of problems in a recursive divide-and-conquer style, and on the surface it looks very simple with its minimalistic approach of having just a fork and a join language construct. But is it actually simple to use? To find out, Mattias started to dig through the Java open-source projects on GitHub and tried to identify common patterns. Next week, he will present our findings at PPPJ’14.
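For readers unfamiliar with the style, here is a minimal sketch of recursive divide-and-conquer with Java’s fork/join framework; summing an array is a stand-in example I picked for illustration, not one of the patterns from the study.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums an array segment by splitting it until it is small enough
// to process sequentially.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000;
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data; this.from = from; this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) { sum += data[i]; }
            return sum;
        }
        int mid = (from + to) / 2;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();              // run left half asynchronously
        long r = right.compute(); // compute right half directly
        return left.join() + r;   // wait for the forked half
    }
}

// Usage: new ForkJoinPool().invoke(new SumTask(data, 0, data.length))
```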
The actor model is a pretty nice abstraction for reasoning about completely independent entities that interact purely by exchanging messages. However, for software development, some see the pure actor model as too fine-grained and too restrictive, exposing many of the low-level issues, such as data races, again on a higher level, and thereby forgoing some of its conceptual benefits.
One of the big questions that came up during my PhD was: ok, now you got your fancy ownership-based metaobject protocol, and you can implement actors, agents, communicating sequential processes, software transactional memory, and many others, but now what? How are you going to use all of these in concert in one application? Finding a satisfying answer is unfortunately far from trivial.
More than three years ago, Lode and I started thinking about parallel event processing for realtime systems. The main use case back then was gesture and motion detection based on cameras such as the Kinect. Thierry created the first fully functional prototype called PARTE, and in addition to his master’s thesis, we wrote a workshop paper about it. Now, the revised and extended version of this paper has finally been accepted as well.
To prepare some experiments with Pharo’s new compiler infrastructure and a simple AST interpreter, I ported my implementation of the Ownership-based Metaobject Protocol (OMOP) to Pharo 3. Loading the OMOP into an image will give you an STM implementation, a basic actor system, communicating sequential processes, Clojure-like agents, and active objects. Eventually, the goal is to provide a more extensive set of such concurrent programming mechanisms on top of the OMOP, but for now these five should already give an impression of how the OMOP itself works.
Today at FOSDEM, I gave a brief talk on implementing SOM, a little Smalltalk, with RPython and Truffle. RPython, probably best known for the PyPy implementation, uses meta-tracing JIT compilation to make simple interpreters fast. Truffle, a research project of Oracle Labs, is an approach for building self-optimizing interpreters, and in combination with Graal, it provides a JIT compiler for AST-based interpreters. In the talk, I briefly sketch both of them, without going into many details.
And another paper that’s going to be presented by Joeri is our work on Tanks, a variation of communicating event loops (à la E or AmbientTalk). Tanks add synchronous and consistent read access to the event loop model.
On Sunday, I am going to present work on a distributed Rete engine I have been involved in over the last year. The presentation will be at the AGERE workshop co-located with SPLASH. Note that most of the work has been done by Janwillem and Thierry over the last two years. They did a great job in first implementing and parallelizing our Rete engine and now distributing it to scale up for “big data” scenarios.
For quite a while now, I have been using R as a scripting language to generate graphs from the benchmarking results of various experiments. Since we got a new 64-core machine at the lab, I have been running into performance issues with scripts that process the benchmark results, because the number of measurements increased significantly and the plyr library I use was designed with ease of use in mind, rather than performance. Concretely, I was experiencing script runtimes of up to 25min for a data set with roughly 50,000 measurements.
Last Friday, I defended my PhD dissertation. Finally, after 4 years and a bit, I am done. Finally. I am very grateful to all the people supporting me along the way and of course to my colleagues for their help.
It has been a while since SPLASH’12, but I finally got around to putting up a copy of our paper from the AGERE’12 workshop. It is based on Thierry’s master’s thesis and presents his work on parallelizing a Rete engine for gesture recognition. Lode and I were his advisors and are happily working with him on what we promised in the future work section.
My second talk at Smalltalks 2012 was most likely the reason why the organizers invited me in the first place. It was a slightly extended version of the Sly and RoarVM talk for the FOSDEM Smalltalk Dev Room from the beginning of the year, reporting on the Renaissance Project.
Yesterday was the first day of Smalltalks 2012 in Puerto Madryn. The organizers invited me to give a keynote on a topic of my choice, which I gladly did. Having just handed in my thesis draft, I chose to put my research into the context of Smalltalk and try to relate it to one of the main open questions: how do we actually want to program multicore systems?
Yesterday, I gave a brief lecture at the University of Quilmes half an hour outside of Buenos Aires. Since the students were from all levels of the bachelor program, I tried to give them a little impression of why we have multicore processors in the first place, and how fork/join with work-stealing as well as event-loop actors could be used to program these systems.
[Disclaimer: I am one of the assistants supporting the RACES’12 organizers and a PC member.]
Writing a PhD dissertation, a master thesis, a bachelor thesis, or any other kind of lengthy document can be a lot less rewarding than hacking on something that gives real feedback once it starts working…
You got a big multicore, or manycore, machine, but do not have a clue how to actually use it, because your application doesn’t seem to scale naturally? Well, that seems to be a problem many people are facing in our new manycore age. One possible solution might be to accept less precise answers by relaxing synchronization constraints. That could allow us to circumvent Amdahl’s law when Gustafson is out of reach.
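For reference, the two laws alluded to here, with p the parallelizable fraction of the work and N the number of cores:

\[
S_{\mathrm{Amdahl}}(N) = \frac{1}{(1-p) + p/N} \le \frac{1}{1-p},
\qquad
S_{\mathrm{Gustafson}}(N) = (1-p) + p \cdot N
\]

Amdahl assumes a fixed problem size, so the serial fraction caps the achievable speedup; Gustafson sidesteps the cap by letting the problem grow with the machine. Relaxing synchronization constraints instead attacks the serial fraction itself, which helps when the problem size cannot simply grow.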
My paper on how to support various concurrency models with an ownership-based meta-object protocol (MOP) was accepted at TOOLS’12. Below, you will find a preprint version of the paper. A later post will provide details on how to use it and how to experiment with the MOP in Pharo 1.3.
Modularity: AOSD’12 will be in Potsdam at the end of March, and I am looking forward especially to the MISS’12 workshop.
Welcome to Academia. That is how I take this one…
Joeri and I have been working for a while on a paper extending the standard actor model with more parallelism. This work is not completed yet, and there are still some theoretical issues with the approach he designed. But we are working on it!
The third day started with Brendan Eich’s keynote on JavaScript’s world domination plan. It was a very technical keynote, not very typical I suppose. And he was rushing through his slides at enormous speed. Good that I have some JavaScript background. Aside from all the small things he mentioned, what was interesting for me is that he seemed very keen to get Intel’s RiverTrail approach to data-parallelism into ECMAScript in one form or another. That kind of contradicts the position I had heard so far, namely that ECMAScript would be done with WebWorkers as its model for concurrency and parallel programming.
The second day of the technical tracks started with a keynote by Markus Püschel. He is not the typical programming language researcher you meet at OOPSLA; instead, he does research in automatic optimization of programs. In his keynote, he showed a number of examples of how to get the best performance for a given algorithm out of a particular processor architecture. Today’s compilers are still not up to the task, and will probably never be. Compared to a naïve implementation, hand-optimized C code can achieve a 10x speedup when dependencies are made explicit and the compiler knows that no aliasing can happen. He then discussed how that can be approached in an automated way, and also what programming languages could do.
The first day of the technical tracks, including OOPSLA, started with a keynote by Ivan Sutherland titled The Sequential Prison. His main point was that the way we think and the way we build machines and software are based on sequential concepts. The words we use to communicate and express ourselves are often of a very sequential nature. His examples included: call, do, repeat, program, and instruction. Other examples that shape and restrict our way of thinking are basic data structures and concepts like strings (character sequences). However, there are also words that enable us to think about concurrency and parallelism much better. His examples for these included: configure, pipeline, connect, channel, network, and path.
The second conference day was unfortunately full of “conflicts of interest”… It was pretty hard to choose between all the talks on the schedule.
And here we go again: SPLASH 2011 has started with its first day of workshops.
As preparation for SPLASH’11, here is my paper for the VMIL workshop. It is a position paper discussing in which direction virtual machines should evolve in the future with regard to the challenges that manycore architectures and concurrent programming bring.
Today, I gave a talk at the ExaScience Lab, Intel Labs Europe, in Leuven at IMEC. I talked mainly about the idea of nondeterministic programming, the Sly programming language, and some details of our Smalltalk manycore virtual machine that enables those experiments. Thus, I tried to spread the word about our Renaissance project a bit further.
The following introduction and analysis of the Sly3 programming language was written by Pablo Inostroza Valdera as part of his course work for the Multicore Programming course of Tom Van Cutsem. The assignment was to write a blog post about a topic of their own choice, and I repost Pablo’s work here with his permission to spread the word about Sly a bit wider. We also made his article available as part of his work for the Renaissance project itself.
Today, I gave a presentation on Traits for PHP in Paris. It was quite an interesting audience: ca. 35 professional PHP developers, all interested in Traits.
Last Friday was the annual Lab event of our Software Languages Lab. Like last year, many people related to the lab in one way or another came to get an overview of what the current topics of our research are.
We are happy to announce, now officially, RoarVM: the first single-image manycore virtual machine for Smalltalk.
Just a brief heads up before the actual announcement of RoarVM.
As usual I will write about a few of my personal highlights of SPLASH and the co-located workshops. That is mostly from my spotty notes, and from memory, so I don’t guarantee 100% accuracy, especially with respect to what other people might have said.
The 12th IEEE International Conference on High Performance Computing and Communications was not the first conference I attended. However, it was the first one where I actually presented a paper in the main research track.
On a related note to the last post: my poster was also accepted for SPLASH’10.
In October, I will give a brief presentation on the state of affairs with my PhD research at the SPLASH 2010 Doctoral Symposium. The basic idea has not changed since my last presentation at the TiC’10 summer school. I haven’t been able to do a lot of real work for it, but the ideas are a bit clearer now. The following two-page proposal will be published as part of the conference proceedings.
The last half year was an interesting departure from my actual PhD research. First, I thought the idea of barriers and phasers might be interesting to incorporate into a virtual machine as part of my thesis, but as it turned out, they are much too high-level and are better off implemented in a library. The gain from direct support in a VM is just not proportional to the effort and restrictions that come with that step.
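As an example of the library-level alternative, Java’s java.util.concurrent.Phaser (introduced with Java 7) provides phaser semantics without any VM support. A minimal sketch, with made-up worker code:

```java
import java.util.concurrent.Phaser;

public class PhaserSketch {
    public static void main(String[] args) {
        int workers = 4;
        // Each worker registers as one party; a phase completes
        // once all parties have arrived.
        Phaser phaser = new Phaser(workers);

        for (int w = 0; w < workers; w++) {
            final int id = w;
            new Thread(() -> {
                for (int phase = 0; phase < 3; phase++) {
                    System.out.printf("worker %d computing phase %d%n", id, phase);
                    // Barrier: wait until all workers reach this point.
                    phaser.arriveAndAwaitAdvance();
                }
            }).start();
        }
    }
}
```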
This is the last post about the TiC’10 summer school and covers the two remaining lectures. Mark Moir talked about concurrent data structures and transactional memory, and last but not least, Madan Musuvathi presented the work they did at Microsoft Research to improve the testing of concurrent applications.
This post is a follow up on my first report on the TiC’10 summer school. It covers mainly the talks about X10 and formal aspects of concurrency.
I already posted the presentation I gave at the summer school earlier. In the following posts, I will report a bit about the lectures of the summer school, similar to my posts about the TPLI summer school of last year.
On the first day here at the summer school, the organizers gave us the opportunity to present our research ideas to the lecturers and other participants.
Quite a while ago already, I was involved in writing a workshop paper about an actor model for virtual machines. Actually, the main idea was to find a concurrency model for a VM that supports multi-dimensional separation of concerns. However, AOP is not that interesting for me at the moment, so I am focusing on the concurrency aspect, especially the actor-based VM model.
My second workshop paper got published at the ACM Digital Library. This is actually only an abstract, but nonetheless, it might be interesting for people looking into the design of virtual machines and especially bytecodes/intermediate languages.
Finally, my first workshop paper got published, which was a little odyssey with some misunderstandings, but anyway, now it is out. It is just a position paper, so do not expect too many insights. However, what it describes is my big plan, and hopefully the story of my PhD. Am working on it…
Spending the time on coming up with a nice poster eventually paid off.
And here is the winner of this year’s MoVES poster award:
The third and last part of the summer school started with Ondrej Lhotak talking about pointer analysis. David Bacon presented a basic introduction to garbage collection and a detailed description of the real-time Metronome GC. The last lecturer for this summer school was Patrick Cousot, who gave a very basic and thus understandable introduction to abstract interpretation.
Along the way to measuring the performance of a Smalltalk implementation for commodity multi-core systems, I tried to use Pharo as a more convenient development platform. Well, and I failed at the first attempt…
PHP 5.3 has been released for a while now and is starting to settle in.
The second part of the summer school was a bit more applied and more in the direction of my own interests. Chandra Krintz talked about managed runtime environments. Yannis Smaragdakis introduced multi-threaded programming and transactional memory. Sumit Gulwani as the third lecturer taught us symbolic bounds computation.
Thanks to the financial support of FWO and a grant by the summer school itself, I was able to go to Eugene, Oregon and learn about theory and practice of language implementation.
Finally, I managed to complete the changes for the traits patch. Since PHP 5.3 has already entered the release cycle and it is not certain whether there will be a PHP 5.4 with new language features, the new patch is written for the PHP6 codebase.
Yes! Finally! I am done with my Master’s Thesis.
One important part of research is discussing your research topic with other people, and well, SVPP’08 was my first real research event and I really enjoyed it 🙂
Today, on the second day of the unconference here in Hamburg, I finally have a bit of time to listen to a few of the talks myself. Yesterday, I unexpectedly had to step in twice.
During the last six months, I’ve studied a language construct called Traits. It is a construct enabling fine-grained code reuse besides single inheritance in classical object-oriented languages. A first implementation of this concept has been around for Squeak. Since I’ve used PHP for quite a long time now, I decided to dig into its internals and implement Traits for PHP, too.