In short, I think, it is hard because on the one hand there is not a single concurrency
abstraction that fits all problems, and on the other hand the various different abstractions
are rarely designed to be used in combination with each other.
But let us start at the beginning. The terminology might get otherwise in the way.
For the purpose of this discussion, I distinguish concurrent programming and
Concurrent programming and its corresponding programming abstractions focus
on the correctness of a computation and the
consistency of state in the context of parallel or interleaved execution.
Parallel programming and its corresponding programming abstractions focus
on structuring an algorithm in a way that
parallel computational resources are used efficiently.
Thus, while both programming approaches are related and are often used in
combination, their goals and consequently their main abstractions are different.
I am not claiming that parallel programming is solved and widely understood,
but it is comparably easy to apply it to an isolated problem when performance is
an issue. The emphasize here is on isolated, because the integration into an
existing system is the hard part and can expose all kind of concurrency issues
for which concurrent programming techniques are required as a solution.
When do Concurrency Programming Abstractions Break Down?
The first questions might be why and when are we using concurrent programming?
The main reason is the desire to increase “performance”, either by increasing throughput,
by reducing latency, or improving interactivity by moving operations off the critical path.
Sometimes, a concurrent design also happens to map best on a problem
by aligning the program structure with the domain model for instance in terms of
tasks or processes and thus is chosen as solution.
A classic example calling for concurrent execution is user interfaces.
Independent of a particular solution, the overall goal is to move a computational or I/O task
out of the loop that processes user-generated events to maintain the interactivity of an application.
This offloading can either be done by using some form of asynchronous I/O and computation library.
To give but a few examples, C#’s
await is frequently used for this purpose,
as well as Java’s
ExecutorService, or Clojure’s
In another simple scenario, the application is already parallel and some form of
execution monitoring needs to be added. Either to stear optimizations or even for billing purposes.
Depending on the concrete scenario, various solution approaches are available. In a
non-performance-critical scenario, a simple atomically modified counter can be sufficient.
When performance matters, it might be more appropriate to gather the initial counts
local to a single thread, however, this requires later communication to build the
sum of all thread-local values, which might lead to consistency issue because it is harder
to get one globally consistent snapshot of all local counters. Depending on the requirements
all kind of different solutions in-between could be devised. If the count itself for instance
does not matter, a scalable non-zero indicator might suffice. Either way, the problem remains
the same. An existing parallel program needs to be changed, which potentially introduces concurrency
Keeping something like an independent counter consistent is however rather trivial compared to
making parallel or concurrent operations on a large and complex shared data structure yield
consistent and correct results. Imagine a tree or graph representation for a program as frequently
employed by IDEs. In such a scenario various subsystems might want to change or annotate the graph.
For instance to include inferred types, add test coverage, or history information, apply
refactorings, or simply account for the change done by the user in the editor. Often the relevant
subsystems work concurrently. One could of course the graph immutable and ‘updates’ produce
strictly new versions of it. However, for various reasons other choices might be made and then
the question arises how consistent updates are possible. Solutions could potentially include
locks or software transactional memory (STM).
When making the decision for how to manage the consistency for such a graph that is updated
concurrently, the rest of the system has unfortunately to be considered as well. Suddenly our
inconspicuous counter might need to take into account that the STM might retry transactions.
Similarly, the library for asynchronous tasks might suddenly need to retract a task from the
run queue when a transaction is retried. This is the point that makes concurrent programming
really hard. Such ‘design’ decision are not strictly local anymore and the question arises
not only for STM but for all concurrent programming abstractions: do they compose well?
Huge Number of Different Abstractions
The question of whether concurrent programming abstractions compose is not at all straightforward
to answer. As indicated in the discussion in the previous section, there are
many different possible requirements in any given situation, so that even the design of a simple
counter becomes a complex undertaking. Over the decades, the huge amount of tradeoffs resulted in
many different variations of few at least superficially related concepts. The tradeoffs are also
not only about performance for instance in terms of how much guarantees a framework may provide.
Often somewhat philosophical points come into the discussion. For instance, some people argue that
blocking operations preserve better the local sequential view on a system and therefore are
simpler to program, often however at the cost of potential deadlocks. On the other hand, a
completely asynchronous non-blocking design might be deadlock free, but depending on how the
language exposes it, one might end up in callback hell and code becomes hardly maintainable.
Yet another aspect might be whether to allow non-determinism or not. It can be easier to reason
about a strictly deterministic system. However, such a language or framework might restrict the
expressiveness so much that not all conceivable applications can be expressed in it.
To give a few examples, the futures of Clojure and Java are blocking, which always
introduces the risk for deadlocks when other blocking abstractions are used in conjunction.
The futures offered by AmbientTalk and E (called promises) are inherently non-blocking
to fit into the overall nature of these two languages as being non-blocking and deadlock free.
Consequently however, both types of futures are used differently and one might argue that one is
preferable over the other in certain situations.
Similar is the situation when it comes to the concrete implementation of communicating
sequential processes. Personally, I consider the strict isolation between processes, and
therefore the enforcement that any form of communication has to go via the explicit channels, as
a major property that can simplify reasoning about the concurrent semantics and for instance
makes sure that programs are free from low-level data races. However, Go for instance chose
to adopt the notion of communicating via channels but its goroutines are not isolated. JCSP,
a Java library goes the same way. The occam-pi language on the other hand chose to stick
with the notion of fully isolated processes. The same design discussion could be had for
implementations of the actor model. AmbientTalk and Erlang go with fully isolated processes,
while for instance Akka makes the pragmatic decision that it cannot guarantee isolation
because it is running on top of a JVM.
This discussion could go on for quite a while. Wikipedia lists currently more than 60
concurrent programming languages of which most will implement some specific variation of
a concept. In previous work, we identified roughly a hundred concepts that are related to
It can now of course be argued that a single language will not support all of them and thus
applications will perhaps only have to cope with a handful concurrent programming abstractions.
However, looking at large open source applications such as IDEs, it seems that the various
subsystems from time to time start to introduce their own abstractions. NetBeans for instance has
various representations of asynchronous task or future like abstractions and there are at
least two implementations of somewhat ‘transactional’ systems, one in the refactoring subsystem
and another one in the profiling library. They seem to implement something along the lines of STM
in different degrees of complexity. And this again raises the question how are these different
abstractions interacting with each other. A look at NetBeans bug track yields more than 4000 bugs
that contain the word “deadlock” and more than 500 bugs with the phrase
“race condition”. While most of these bugs are marked as closed, it is probably a good
indication that concurrent programming is hard and error prone.
Concurrent Programming Abstractions Not Designed for use in Combination
Usually concurrent programming is considered hard because low-level abstractions such as threads
and locks are used. While NetBeans uses these to a significant extent, it uses also considerably
more high-level concepts such as futures, asynchronous tasks, and STM. Now I would argue that it
is not necessarily the abstraction level but that various concurrent
programming abstractions are used together while they have not been designed for that purpose.
While each abstraction in isolation is well tailored for its purpose, and thus reduced the
accidental complexity, concurrency often does not remain confined to modules
or subsystem and thus the interaction between the abstractions causes significant accidental
As far as I am aware, the fewest languages have been designed from the ground up with concurrency
in mind, and even fewer languages are designed with the interaction of concurrent programming
abstractions in mind. While for instance Java was designed
with threads in mind and has the
synchronized keyword to facilitate thread-based programming,
its memory model and the
java.util.concurrent libraries were only added in Java 5.
Arguable, Java’s libraries are so low-level that languages such as Clojure and Scala
try to close the gap. Clojure was consequently designed from the start concurrent programming in
mind. It started out with
agents, and STM to satisfy the different use case for
concurrency. However, even so Clojure was design with them from the start, they do not interact
well. Atoms are considered low-level and do not regard transactions or agents at all. STM on the
other hand accounts for agents by deferring message sends until the end of a transaction to make
sure that a transaction can be retried safely. With only these three abstractions, Clojure
actually could be considered a fine example. However, these abstractions were apparently not
sufficient to cover all use cases equally well and
promises as well as CSP
in form of the
core.async library got added. Unfortunately, the abstractions were not
designed to integrate well with the existing ones. Instead, there were merely added
and interactions can cause for instance unexpected deadlocks or race conditions (for more details
see this paper).
In order to give a more academic example, which might not be governed by mostly pragmatic concerns,
Haskell might be a reasonable candidate. Unfortunately, even in Haskell the notion of adding
instead of integrating seems to be the prevalent one. I am not a Haskell expert, but the STM shows
the same symptoms Clojure has, however, in a slightly different way. The standard
Control.Concurrent package comes for instance with
Chan as abstractions for
mutable state and communication channels. But instead of integrating the STM with these, it
introduces its own variants
TChan. It might be performance reasons that led to
this situation. However, from the perspective of engineering large applications this can hardly be
ideal, because the question of whether these abstractions can be used without problems in the same
application remains unanswered.
I think that concurrent programming is hard because the abstractions we use today are
not prepared for the job. They are good for one specific task, but they are not easily used in
conjunction with each other. Instead, interactions can lead for instance to unexpected race
conditions or deadlocks. And just to support the claim that interaction is an issue, it is not
just NetBeans that uses are variety of concurrent programming concepts. Eclipse looks similar,
and so do MonoDevelop and SharpDevelop. A study in the Scala world suggests also that
application developer chose to combine the actor model with other abstractions for instance for
So, what’s the solution? I think, we need to design languages and libraries that properly
integrate a variety of concurrent programming abstractions. How that should look concretely,
I don’t know yet. The work of Joeri De Koster shows how solutions could look like for actor
languages, and together with Janwillem Swallens, we are extending this work to a wider set
of languages. Personally, I still belief that the ownership-based metaobject protocol is a
useful foundation to experiment with various different concurrent programming abstractions on top
of one language. But, we will see.
For comments, suggestions, ideas, or complains that I did not consider your language that already
solves to problem, please catch me on Twitter @smarr or send me a