Friday, March 9, 2012

A Few Words About Theron

Theron is currently the only actor library for C++ besides libcppa. It implements event-based message processing without a mailbox, using registered member functions as callbacks. I wanted to include Theron in my three benchmarks, but I had some trouble getting it to run on Linux. I compiled Theron in release mode, ran its PingPong benchmark ... and got a segfault. GDB pointed me to the DefaultAllocator implementation, and I could run the benchmarks after replacing the memory management with plain malloc/free. Thus, the results shown here might be a few percent better with a fixed memory management. However, I was unable to get results for the Actor Creation Overhead benchmark, because Theron crashed for more than 210 actors.

Theron has two ways of sending a message to an actor, because it distinguishes between references and addresses. The Push method can be used only if one holds a reference to an actor, which is usually only the case for its creator. The Send method uses addresses and is the general way to send messages.
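
A rough sketch of both variants, based on Theron 3's Framework/ActorRef/Receiver interface (the Printer actor and the int messages are just placeholders; check the Theron headers for the exact signatures):

    #include <Theron/Framework.h>
    #include <Theron/Actor.h>
    #include <Theron/Receiver.h>

    // A trivial actor that registers a member function as message handler.
    class Printer : public Theron::Actor {
    public:
        Printer() { RegisterHandler(this, &Printer::Handle); }
    private:
        void Handle(const int& message, const Theron::Address from) {
            // process the message; 'from' identifies the sender
        }
    };

    int main() {
        Theron::Framework framework;
        Theron::ActorRef printer(framework.CreateActor<Printer>());
        Theron::Receiver sender; // provides a valid 'from' address

        // Push: requires an ActorRef, i.e., it only works for the creator.
        printer.Push(1, sender.GetAddress());

        // Send: only needs the target's address and works from anywhere.
        framework.Send(2, sender.GetAddress(), printer.GetAddress());

        // A real program would wait for the actor to finish, e.g., by
        // having it reply to the Receiver, before the Framework is destroyed.
        return 0;
    }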



The results for the Mixed Scenario benchmark are as follows.



Theron yields good results on two and four cores. However, the more concurrency we add, the more time Theron needs in the mixed scenario. I don't know Theron's internals, but this is common behavior of mutex-based software on many-core systems and cannot be explained by the "missing" memory management.

I used Theron in version 3.03. As always, the benchmarks ran on a Linux virtual machine using 2 to 12 cores of the host system, which comprises two hexa-core Intel® Xeon® processors clocked at 2.27 GHz. All values are the average of five runs.

The sources can be found on github in the benchmark folder (theron_mailbox_performance.cpp and theron_mixed_case.cpp).

Thursday, March 8, 2012

Mailbox Part 2

It's been a while since I posted Mailbox Part 1, where I promised a benchmark for a 1:N communication scenario on up to 12 cores. So here it is. This benchmark compares libcppa to Erlang and to Scala, covering both of its standard library actor implementations as well as Akka.

The benchmark uses 20 threads sending 1,000,000 messages each, except for Erlang, which does not have a threading library; there I spawned 20 actors instead. The minimal runtime of this benchmark is the time the receiving actor needs to process 20,000,000 messages plus the overhead of enqueueing the messages into its mailbox. More hardware concurrency leads to more synchronization overhead among the sending threads, since the mailbox acts as a shared resource.
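
To make the structure of the benchmark concrete, here is a stripped-down sketch with a deliberately naive mutex-protected queue standing in for the mailbox. This is not how libcppa's mailbox is implemented (it uses the lock-free cached stack mentioned below); the sketch merely illustrates why the senders contend on a single shared resource. The real sources are linked at the end of this post.

    #include <cstdint>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    // A deliberately naive "mailbox": every sender competes for one lock,
    // which is exactly the shared resource this benchmark stresses.
    struct naive_mailbox {
        std::mutex mtx;
        std::queue<std::uint64_t> msgs;
        void enqueue(std::uint64_t msg) {
            std::lock_guard<std::mutex> guard(mtx);
            msgs.push(msg);
        }
    };

    int main() {
        constexpr int num_senders = 20;
        constexpr std::uint64_t msgs_per_sender = 1000000;
        naive_mailbox mbox;
        std::vector<std::thread> senders;
        for (int i = 0; i < num_senders; ++i) {
            senders.emplace_back([&] {
                for (std::uint64_t n = 0; n < msgs_per_sender; ++n) {
                    mbox.enqueue(n);
                }
            });
        }
        for (auto& t : senders) {
            t.join();
        }
        // A real actor library dequeues and processes these 20,000,000
        // messages concurrently; this sketch only models the enqueue side.
        return 0;
    }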



Both libcppa implementations show performance similar to Scala (receive) on two cores, but their runtime curve rises more steeply. Erlang's message passing implementation does not scale well for this use case: the more concurrency we add, the more time the Erlang program needs, up to an average of 600 seconds on 12 cores (the results are clipped in the graph for readability). The increase in runtime for libcppa is similar to the one seen in the Actor Creation Overhead benchmark and is caused by the scheduler; the cached stack algorithm scales very well and is not a limiting factor here. The overhead of stack allocation is negligible in this use case, which is why the runtimes of both libcppa implementations are almost identical.

The benchmarks ran on a Linux virtual machine using 2 to 12 cores of the host system, which comprises two hexa-core Intel® Xeon® processors clocked at 2.27 GHz. All values are the average of five runs.

The sources can be found on github (MailboxPerformance.scala, mailbox_performance.erl and mailbox_performance.cpp).

Tuesday, March 6, 2012

RIP invoke_rules

The class invoke_rules was among the first classes of libcppa. In fact, it received a lot of refactoring even before the first commit on github. However, it is finally gone. If your code fails to compile with the current version, this is how to fix it:
invoke_rules → partial_function
timed_invoke_rules → behavior
The class invoke_rules went through too many changes in the past, and its name wasn't well chosen. In fact, its implemented behavior was already identical to that of a partial function. It also had a sibling called timed_invoke_rules, which was a partial function with a timeout. That's pretty much the definition of an actor's behavior, isn't it? The old names were remnants from the time I implemented on() and after(). The new partial_function/behavior interface is straightforward and much more intuitive.
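
To make the distinction concrete, here is a minimal sketch of how the two concepts show up in a receive statement, using the on()/after() syntax; the int handlers are just placeholders:

    #include <chrono>
    #include "cppa/cppa.hpp"
    using namespace cppa;

    void example() {
        // A set of on() handlers without a timeout is what
        // partial_function (formerly invoke_rules) models.
        receive (
            on<int>() >> [](int value) {
                // handle an int message
            }
        );
        // Adding an after() clause yields a behavior (formerly
        // timed_invoke_rules): a partial function plus a timeout.
        receive (
            on<int>() >> [](int value) {
                // handle an int message
            },
            after(std::chrono::seconds(1)) >> []() {
                // no message arrived within one second
            }
        );
    }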