Monday, February 13, 2012

Actor Creation Overhead

libcppa provides two actor implementations: a context switching and an event-based implementation.

The context-switching implementation is easier to use: one has to write less code, and receives can be nested. The downside is that each actor allocates its own stack. As an example from a current mainstream system, Mac OS X Lion defines the two constants SIGSTKSZ = 131072 and MINSIGSTKSZ = 32768 in its system headers. SIGSTKSZ is the recommended stack size in bytes and MINSIGSTKSZ is the minimum allowed stack size in bytes. A system with 500,000 actors would therefore require at least 15 GB of RAM for stack space alone (500,000 × 32,768 bytes), rising to 61 GB with the recommended stack size. This clearly does not scale well for large systems.

The event-based implementation uses fewer system resources, allowing developers to use hundreds of thousands of actors. Creating an event-based actor is cheap and lightweight, but you have to provide a class-based implementation. Furthermore, you cannot use receive(), since this would block the calling worker thread. The behavior-based approach works slightly differently but is fairly easy to understand and use (see the Dining Philosophers example).
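The stack-space estimate for the context-switching implementation can be reproduced with a few lines of arithmetic. The Python sketch below is only an illustration: the helper name stack_memory_gib is made up for this example, and it assumes the GB figures are binary gigabytes (2^30 bytes), which is the reading that matches the quoted 15 and 61.

```python
# Stack sizes in bytes, as quoted from the Mac OS X Lion system headers.
SIGSTKSZ = 131072      # recommended stack size
MINSIGSTKSZ = 32768    # minimum allowed stack size

GIB = 1024 ** 3        # assumption: "GB" in the text means 2^30 bytes


def stack_memory_gib(num_actors, stack_size):
    """Total stack space in GiB if every actor owns one stack of stack_size bytes."""
    return num_actors * stack_size / GIB


if __name__ == "__main__":
    for label, size in (("minimum", MINSIGSTKSZ), ("recommended", SIGSTKSZ)):
        # prints 15.26 GiB for the minimum and 61.04 GiB for the recommended size
        print(f"{label:>11}: {stack_memory_gib(500_000, size):.2f} GiB")
```

The estimate covers stack space only; mailboxes and other per-actor state come on top of it.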

The following benchmark measures the overhead of actor creation. It recursively creates 2^19 (524,288) actors, as the following pseudo code illustrates:
spreading_actor(Parent):
  receive:
    {spread, 0} =>
      Parent ! {result, 1}
    {spread, N} =>
      spawn(spreading_actor, self) ! {spread, N-1}
      spawn(spreading_actor, self) ! {spread, N-1}
      receive:
        {result, X1} =>
          receive:
            {result, X2} =>
              Parent ! {result, X1+X2}

main():
  spawn(spreading_actor, self) ! {spread, 19}
  receive:
    {result, Y} =>
      assert(2^19 == Y)
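To make the message flow of the pseudo code concrete, here is a small Python sketch that maps each actor to an OS thread with a queue as its mailbox. It is not libcppa code and not one of the benchmark programs; spawn, spreading_actor and run are hypothetical names chosen to mirror the pseudo code, and the spawn counter is an addition for illustration.

```python
import queue
import threading

spawned = 0                        # total number of actors created so far
spawned_lock = threading.Lock()


def spawn(behavior, parent):
    """Start `behavior` on its own thread and return the new actor's mailbox."""
    global spawned
    with spawned_lock:
        spawned += 1
    mailbox = queue.Queue()
    threading.Thread(target=behavior, args=(parent, mailbox)).start()
    return mailbox


def spreading_actor(parent, mailbox):
    _tag, n = mailbox.get()                    # {spread, N}
    if n == 0:
        parent.put(("result", 1))              # Parent ! {result, 1}
        return
    for _ in range(2):
        spawn(spreading_actor, mailbox).put(("spread", n - 1))
    _tag, x1 = mailbox.get()                   # {result, X1}
    _tag, x2 = mailbox.get()                   # {result, X2}
    parent.put(("result", x1 + x2))            # Parent ! {result, X1+X2}


def run(n):
    """Play the role of main(); returns (result, number of actors spawned)."""
    global spawned
    spawned = 0
    me = queue.Queue()
    spawn(spreading_actor, me).put(("spread", n))
    _tag, y = me.get()
    return y, spawned
```

Because every sketch actor occupies a whole thread, this only works for small n. Calling run(6), for instance, yields a result of 64 from 127 spawned actors: the inner nodes of the tree are actors too, so in this sketch a run with n levels spawns 2^(n+1) - 1 actors while the result counts the 2^n leaves.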

This measurement tests how lightweight actor implementations are. We did not test the thread-mapped actor implementation of Scala, because the JVM cannot handle half a million threads, and neither could a native application.


It is not surprising that Erlang yields the best performance, as its virtual machine was built to handle actors efficiently. Furthermore, we can see the same increase in runtime caused by more hardware concurrency for the event-based libcppa implementation as in our previous benchmark. However, the context-switching (stacked) implementation clearly falls short in this scenario. Please note that this benchmark used the minimal stack size to be able to create half a million actors. By default, libcppa uses the recommended stack size. Consider using event-based actors whenever possible, especially in systems consisting of a large number of concurrently running actors.

The benchmarks ran on a Linux virtual machine using 2 to 12 cores of the host system, which comprises two hexa-core Intel® Xeon® processors at 2.27 GHz. All values are the average of five runs.

The sources can be found on GitHub (ActorCreation.scala, actor_creation.erl and actor_creation.cpp).
