Wednesday, December 7, 2011

Documentation

Finally! The section about message handling is done and the doxygen documentation is now online at github pages: http://neverlord.github.com/libcppa/.

Send me a mail or leave a comment if you miss something important.
Have fun!

Friday, November 11, 2011

Pattern Matching Changes

The new pattern matching implementation is now merged back into master. There are a lot changes under the hood to significantly speedup pattern matching. However, there is one change to the user interface as well: any_type no longer exists. There is a replacement though, but it's usage differs a little bit: anything.
This is a snipped using the old any_type syntax:
on<int, any_type>() >> [](int v1) { }, // 1
on<int, any_type*>() >> [](int v1) { } // 2
The semantic of the first line - matching exactly one element of any type - is no longer supported. The second line just needs to replace any_type* with anything to work:
// equal to 2
on<int, anything>() >> [](int v1) { }
This will match any message with an integer as first element.

Wednesday, September 21, 2011

delayed_send

delayed_send and its brother delayed_reply are great if you need to poll a resource every x (milli)seconds while staying responsive to other messages or if you want to implement a "timed loop". Both accept std::chrono durations in seconds, milliseconds, microseconds and minutes. However, the accuracy depends on your scheduler, operating system, workload, etc., so you shouldn't rely on a microseconds-timing.

A good example for such a loop is an animation. In message oriented programming (and this is was libcppa is about), the most idiomatic way to implement such a loop is to send a message to yourself that is delayed by a predefined amount of time. If you receive that message, you do the next animation step and send another delayed message to yourself, and so on, until the animation is done.

Today's example is a terminal-animation with a dancing Kirby. Have fun!

#include <chrono>
#include <iostream>
#include <algorithm>
#include "cppa/cppa.hpp"

using std::cout;
using std::endl;
using namespace cppa;

// ASCII art figures
constexpr const char* figures[] = {
  "<(^.^<)",
  "<(^.^)>",
  "(>^.^)>"
};

// array of {figure, offset} pairs
constexpr size_t animation_steps[][2] = {
  {1,  7}, {0,  7}, {0,  6}, {0,  5}, {1,  5}, {2,  5}, {2,  6},
  {2,  7}, {2,  8}, {2,  9}, {2, 10}, {1, 10}, {0, 10}, {0,  9},
  {1,  9}, {2, 10}, {2, 11}, {2, 12}, {2, 13}, {1, 13}, {0, 13},
  {0, 12}, {0, 11}, {0, 10}, {0,  9}, {0,  8}, {0,  7}, {1,  7}
};

constexpr size_t animation_width = 20;

// "draws" an animation step: {offset_whitespaces}{figure}{padding}
void draw_kirby(size_t const (&animation)[2]) {
  cout.width(animation_width);
  cout << '\r';
  std::fill_n(std::ostream_iterator<char>{cout}, animation[1], ' ');
  cout << figures[animation[0]];
  cout.fill(' ');
  cout.flush();
}

void dancing_kirby() {
  // let's get it started
  send(self, atom("Step"));
  // iterate over animation_steps
  auto i = std::begin(animation_steps);
  receive_for(i, std::end(animation_steps)) (
    on<atom("Step")>() >> [&]() {
      draw_kirby(*i);
      // animate next step in 150ms
      delayed_send(self, std::chrono::milliseconds(150), atom("Step"));
    }
  );
}

int main() {
  cout << endl;
  dancing_kirby();
  cout << endl;
}

Tuesday, September 6, 2011

Timed Receive

It's not unusual to send a message and then wait for response. But if your communication partner doesn't reply, you'll wait forever. That's why we need an option to specify timeouts.

Here's a short example actor that does nothing but sends you back your own messages and exits if he idles for 10sec.

#include <chrono>
#include "cppa/cppa.hpp"

using namespace cppa;

void echo_server()
{
  receive_loop (
    others() >> []() {
      self->last_sender() << self->last_dequeued();
    },
    after(std::chrono::seconds(10)) >> []() {
      quit(exit_reason::user_defined);
    }
  );
}


The after()-statement has to be the last one and you can't specify more than one. An empty receive with nothing but a timeout should be used to sleep for a certain amount of time, e.g.:
receive(after(std::chrono::milliseconds(5)) >> []() {});

Btw: you should not use native sleep functions, because they are blocking. That's only ok if you spawned your Actor with the detached flag, because the Actor has its own thread in this case. But remember that your Actors usually share one thread pool and blocking function calls should be best avoided.

Wednesday, August 24, 2011

libcppa vs. Erlang vs. Scala Performance

I recently found a Scala Actor vs. Erlang Performance test on github.

The test benchmarks the speed of message handling and I decided to port it to libcppa. This is the full source code for a sequential-send-version (sending 3,000,000 messages):

#include <iostream>
#include "cppa/cppa.hpp"
#include <boost/progress.hpp>

using std::cout;
using std::endl;
using boost::timer;
using namespace cppa;

void counter_actor()
{
    long count = 0;
    receive_loop
    (   
        on<atom("Get")>() >> [&]()
        {   
            reply(count);
            count = 0;
        },  
        on<atom("AddCount"), long>() >> [&](long val)
        {   
            count += val;
        }   
    );  
}

long the_test(int msg_count)
{
    constexpr long val = 100;
    auto counter = spawn(counter_actor);
    for (int i = 0; i < msg_count; ++i)
    {   
        send(counter, atom("AddCount"), val);
    }   
    send(counter, atom("Get"));
    long result = 0;
    receive
    (   
        on<long>() >> [&](long value)
        {   
            result = value;
        }   
    );  
    send(counter, atom(":Exit"), exit_reason::user_defined);
    return result;
}

void run_test(int msg_count)
{
    timer t0; 
    long count = the_test(msg_count);
    auto elapsed = t0.elapsed();
    cout << "Count is " << count << endl
         << "Test took " << elapsed << " seconds" << endl
         << "Throughput = " << (msg_count / elapsed)
                            << " per sec" << endl;
}

int main()
{
    run_test(3000000);
    await_all_others_done();
    return 0;
}


Results (average value of 5 runs)


LanguageTime(s)Throughput (msg/s)
Erlang OTP~9~333,333.333
Erlang "bare receive"~7~428,571.429
Scala 2.9~9~333,333.333
libcppa~12~250,000


System: 2.66 GHz Intel Core i7 (dual core); Mac OS 10.7.1

The source code can be found in the folder gen_server on github.


Note: This benchmark results are outdated. The current version of libcppa runs the gen_server benchmark in ~5.5 seconds.

Please see the mixed scenario post for a more detailed benchmark.

Monday, July 25, 2011

Receive Loops

The usual way to implement an actor is a list of patterns (an expected message format and a corresponding callback) in a loop. A naive approach might look like this:

for (;;)
{
  receive
  (
    on<int>() >> [](int value) { ... },
    on<double>() >> [](double value) { ... },
    ...
  );
}

That's a naive approach, because you define your pattern inside of the loop. Instead of re-using the pattern, you create new objects in each iteration.

This example re-uses the pattern, but the code is more verbose and breaks locality; the pattern is no longer part of the loop and you have to scroll up to see where the pattern is defined.

auto pattern =
(
  on<int>() >> [](int value) { ... },
  on<double>() >> [](double value) { ... },
  ...
);
for (;;)
{
  receive(pattern);
}

Note the syntax auto pattern = ()! Your compiler won't accept the code if you miss the brackets, because he's thinking the comma separates two declarations.

To provide a more elegant, and less error-prone, way to express loops, there are three "predefined" loops in libcppa:


  1. receive_loop([PATTERN]);
     
  2. receive_while([LAMBDA -> bool])([PATTERN]);
     
  3. do_receive([PATTERN]).until([LAMBDA -> bool]);


The first one loops forever. The other two use lambda expressions for the loop condition. C++ doesn't have a nicer way to pass a block of code (or even one simple statement) to a function, therefore it's more verbose than the built-in while() loop (yes, Scala does a much better job here...).

Anyway, this is a short but complete example program that uses two of the receive loops:

#include <iostream>
#include "cppa/cppa.hpp"

using std::cout;
using std::endl;
using namespace cppa;

void ping()
{
  receive_loop
  (
    on<atom("Pong"), int>() >> [](int value)
    {
      cout << "ping received { Pong, " << value << " }" << endl;
      reply(atom("Ping"), value + 1);
    }
  );
}

void pong(actor_ptr ping_actor)
{
  link(ping_actor);
  send(ping_actor, atom("Pong"), 0);
  int last_ping = 0;
  receive_while([&]() { return last_ping < 9; })
  (
    on<atom("Ping"), int>() >> [&](int value)
    {
      cout << "pong received { Ping, " << value << " }" << endl;
      last_ping = value;
      reply(atom("Pong"), value + 1);
    }
  );
  // non-normal exit reason forces ping_actor to quit
  quit(exit_reason::user_defined);
}

int main()
{
  spawn(pong, spawn(ping));
  await_all_others_done();
  return 0;
}
And this is the output of the example program:
ping received { Pong, 0 }
pong received { Ping, 1 }
ping received { Pong, 2 }
pong received { Ping, 3 }
ping received { Pong, 4 }
pong received { Ping, 5 }
ping received { Pong, 6 }
pong received { Ping, 7 }
ping received { Pong, 8 }
pong received { Ping, 9 }
ping received { Pong, 10 }

Monday, June 6, 2011

Atoms

I wrote about the problem with atoms a while ago. It's an important topic for an actor library because there are basically two ways to add semantics to messages (other than "this is an integer!"): always send "plain" tuples between actors which are tagged by atoms (or something similar) or create a tuple class per "message type". This is what the Scala-guys do with their case classes. And this is what I did in acedia. But the acedia implementation (or should I say emulation?) depended on massive use of the C preprocessor (which is almost ever a good indicator for a bad design).

Case classes are nice if they're part of your language and type system, but not if your host language is C++. So I'm happy to finally have a satisfying atom implementation for libcppa (although it still has a limitation).

Let me once again start with the (Erlang-) printer example:

Printer = spawn(...),
Printer ! { do_print, "hello world" },
...

This is the libcppa equivalent with the new atoms-implementation:

auto printer = spawn(printer_loop);
printer << make_tuple(atom("do_print"), "hello world");
The implementation of printer_loop might look like this:
void printer_loop()
{
    auto cb = on<atom("do_print"),std::string>() >> [](const std::string& str){
        cout << str << endl;
    };
    for (;;) receive(cb);
}
atom() is a consexpr function that maps the given string to an integer value. This value can be used as template parameter and only this value is used to compare two atoms (no string comparison at runtime!). This mapping is bijective (because the values must not collide), but it allows only a string length of 10 or less (which is the mentioned limitation). If you're interested in the "dirty details" checkout the sources and read cppa/detail/atom_val.hpp. :)

Of course, the atom() function will be replaced by a user-defined string literal as soon as GCC supports them.

Thursday, May 26, 2011

acedia and libcppa

If you're interested in actor programming and C++, you might have found acedia on sourceforge. And maybe you've noted, that I'm the author of both acedia (which is dead now) and libcppa. But why should one "throw away" such an amount of work?

First: acedia is an "intrusive library". If you're looking through the acedia examples, you'll note that you have to write a lot of gluecode (derive from acedia::Actor, override act(), use stuff like .unbox(), etc.). It's not convenient and you have to write too much code for simple tasks. Second: some parts of the library needed a redesign (e.g. serialization and MetaClass). Third: acedia had too much preprocessor magic and horrible code smells (basically because of the lack of variadic templates: send() had 10 overloads, Tuple had 10 template specializations, etc.).

So I decided to start from scratch again (and move to C++0x). Just compare the acedia pong example with this equivalent libcppa code:

using namespace cppa;
using namespace cppa::placeholders;

void pong(actor_ptr ping_actor)
{
  link(ping_actor);
  bool done = false;
  // kickoff
  send(ping_actor, 0);
  // receive loop
  do_receive
  (
    on<int>().when(_x1 > 9) >> [&]() {
      done = true;
    },
    on<int>() >> [](int v) {
      reply(v+1);
    }
  )
  .until(gref(done));
  // terminate with non-normal exit reason
  // to force linked ping actor to quit
  quit(user_defined_exit_reason);
}

void ping()
{
  receive_loop
  (
    on<int>() >> [](int v)
    {
      reply(v+1);
    }
  );
}

int main()
{
    spawn(pong, spawn(ping));
    await_all_others_done();
    return 0;
}

libcppa is designed as an internal DSL (btw: that's the reason why I switched from CamelCase to the STL coding conventions). That makes the code much clearer, less verbose and you don't have to hack around with stuff like MockActor. Everything (even main) is implicitly converted to an actor if needed without a single line of extra-code.

Sunday, April 24, 2011

Unrestricted Union

What is an "unrestricted" union? Well, unions in C++ always suffered from being an old C remains. Unions are nice as long as you put just pointers and integers into one single memory block. But the compiler (according to the C++03 ISO standard) rejects every type that provides a constructor and/or destructor. That's mainly because the "used type" of a union is only known at runtime. Thus, the compiler can't decide when to run a constructor and just rejects every non-trivial type.

Let's look at a short example:

struct foo {
    enum { is_cstr, is_str } flag;
    union {
        const char* cstr;
        std::string str;
    };
    foo() { /* oops? */ }
};

foo holds either a const char* or a std::string object. The actual "type" of foo is stored in flag. But how should the compiler know about your type flag? He just sees: "Ah, there is a member called str that needs construction / deconstruction; but it's in a union! Calling any constructor and/or deconstructor might lead to memory corruption, because the memory location of str is shared".

Because the compiler can't initialize the union (without possible memory corruptions), previous C++ standards completely prohibited unions with non-trivial types. This rule changed now. The compiler still can't initialize / uninitialize your union members. But you can!

Let's take a look at a complete example that shows how:

// placement new
#include <new>
// std::string
#include <string>
// assert(...)
#include <cassert>

class foo
{

    typedef std::string str_type;

    enum { is_cstr, is_str } flag;

    union {
        const char* cstr;
        str_type str;
    };

 public:

    foo(const char* content) : flag(is_cstr)
    {
        cstr = content;
    }

    foo(const std::string& content) : flag(is_str)
    {
        // call the copy constructor of str (placement new)
        new (&str) str_type(content);
    }

    foo(std::string&& content) : flag(is_str)
    {
        // call the move constructor of str (placement new)
        new (&str) str_type(std::move(content));
    }

    // note: operator= is implicitly deleted, because
    // the compiler can't handle the unrestricted union

    const char* c_str() const
    {
        return (flag == is_cstr) ? cstr : str.c_str();
    }

    bool is_std_string() const { return flag == is_str; }

    ~foo()
    {
        // call destructor of str if needed
        if (flag == is_str) str.~str_type();
    }

};

int main(int, char**)
{
    foo f1("Hello World");
    assert(f1.is_std_string() == false);
    foo f2(std::string("Hello World"));
    assert(f2.is_std_string() == true);
    return 0;
}

Note: You need GCC >= 4.6 to compile and run this example

foo is an immutable wrapper around either a const char* or a std::string. Instead of an enum flag you might use a boolean flag, because there are only two options (cstring or std::string).

Ok, this example is pretty easy and straightforward. But how to keep track of a larger union? One need to write a switch/case for almost everything, right? Assignment and move assignment, copy and move constructor, all of them need to evaluate the runtime type in a switch case, right? You might argue that the effort to implement an unrestricted union is exponential. Well, not necessarily. Metaprogramming is the right tool to let the compiler do the work. To illustrate this, I'll implement another class with an unrestricted union: any_string. This class should be able to hold every kind of string: std::string, std::u16string or std::u32string.

// forward declartions
class any_string;

template<typename T>
T any_string_cast(any_string&);

// the runtime type information of any_string
enum class any_string_type { nothing, u8, u16, u32 };

Before I start implementing any_string, we need some metaprogramming utility first:

// achieves static call dispatch (Int-To-Type idiom)
template<any_string_type Value>
struct scdt // = static call dispatch token
{
    static const any_string_type value = Value;
};

// if (IfStmt == true) type = T; else type = Else::type;
template<bool IfStmt, any_string_type T, class Else>
struct if_else_c
{
    static const any_string_type type = T;
};

template<any_string_type T, class Else>
struct if_else_c<false, T, Else>
{
    static const any_string_type type = Else::type;
};

// if (IfStmt::value == true) type = T; else type = Else::type;
template<class IfStmt, any_string_type T, class Else>
struct if_else : if_else_c<IfStmt::value, T, Else> { };

// wraps T (needed to pass T as an Else to if_else)
template<any_string_type T>
struct wrap { static const any_string_type type = T; };

You'll see later what's the static call dispatch token is good for. But with the if_else template, we're able to define a mapping between the native string types and any_string_type:

template<typename T>
struct scdt_map
      // if (T is convertible to std::string) type = u8
    : if_else<std::is_convertible<T, std::string>, any_string_type::u8,
      // else if (T is convertible to std::u16string) type = u16
      if_else<std::is_convertible<T, std::u16string>, any_string_type::u16,
      // else if (T is convertible to std::u32string) type = u32
      if_else<std::is_convertible<T, std::u32string>, any_string_type::u32,
      // else type = nothing
      wrap<any_string_type::nothing>>>>
{
};

Everything that's convertible to std::string is mapped to any_string_type::u8. Same is true for std::u16string => any_string_type::u16 and std::u32string => any_string_type::u32. If T is not convertible to any string type, then T is mapped to any_string_type::nothing.

Ok, I'll show the full definition of any_string first, before I'll explain the implementation details:

class any_string
{

    // needs access to get()
    template<typename T>
    friend T any_string_cast(any_string&);

    // safe some typing
    static const any_string_type u8_t = any_string_type::u8;
    static const any_string_type u16_t = any_string_type::u16;
    static const any_string_type u32_t = any_string_type::u32;
    static const any_string_type nothing_t = any_string_type::nothing;

    // actual type
    any_string_type flag;

    // unrestricted union
    union
    {
        std::string str8;
        std::u16string str16;
        std::u32string str32;
    };

    // access to members via type token (static call dispatch)
    inline std::string&    get(scdt<u8_t>)  { return str8;  }
    inline std::u16string& get(scdt<u16_t>) { return str16; }
    inline std::u32string& get(scdt<u32_t>) { return str32; }

    // const access to members
    inline const std::string&    get(scdt<u8_t>)  const { return str8;  }
    inline const std::u16string& get(scdt<u16_t>) const { return str16; }
    inline const std::u32string& get(scdt<u32_t>) const { return str32; }

    // invokes f(this, scdt<T>) with T = flag (the actual runtime type)
    template<typename Fun>
    void invoke(Fun&& f);

    // implements get(token).~string_type();
    struct destroyer
    {
        template<any_string_type Type>
        void operator()(any_string* self, scdt<Type> token) const;
    };

    // implements *this = other
    struct setter
    {
        const any_string& other;
        inline setter(const any_string& other_ref) : other(other_ref) { }
        template<any_string_type Type>
        inline void operator()(any_string* self, scdt<Type> token) const;
    };

    // implements *this == other
    struct comparator
    {
        const any_string& other;
        bool res;
        inline comparator(const any_string& oref) : other(oref), res(false) { }
        template<any_string_type Type>
        inline void operator()(any_string* self, scdt<Type> token);
    };

    // implements *this = std::move(other)
    struct mover
    {
        any_string& other;
        inline mover(any_string& other_ref) : other(other_ref) { }
        template<any_string_type Type>
        void operator()(any_string* self, scdt<Type> token) const;
    };

    // postcondition: flag == nothing_t
    void destroy();

    // postcondition: flag == Type
    template<any_string_type Type, class V>
    void set(V&& value);

 public:

    // note: all union members are left unitialized
    inline any_string() : flag(nothing_t) { }

    template<class Str>
    any_string(Str&& str);

    any_string(any_string&& other);

    any_string(const any_string& other);

    template<class Str>
    any_string& operator=(Str&& str);

    any_string& operator=(any_string&& other);

    any_string& operator=(const any_string& other);

    bool operator==(const any_string& other);

    template<class Str>
    bool operator==(const Str& str);

    template<class Str>
    inline bool operator!=(const Str& str) { return !(*this == str); }

    inline ~any_string() { destroy(); }

    inline any_string_type type() const { return flag; }

};

any_string supports all kinds of assignment, construction and comparison. But we don't need to write a switch/case for each of those member functions. The trick is done by invoke. This little helper function evaluates the runtime type of any_string and performs a static call dispatch according to the found type. In fact, invoke encapsulates the switch/case:

template<typename Fun>
void any_string::invoke(Fun&& f)
{
    switch (flag)
    {
     case u8_t:  f(this, scdt<u8_t>());  break;
     case u16_t: f(this, scdt<u16_t>()); break;
     case u32_t: f(this, scdt<u32_t>()); break;
     default: break;
    }
}

You might noted the nested classes in the any_string definition. Those are functors that implement operations on any_string. The first functor is called destroyer:

template<any_string_type Type>
void any_string::destroyer::operator()(any_string* self, scdt<Type> token) const
{
    typedef typename std::remove_reference<decltype(self->get(token))>::type
            str_type;
    self->get(token).~str_type();
}

invoke calls destroyer::operator() with a type token and the this pointer. get(token) returns a reference to the actual used string implementation from the union. To call the destructor, we also need correct type (str_type).

Now, the implementation of destroy() is trivial:

void any_string::destroy()
{
    invoke(destroyer());
    flag = nothing_t;    // reset flag
}

Since invoke() does nothing if flag == nothing_t, we don't have to fear any side effects if destroy() is called on an "empty" any_string.

The implementation of the other functors is also straightforward:

template<any_string_type Type>
void any_string::setter::operator()(any_string* self, scdt<Type> token) const
{
    self->set<Type>(other.get(token));
}

template<any_string_type Type>
void any_string::mover::operator()(any_string* self, scdt<Type> token) const
{
    self->set<Type>(std::move(other.get(token)));
}

template<any_string_type Type>
void any_string::comparator::operator()(any_string* self, scdt<Type> token)
{
    if (other.flag == Type)
    {
        res = self->get(token) == other.get(token);
    }
    // else: res = false (default)
}

The last "tricky" function is set:

template<any_string_type Type, class V>
void any_string::set(V&& value)
{
    scdt<Type> token;
    if (flag == Type)
    {
        // type matches, use operator=();
        // calls automatically either copy or move assignment operator
        // ("perfect forwarding")
        get(token) = std::forward<V>(value);
    }
    else
    {
        // destroy old value
        destroy();
        // get the "real" string type from Type
        typedef typename std::remove_reference<decltype(get(token))>::type
                str_type;
        // use placement new and call constructor "by hand";
        // calls automatically either copy or move constructor
        // ("perfect forwarding")
        new (&get(token)) str_type(std::forward<V>(value));
        flag = Type;
    }
}

Thanks to the "perfect forwarding rule" of C++, one single implementation covers all possible cases (const and non-cosnt lvalue reference, movable rvalue, etc.).

The same is true for the template constructor, operator= and operator== of any_string:

template<class Str>
any_string::any_string(Str&& str) : flag(nothing_t)
{
    static_assert(scdt_map<Str>::type != any_string_type::nothing,
                  "T is not a valid string literal");
    set<scdt_map<Str>::type>(std::forward<Str>(str));
}

template<class Str>
any_string& any_string::operator=(Str&& str)
{
    static_assert(scdt_map<Str>::type != any_string_type::nothing,
                  "T is not a valid string literal");
    set<scdt_map<Str>::type>(std::forward<Str>(str));
    return *this;
}

template<class Str>
bool any_string::operator==(const Str& str)
{
    static_assert(scdt_map<Str>::type != any_string_type::nothing,
                  "T is not a valid string literal");
    if (flag == scdt_map<Str>::type)
    {
        scdt<scdt_map<Str>::type> token;
        return get(token) == str;
    }
    else
    {
        return false;
    }
}

They cover any kind of string literal and lvalue/rvalue string references. The static_assertion also gives a human-readable error message if Str isn't a string at all.

The implementations of the remaining member functions are no-brainers:

any_string::any_string(any_string&& other) : flag(nothing_t)
{
    invoke(mover(other));
}

any_string::any_string(const any_string& other) : flag(nothing_t)
{
    invoke(setter(other));
}

any_string& any_string::operator=(any_string&& other)
{
    invoke(mover(other));
    return *this;
}

any_string& any_string::operator=(const any_string& other)
{
    invoke(setter(other));
    return *this;
}

bool any_string::operator==(const any_string& other)
{
    comparator cmp(other);
    invoke(cmp);
    return cmp.res;
}

The last missing part of our implementation is any_string_cast. It provides an explicit interface to access the "real" type of an any_string object. Again, before we start implementing any_string_cast, we need a few lines metaprogramming utility:

// map T to the corresponding any_string_type values
template<typename T>
struct any_string_cast_trait
        : if_else<std::is_same<T, std::string>, any_string_type::u8,
          if_else<std::is_same<T, std::u16string>, any_string_type::u16,
          if_else<std::is_same<T, std::u32string>, any_string_type::u32,
          wrap<any_string_type::nothing>>>> { };

// match references
template<typename T>
struct any_string_cast_trait<T&> : any_string_cast_trait<T> { };

// match const references
template<typename T>
struct any_string_cast_trait<const T&> : any_string_cast_trait<T> { };

Since any_string_cast is a friend of any_string it has access to any_string::get. The any_string_cast_trait utility class makes any_string_cast a no-brainer (if we wouldn't care about error handling, it would be a one-liner), because it only needs to pass any_string_cast_trait::type as a token to any_string::get():

template<typename T>
T any_string_cast(any_string& astr)
{
    typedef any_string_cast_trait<T> trait;
    static_assert(trait::type != any_string_type::nothing,
                  "T must be one of std::string, "
                  "std::u16string or std::u32string");
    // type checking
    if (astr.flag != trait::type) throw std::bad_cast();
    return astr.get(scdt<trait::type>());
}

That's it. Of course, the invoke trick doesn't reduce the runtime overhead of the union. But it encapsulates the switch/case statement and utilizes static call dispatching to minimize implementation overhead. Adding a new type to any_string (e.g. std::wstring or QString) would require only a few lines of code.

any_string in use:

any_string a1 = "hello world";
any_string a2 = "hello world";
assert(a1 == a2);
assert(a1 == "hello world");
assert(a1 == std::string("hello world"));
assert(a1 != u"hello world");
assert(a1 != U"hello world");
a2 = u"hello world";
assert(a1 != a2);
cout << any_string_cast<std::string&>(a1) << endl;
If you're interested in the full source code: download here (google sites link). Compile with g++ -std=c++0x any_string.cpp (GCC >= 4.6 is required).

Monday, April 11, 2011

Mailbox Part 1

Actors communicate (only) via message passing. This makes the mailbox implementation critical for the overall system performance.

The mailbox is a FIFO queue, owned by an actor. Any number of other actors (writers) enqueue new messages in parallel but only the owning actor (reader) is allowed to dequeue a message. Thus, a mailbox is a Single-Reader-Many-Writer queue.

The first (and most important) thing we need from a mailbox is speed / scalability. Today I'll represent the three most promising algorithms I've testet so far. There is a large number of concurrent linked list and queue algorithms out there, but a lot of them require exotic atomic operations not available on mainstream hardware (e.g. double-width compare-and-swap [DCAS] or load-link/store-conditional).

  1. Spinlock Queue

    This implementation is based on an article of Herb Sutter. He adapted the algorithm from a paper of M. Michael and M. Scott (second algorithm in the PDF). The original implementation uses two spinlocks (one guards the head to synchronize readers and the other one guards the tail to synchronize writers). Because we don't need to synchronize readers, the testet version uses only one spinlock.

  2. Lock-Free Queue

    This is a variation of the first algorithm. But instead of locking tail, we just move the tail atomically and set the next pointer of the predecessor afterwards on success.

    Setting tail first leads to the situation, that the reader might see an empty queue although there are new elements in the list. If the writer gets suspended right after the CAS operation, the reader also has to wait for the writer to continue (and further enqueue operations are hidden too).

    Again, there are smarter algorithms in theory. But none of them are available on today's hardware. That's mainly because you can't solve the ABA Problem without some kind of DCAS operation. And you have to solve the ABA Problem if you want to do something "clever" with A. This variation doesn't need to solve the ABA Problem, because we're only replacing A (the tail). And it doesn't bother us if we're really replacing the A that we saw or a 'new' A.

  3. Cached Stack

    A queue always needs at least two (critical) writes to enqueue a new element:
    void queue::push(T* new_element) {
      atomic { // the holy grail of concurrency
        tail->next = new_element;
        tail = new_element;
      }
    }
    A stack only needs one compare-and-swap operation for push. That's the motivation to the third algorithm. The concurrent part of the queue is a stack. That makes the push operation easy, safe and fast (ST = Stack Tail):



    The tricky part is the pop operation, because the stack is LIFO ordered. An O(N) pop (traverse always to the last stack element) is unacceptable. The solution is to introduce a singly linked list as a cache (CH = Cache Head):



    The average case of pop is now O(1) (as long as there are still elements cached), but the worst case is O(N). That makes the runtime of pop somewhat unpredictable, but push is extremely fast.


Performance

The algorithms were testet on a 2.66 GHz Intel i7 (DualCore). The x-axis shows the total number of messages in million (each producer is sending $TOTAL / $NUMBER_PRODUCERS messages).



The next benchmark will run on a real multicore machine (>=12 cores) to test the implementations with different levels of (hardware) concurrency.

I'm still looking for new (on intel machines available) algorithms, although cached stack looks pretty good in this first benchmark.

Friday, March 4, 2011

The Problem with Atoms

First: What is an "Atom" (and what is it good for)?
In Erlang, an Atom is a special kind of constant. A constant that could be used without defining it elsewhere and that's identified by the syntax (all lower-case or enclosed in single quotes):

Printer = spawn(...),
Printer ! { do_print, "hello world" },
...

The spawned actor now might match on the atom do_print:

receive
    { do_print, What } -> ...
end

Atoms are a way to "tag", resp. to structure tuples / messages. And this approach is far better than abuse some integer values ("enums").

My last post was about Pattern Matching in C++. We do match on the types first, so Atoms should have distinct types. Those types should be declared while they are used. That's what templates are for. But what's the template parameter(s)? const char* is obviously a bad choice. String literals could not be used as template parameters and the workaround (declare a string with external linkage) does not solve the problem of distinct types. But variadic templates (char...) do.

If we translate the example above, it might look like this:

template<char... Name>
class atom { ... };

...

auto printer = spawn(...);
printer.send(atom<"do_print">, "hello world");

...

receive(on<atom<"do_print">, std::string>() >> ... );

Well... That's not valid C++(0x). Unfortunately!
Although I'm not the only one who would love to see this to become valid C++. But it's not in the current working draft.

By the way, this is how it would be accepted by the compiler:

auto printer = spawn(...);
printer.send(atom<'d','o','_','p','r','i','n','t'>, "hello world");

Would you disagree if I say this is ridiculous? I guess not.
But what's the next best we can do? Maybe a user defined suffix:

template<char... Name>
atom<Name...> operator "" _atom();

auto printer = spawn(...);
printer.send(do_print_atom, "hello world");

...

receive(on<decltype(do_print_atom), std::string>() >> ... );

It's far better then giving a list of characters, but decltype(...) is annoying (and verbose). And I couldn't test it, because user-defined literals are not implemented by any compiler by now (GCC, clang, VisualStudio, Comeau, ...).

I'm afraid that I'm forced to skip atoms until user-defined literals are available (or my favorite solution becomes legal in a future draft version).

Pattern Matching

The following example illustrates how libcppa actually looks like:

using namespace cppa;

void slogger()
{
    receive(on<int>() >> [](int value) {
        reply((value * 20) + 2);
    });
}

int main(int, char**)
{
    auto sl = spawn(slogger);
    send(sl, 2);
    receive(on<int>() >> [](int value) {
        // prints 42
        cout << value << endl;
    });
    
    return 0;
}

Although this is a very short snippet, it shows two main goals of libcppa:

  1. The library should look like an (internal) DSL and in particular should not force the user to write any glue code (e.g. derive a class actor and override virtual functions, etc.).
  2. Any context (even main) is silently converted into an actor if needed.
Messages are matched against a pattern. In Erlang, those patterns may ask for types in the guard. In a strong typed language like C++ it makes obviously more sense to match the types first.

Pattern are defined as on<T0, T1, ..., TN>() >> X. Where T0...TN are the types and X is a Callable (function or functor) that should be invoked on a match.

One more snippet that plays a bit more with patterns:

#include <iostream>
#include "cppa/cppa.hpp"

using std::cout;
using std::endl;
using namespace cppa;

void foo()
{
    auto patterns =
    (
        on<int, anything, int>() >> [](int v1, int v2)
        {
            cout << "pattern1 { " << v1 << ", " << v2 << " }\n";
        },
        on<std::string>() >> [](const std::string& str)
        {
            cout << "pattern2 { \"" << str << "\" }\n";
        },
        on(std::string("1"), any_vals) >> []()
        {
            cout << "pattern3 { \"1\" }\n";
        },
        on(1, val<std::string>(), any_vals) >> [](const std::string& str)
        {
            cout << "pattern4 { \" 1, " << str << "\" }\n";
        }
    );
    // receive 4 messages
    for (int i = 0; i < 4; ++i) receive(patterns);
}

int main(int, char**)
{
    //auto f = spawn(foo);
    auto f = spawn(foo);
    // prints: pattern2 { "hello foo" }
    send(f, "hello foo");
    // prints: pattern3 { "1" }
    send(f, "1", 2, 3);
    // prints: pattern1 { 1, 3 }
    send(f, 1, "2", 3);
    //  prints: pattern4 { "2" }
    send(f, 1, "2", "3");
    await_all_others_done();
    return 0;
}

There are basically two ways to use on():

The first one is to match for types only. In this case, you don't pass any arguments to the function and specify your pattern via the template parameter list. See pattern 1 & 2 for an example.
anything is a wildcard expression matching any number - zero or more - of elements.

Passing other pointer or reference types will cause a compile-time error.

The second way is to match for values. See pattern 3 & 4 for an example. The global constexpr value any_vals has the equivalent meaning of anything in the template parameter list. If you want to match for a type only, use the val() template function at that position.

Why actors?

This blog is - mainly - about the development of libcppa (C++ actor library). But why should we use actors in C++ instead of a signal / slot implementation?
Why bother about "yet another way of decoupling"?

Well, the actor model is all about concurrency. Concurrency by design rather than concurrency by implementation. Mutexes, semaphores and thread primitives are the wrong level of abstraction for programming many-core systems. Think about machines with 20 or more cores (this is the future and there's no way to get the free lunch back!). Writing both thread-safe and scalable code using low-level primitives is challenging and error-prone. You should not think about how to make use of those cores. You should use a level of abstraction that allows your runtime system to make use of as many cores as possible. This is the point of libcppa: raise the level of abstraction in C++. Furthermore, the actor model unifies concurrency and distribution. Since actors communicate by network-transparent, asynchronous message passing, one can freely distribute actors among any number of machines.

Of course, the actor model is not the only high level abstraction for concurrency. There is libdispatch, (software) transactional memory and agents or many other approaches such as concurrent channels (e.g. in googles go) or intel blocks. However, the actor model is the only computation model that applies to both concurrency and distribution. Available real-world applications of the actor model (such as Erlang or Scala libraries) have shown that the actor model leads to scalable applications with a (usually) good software design consisting of small, easy-to-understand and easy-to-test software components.

The actor model is very easy to understand and to explain. Everyone knows what a message is and how a mailbox works. All you have to explain is the syntax to get a message from actor Alice to actor Bob and how Bob could read and compute messages from its mailbox. And there is no reason why C++ should not provide an implementation of the actor model to ease development of concurrent and distributed systems.