Sunday, April 24, 2011

Unrestricted Union

What is an "unrestricted" union? Well, unions in C++ always suffered from being an old C remains. Unions are nice as long as you put just pointers and integers into one single memory block. But the compiler (according to the C++03 ISO standard) rejects every type that provides a constructor and/or destructor. That's mainly because the "used type" of a union is only known at runtime. Thus, the compiler can't decide when to run a constructor and just rejects every non-trivial type.

Let's look at a short example:

struct foo {
    enum { is_cstr, is_str } flag;
    union {
        const char* cstr;
        std::string str;
    };
    foo() { /* oops? */ }
};

foo holds either a const char* or a std::string object. The actual "type" of foo is stored in flag. But how should the compiler know about your type flag? He just sees: "Ah, there is a member called str that needs construction / deconstruction; but it's in a union! Calling any constructor and/or deconstructor might lead to memory corruption, because the memory location of str is shared".

Because the compiler can't initialize the union (without possible memory corruptions), previous C++ standards completely prohibited unions with non-trivial types. This rule changed now. The compiler still can't initialize / uninitialize your union members. But you can!

Let's take a look at a complete example that shows how:

// placement new
#include <new>
// std::string
#include <string>
// assert(...)
#include <cassert>

class foo
{

    typedef std::string str_type;

    enum { is_cstr, is_str } flag;

    union {
        const char* cstr;
        str_type str;
    };

 public:

    foo(const char* content) : flag(is_cstr)
    {
        cstr = content;
    }

    foo(const std::string& content) : flag(is_str)
    {
        // call the copy constructor of str (placement new)
        new (&str) str_type(content);
    }

    foo(std::string&& content) : flag(is_str)
    {
        // call the move constructor of str (placement new)
        new (&str) str_type(std::move(content));
    }

    // note: operator= is implicitly deleted, because
    // the compiler can't handle the unrestricted union

    const char* c_str() const
    {
        return (flag == is_cstr) ? cstr : str.c_str();
    }

    bool is_std_string() const { return flag == is_str; }

    ~foo()
    {
        // call destructor of str if needed
        if (flag == is_str) str.~str_type();
    }

};

int main(int, char**)
{
    foo f1("Hello World");
    assert(f1.is_std_string() == false);
    foo f2(std::string("Hello World"));
    assert(f2.is_std_string() == true);
    return 0;
}

Note: You need GCC >= 4.6 to compile and run this example

foo is an immutable wrapper around either a const char* or a std::string. Instead of an enum flag you might use a boolean flag, because there are only two options (cstring or std::string).

Ok, this example is pretty easy and straightforward. But how to keep track of a larger union? One need to write a switch/case for almost everything, right? Assignment and move assignment, copy and move constructor, all of them need to evaluate the runtime type in a switch case, right? You might argue that the effort to implement an unrestricted union is exponential. Well, not necessarily. Metaprogramming is the right tool to let the compiler do the work. To illustrate this, I'll implement another class with an unrestricted union: any_string. This class should be able to hold every kind of string: std::string, std::u16string or std::u32string.

// forward declartions
class any_string;

template<typename T>
T any_string_cast(any_string&);

// the runtime type information of any_string
enum class any_string_type { nothing, u8, u16, u32 };

Before I start implementing any_string, we need some metaprogramming utility first:

// achieves static call dispatch (Int-To-Type idiom)
template<any_string_type Value>
struct scdt // = static call dispatch token
{
    static const any_string_type value = Value;
};

// if (IfStmt == true) type = T; else type = Else::type;
template<bool IfStmt, any_string_type T, class Else>
struct if_else_c
{
    static const any_string_type type = T;
};

template<any_string_type T, class Else>
struct if_else_c<false, T, Else>
{
    static const any_string_type type = Else::type;
};

// if (IfStmt::value == true) type = T; else type = Else::type;
template<class IfStmt, any_string_type T, class Else>
struct if_else : if_else_c<IfStmt::value, T, Else> { };

// wraps T (needed to pass T as an Else to if_else)
template<any_string_type T>
struct wrap { static const any_string_type type = T; };

You'll see later what's the static call dispatch token is good for. But with the if_else template, we're able to define a mapping between the native string types and any_string_type:

template<typename T>
struct scdt_map
      // if (T is convertible to std::string) type = u8
    : if_else<std::is_convertible<T, std::string>, any_string_type::u8,
      // else if (T is convertible to std::u16string) type = u16
      if_else<std::is_convertible<T, std::u16string>, any_string_type::u16,
      // else if (T is convertible to std::u32string) type = u32
      if_else<std::is_convertible<T, std::u32string>, any_string_type::u32,
      // else type = nothing
      wrap<any_string_type::nothing>>>>
{
};

Everything that's convertible to std::string is mapped to any_string_type::u8. Same is true for std::u16string => any_string_type::u16 and std::u32string => any_string_type::u32. If T is not convertible to any string type, then T is mapped to any_string_type::nothing.

Ok, I'll show the full definition of any_string first, before I'll explain the implementation details:

class any_string
{

    // needs access to get()
    template<typename T>
    friend T any_string_cast(any_string&);

    // safe some typing
    static const any_string_type u8_t = any_string_type::u8;
    static const any_string_type u16_t = any_string_type::u16;
    static const any_string_type u32_t = any_string_type::u32;
    static const any_string_type nothing_t = any_string_type::nothing;

    // actual type
    any_string_type flag;

    // unrestricted union
    union
    {
        std::string str8;
        std::u16string str16;
        std::u32string str32;
    };

    // access to members via type token (static call dispatch)
    inline std::string&    get(scdt<u8_t>)  { return str8;  }
    inline std::u16string& get(scdt<u16_t>) { return str16; }
    inline std::u32string& get(scdt<u32_t>) { return str32; }

    // const access to members
    inline const std::string&    get(scdt<u8_t>)  const { return str8;  }
    inline const std::u16string& get(scdt<u16_t>) const { return str16; }
    inline const std::u32string& get(scdt<u32_t>) const { return str32; }

    // invokes f(this, scdt<T>) with T = flag (the actual runtime type)
    template<typename Fun>
    void invoke(Fun&& f);

    // implements get(token).~string_type();
    struct destroyer
    {
        template<any_string_type Type>
        void operator()(any_string* self, scdt<Type> token) const;
    };

    // implements *this = other
    struct setter
    {
        const any_string& other;
        inline setter(const any_string& other_ref) : other(other_ref) { }
        template<any_string_type Type>
        inline void operator()(any_string* self, scdt<Type> token) const;
    };

    // implements *this == other
    struct comparator
    {
        const any_string& other;
        bool res;
        inline comparator(const any_string& oref) : other(oref), res(false) { }
        template<any_string_type Type>
        inline void operator()(any_string* self, scdt<Type> token);
    };

    // implements *this = std::move(other)
    struct mover
    {
        any_string& other;
        inline mover(any_string& other_ref) : other(other_ref) { }
        template<any_string_type Type>
        void operator()(any_string* self, scdt<Type> token) const;
    };

    // postcondition: flag == nothing_t
    void destroy();

    // postcondition: flag == Type
    template<any_string_type Type, class V>
    void set(V&& value);

 public:

    // note: all union members are left unitialized
    inline any_string() : flag(nothing_t) { }

    template<class Str>
    any_string(Str&& str);

    any_string(any_string&& other);

    any_string(const any_string& other);

    template<class Str>
    any_string& operator=(Str&& str);

    any_string& operator=(any_string&& other);

    any_string& operator=(const any_string& other);

    bool operator==(const any_string& other);

    template<class Str>
    bool operator==(const Str& str);

    template<class Str>
    inline bool operator!=(const Str& str) { return !(*this == str); }

    inline ~any_string() { destroy(); }

    inline any_string_type type() const { return flag; }

};

any_string supports all kinds of assignment, construction and comparison. But we don't need to write a switch/case for each of those member functions. The trick is done by invoke. This little helper function evaluates the runtime type of any_string and performs a static call dispatch according to the found type. In fact, invoke encapsulates the switch/case:

template<typename Fun>
void any_string::invoke(Fun&& f)
{
    switch (flag)
    {
     case u8_t:  f(this, scdt<u8_t>());  break;
     case u16_t: f(this, scdt<u16_t>()); break;
     case u32_t: f(this, scdt<u32_t>()); break;
     default: break;
    }
}

You might noted the nested classes in the any_string definition. Those are functors that implement operations on any_string. The first functor is called destroyer:

template<any_string_type Type>
void any_string::destroyer::operator()(any_string* self, scdt<Type> token) const
{
    typedef typename std::remove_reference<decltype(self->get(token))>::type
            str_type;
    self->get(token).~str_type();
}

invoke calls destroyer::operator() with a type token and the this pointer. get(token) returns a reference to the actual used string implementation from the union. To call the destructor, we also need correct type (str_type).

Now, the implementation of destroy() is trivial:

void any_string::destroy()
{
    invoke(destroyer());
    flag = nothing_t;    // reset flag
}

Since invoke() does nothing if flag == nothing_t, we don't have to fear any side effects if destroy() is called on an "empty" any_string.

The implementation of the other functors is also straightforward:

template<any_string_type Type>
void any_string::setter::operator()(any_string* self, scdt<Type> token) const
{
    self->set<Type>(other.get(token));
}

template<any_string_type Type>
void any_string::mover::operator()(any_string* self, scdt<Type> token) const
{
    self->set<Type>(std::move(other.get(token)));
}

template<any_string_type Type>
void any_string::comparator::operator()(any_string* self, scdt<Type> token)
{
    if (other.flag == Type)
    {
        res = self->get(token) == other.get(token);
    }
    // else: res = false (default)
}

The last "tricky" function is set:

template<any_string_type Type, class V>
void any_string::set(V&& value)
{
    scdt<Type> token;
    if (flag == Type)
    {
        // type matches, use operator=();
        // calls automatically either copy or move assignment operator
        // ("perfect forwarding")
        get(token) = std::forward<V>(value);
    }
    else
    {
        // destroy old value
        destroy();
        // get the "real" string type from Type
        typedef typename std::remove_reference<decltype(get(token))>::type
                str_type;
        // use placement new and call constructor "by hand";
        // calls automatically either copy or move constructor
        // ("perfect forwarding")
        new (&get(token)) str_type(std::forward<V>(value));
        flag = Type;
    }
}

Thanks to the "perfect forwarding rule" of C++, one single implementation covers all possible cases (const and non-cosnt lvalue reference, movable rvalue, etc.).

The same is true for the template constructor, operator= and operator== of any_string:

template<class Str>
any_string::any_string(Str&& str) : flag(nothing_t)
{
    static_assert(scdt_map<Str>::type != any_string_type::nothing,
                  "T is not a valid string literal");
    set<scdt_map<Str>::type>(std::forward<Str>(str));
}

template<class Str>
any_string& any_string::operator=(Str&& str)
{
    static_assert(scdt_map<Str>::type != any_string_type::nothing,
                  "T is not a valid string literal");
    set<scdt_map<Str>::type>(std::forward<Str>(str));
    return *this;
}

template<class Str>
bool any_string::operator==(const Str& str)
{
    static_assert(scdt_map<Str>::type != any_string_type::nothing,
                  "T is not a valid string literal");
    if (flag == scdt_map<Str>::type)
    {
        scdt<scdt_map<Str>::type> token;
        return get(token) == str;
    }
    else
    {
        return false;
    }
}

They cover any kind of string literal and lvalue/rvalue string references. The static_assertion also gives a human-readable error message if Str isn't a string at all.

The implementations of the remaining member functions are no-brainers:

any_string::any_string(any_string&& other) : flag(nothing_t)
{
    invoke(mover(other));
}

any_string::any_string(const any_string& other) : flag(nothing_t)
{
    invoke(setter(other));
}

any_string& any_string::operator=(any_string&& other)
{
    invoke(mover(other));
    return *this;
}

any_string& any_string::operator=(const any_string& other)
{
    invoke(setter(other));
    return *this;
}

bool any_string::operator==(const any_string& other)
{
    comparator cmp(other);
    invoke(cmp);
    return cmp.res;
}

The last missing part of our implementation is any_string_cast. It provides an explicit interface to access the "real" type of an any_string object. Again, before we start implementing any_string_cast, we need a few lines metaprogramming utility:

// map T to the corresponding any_string_type values
template<typename T>
struct any_string_cast_trait
        : if_else<std::is_same<T, std::string>, any_string_type::u8,
          if_else<std::is_same<T, std::u16string>, any_string_type::u16,
          if_else<std::is_same<T, std::u32string>, any_string_type::u32,
          wrap<any_string_type::nothing>>>> { };

// match references
template<typename T>
struct any_string_cast_trait<T&> : any_string_cast_trait<T> { };

// match const references
template<typename T>
struct any_string_cast_trait<const T&> : any_string_cast_trait<T> { };

Since any_string_cast is a friend of any_string it has access to any_string::get. The any_string_cast_trait utility class makes any_string_cast a no-brainer (if we wouldn't care about error handling, it would be a one-liner), because it only needs to pass any_string_cast_trait::type as a token to any_string::get():

template<typename T>
T any_string_cast(any_string& astr)
{
    typedef any_string_cast_trait<T> trait;
    static_assert(trait::type != any_string_type::nothing,
                  "T must be one of std::string, "
                  "std::u16string or std::u32string");
    // type checking
    if (astr.flag != trait::type) throw std::bad_cast();
    return astr.get(scdt<trait::type>());
}

That's it. Of course, the invoke trick doesn't reduce the runtime overhead of the union. But it encapsulates the switch/case statement and utilizes static call dispatching to minimize implementation overhead. Adding a new type to any_string (e.g. std::wstring or QString) would require only a few lines of code.

any_string in use:

any_string a1 = "hello world";
any_string a2 = "hello world";
assert(a1 == a2);
assert(a1 == "hello world");
assert(a1 == std::string("hello world"));
assert(a1 != u"hello world");
assert(a1 != U"hello world");
a2 = u"hello world";
assert(a1 != a2);
cout << any_string_cast<std::string&>(a1) << endl;
If you're interested in the full source code: download here (google sites link). Compile with g++ -std=c++0x any_string.cpp (GCC >= 4.6 is required).

Monday, April 11, 2011

Mailbox Part 1

Actors communicate (only) via message passing. This makes the mailbox implementation critical for the overall system performance.

The mailbox is a FIFO queue, owned by an actor. Any number of other actors (writers) enqueue new messages in parallel but only the owning actor (reader) is allowed to dequeue a message. Thus, a mailbox is a Single-Reader-Many-Writer queue.

The first (and most important) thing we need from a mailbox is speed / scalability. Today I'll represent the three most promising algorithms I've testet so far. There is a large number of concurrent linked list and queue algorithms out there, but a lot of them require exotic atomic operations not available on mainstream hardware (e.g. double-width compare-and-swap [DCAS] or load-link/store-conditional).

  1. Spinlock Queue

    This implementation is based on an article of Herb Sutter. He adapted the algorithm from a paper of M. Michael and M. Scott (second algorithm in the PDF). The original implementation uses two spinlocks (one guards the head to synchronize readers and the other one guards the tail to synchronize writers). Because we don't need to synchronize readers, the testet version uses only one spinlock.

  2. Lock-Free Queue

    This is a variation of the first algorithm. But instead of locking tail, we just move the tail atomically and set the next pointer of the predecessor afterwards on success.

    Setting tail first leads to the situation, that the reader might see an empty queue although there are new elements in the list. If the writer gets suspended right after the CAS operation, the reader also has to wait for the writer to continue (and further enqueue operations are hidden too).

    Again, there are smarter algorithms in theory. But none of them are available on today's hardware. That's mainly because you can't solve the ABA Problem without some kind of DCAS operation. And you have to solve the ABA Problem if you want to do something "clever" with A. This variation doesn't need to solve the ABA Problem, because we're only replacing A (the tail). And it doesn't bother us if we're really replacing the A that we saw or a 'new' A.

  3. Cached Stack

    A queue always needs at least two (critical) writes to enqueue a new element:
    void queue::push(T* new_element) {
      atomic { // the holy grail of concurrency
        tail->next = new_element;
        tail = new_element;
      }
    }
    A stack only needs one compare-and-swap operation for push. That's the motivation to the third algorithm. The concurrent part of the queue is a stack. That makes the push operation easy, safe and fast (ST = Stack Tail):



    The tricky part is the pop operation, because the stack is LIFO ordered. An O(N) pop (traverse always to the last stack element) is unacceptable. The solution is to introduce a singly linked list as a cache (CH = Cache Head):



    The average case of pop is now O(1) (as long as there are still elements cached), but the worst case is O(N). That makes the runtime of pop somewhat unpredictable, but push is extremely fast.


Performance

The algorithms were testet on a 2.66 GHz Intel i7 (DualCore). The x-axis shows the total number of messages in million (each producer is sending $TOTAL / $NUMBER_PRODUCERS messages).



The next benchmark will run on a real multicore machine (>=12 cores) to test the implementations with different levels of (hardware) concurrency.

I'm still looking for new (on intel machines available) algorithms, although cached stack looks pretty good in this first benchmark.