Let's look at a short example:
struct foo { enum { is_cstr, is_str } flag; union { const char* cstr; std::string str; }; foo() { /* oops? */ } };
foo holds either a const char* or a std::string object. The actual "type" of foo is stored in flag. But how should the compiler know about your type flag? He just sees: "Ah, there is a member called str that needs construction / deconstruction; but it's in a union! Calling any constructor and/or deconstructor might lead to memory corruption, because the memory location of str is shared".
Because the compiler can't initialize the union (without possible memory corruptions), previous C++ standards completely prohibited unions with non-trivial types. This rule changed now. The compiler still can't initialize / uninitialize your union members. But you can!
Let's take a look at a complete example that shows how:
// placement new #include <new> // std::string #include <string> // assert(...) #include <cassert> class foo { typedef std::string str_type; enum { is_cstr, is_str } flag; union { const char* cstr; str_type str; }; public: foo(const char* content) : flag(is_cstr) { cstr = content; } foo(const std::string& content) : flag(is_str) { // call the copy constructor of str (placement new) new (&str) str_type(content); } foo(std::string&& content) : flag(is_str) { // call the move constructor of str (placement new) new (&str) str_type(std::move(content)); } // note: operator= is implicitly deleted, because // the compiler can't handle the unrestricted union const char* c_str() const { return (flag == is_cstr) ? cstr : str.c_str(); } bool is_std_string() const { return flag == is_str; } ~foo() { // call destructor of str if needed if (flag == is_str) str.~str_type(); } }; int main(int, char**) { foo f1("Hello World"); assert(f1.is_std_string() == false); foo f2(std::string("Hello World")); assert(f2.is_std_string() == true); return 0; }
Note: You need GCC >= 4.6 to compile and run this example
foo is an immutable wrapper around either a const char* or a std::string. Instead of an enum flag you might use a boolean flag, because there are only two options (cstring or std::string).
Ok, this example is pretty easy and straightforward. But how to keep track of a larger union? One need to write a switch/case for almost everything, right? Assignment and move assignment, copy and move constructor, all of them need to evaluate the runtime type in a switch case, right? You might argue that the effort to implement an unrestricted union is exponential. Well, not necessarily. Metaprogramming is the right tool to let the compiler do the work. To illustrate this, I'll implement another class with an unrestricted union: any_string. This class should be able to hold every kind of string: std::string, std::u16string or std::u32string.
// forward declartions class any_string; template<typename T> T any_string_cast(any_string&); // the runtime type information of any_string enum class any_string_type { nothing, u8, u16, u32 };
Before I start implementing any_string, we need some metaprogramming utility first:
// achieves static call dispatch (Int-To-Type idiom) template<any_string_type Value> struct scdt // = static call dispatch token { static const any_string_type value = Value; }; // if (IfStmt == true) type = T; else type = Else::type; template<bool IfStmt, any_string_type T, class Else> struct if_else_c { static const any_string_type type = T; }; template<any_string_type T, class Else> struct if_else_c<false, T, Else> { static const any_string_type type = Else::type; }; // if (IfStmt::value == true) type = T; else type = Else::type; template<class IfStmt, any_string_type T, class Else> struct if_else : if_else_c<IfStmt::value, T, Else> { }; // wraps T (needed to pass T as an Else to if_else) template<any_string_type T> struct wrap { static const any_string_type type = T; };
You'll see later what's the static call dispatch token is good for. But with the if_else template, we're able to define a mapping between the native string types and any_string_type:
template<typename T> struct scdt_map // if (T is convertible to std::string) type = u8 : if_else<std::is_convertible<T, std::string>, any_string_type::u8, // else if (T is convertible to std::u16string) type = u16 if_else<std::is_convertible<T, std::u16string>, any_string_type::u16, // else if (T is convertible to std::u32string) type = u32 if_else<std::is_convertible<T, std::u32string>, any_string_type::u32, // else type = nothing wrap<any_string_type::nothing>>>> { };
Everything that's convertible to std::string is mapped to any_string_type::u8. Same is true for std::u16string => any_string_type::u16 and std::u32string => any_string_type::u32. If T is not convertible to any string type, then T is mapped to any_string_type::nothing.
Ok, I'll show the full definition of any_string first, before I'll explain the implementation details:
class any_string { // needs access to get() template<typename T> friend T any_string_cast(any_string&); // safe some typing static const any_string_type u8_t = any_string_type::u8; static const any_string_type u16_t = any_string_type::u16; static const any_string_type u32_t = any_string_type::u32; static const any_string_type nothing_t = any_string_type::nothing; // actual type any_string_type flag; // unrestricted union union { std::string str8; std::u16string str16; std::u32string str32; }; // access to members via type token (static call dispatch) inline std::string& get(scdt<u8_t>) { return str8; } inline std::u16string& get(scdt<u16_t>) { return str16; } inline std::u32string& get(scdt<u32_t>) { return str32; } // const access to members inline const std::string& get(scdt<u8_t>) const { return str8; } inline const std::u16string& get(scdt<u16_t>) const { return str16; } inline const std::u32string& get(scdt<u32_t>) const { return str32; } // invokes f(this, scdt<T>) with T = flag (the actual runtime type) template<typename Fun> void invoke(Fun&& f); // implements get(token).~string_type(); struct destroyer { template<any_string_type Type> void operator()(any_string* self, scdt<Type> token) const; }; // implements *this = other struct setter { const any_string& other; inline setter(const any_string& other_ref) : other(other_ref) { } template<any_string_type Type> inline void operator()(any_string* self, scdt<Type> token) const; }; // implements *this == other struct comparator { const any_string& other; bool res; inline comparator(const any_string& oref) : other(oref), res(false) { } template<any_string_type Type> inline void operator()(any_string* self, scdt<Type> token); }; // implements *this = std::move(other) struct mover { any_string& other; inline mover(any_string& other_ref) : other(other_ref) { } template<any_string_type Type> void operator()(any_string* self, scdt<Type> token) const; }; // postcondition: flag == nothing_t void destroy(); // postcondition: flag == Type template<any_string_type Type, class V> void set(V&& value); public: // note: all union members are left unitialized inline any_string() : flag(nothing_t) { } template<class Str> any_string(Str&& str); any_string(any_string&& other); any_string(const any_string& other); template<class Str> any_string& operator=(Str&& str); any_string& operator=(any_string&& other); any_string& operator=(const any_string& other); bool operator==(const any_string& other); template<class Str> bool operator==(const Str& str); template<class Str> inline bool operator!=(const Str& str) { return !(*this == str); } inline ~any_string() { destroy(); } inline any_string_type type() const { return flag; } };
any_string supports all kinds of assignment, construction and comparison. But we don't need to write a switch/case for each of those member functions. The trick is done by invoke. This little helper function evaluates the runtime type of any_string and performs a static call dispatch according to the found type. In fact, invoke encapsulates the switch/case:
template<typename Fun> void any_string::invoke(Fun&& f) { switch (flag) { case u8_t: f(this, scdt<u8_t>()); break; case u16_t: f(this, scdt<u16_t>()); break; case u32_t: f(this, scdt<u32_t>()); break; default: break; } }
You might noted the nested classes in the any_string definition. Those are functors that implement operations on any_string. The first functor is called destroyer:
template<any_string_type Type> void any_string::destroyer::operator()(any_string* self, scdt<Type> token) const { typedef typename std::remove_reference<decltype(self->get(token))>::type str_type; self->get(token).~str_type(); }
invoke calls destroyer::operator() with a type token and the this pointer. get(token) returns a reference to the actual used string implementation from the union. To call the destructor, we also need correct type (str_type).
Now, the implementation of destroy() is trivial:
void any_string::destroy() { invoke(destroyer()); flag = nothing_t; // reset flag }
Since invoke() does nothing if flag == nothing_t, we don't have to fear any side effects if destroy() is called on an "empty" any_string.
The implementation of the other functors is also straightforward:
template<any_string_type Type> void any_string::setter::operator()(any_string* self, scdt<Type> token) const { self->set<Type>(other.get(token)); } template<any_string_type Type> void any_string::mover::operator()(any_string* self, scdt<Type> token) const { self->set<Type>(std::move(other.get(token))); } template<any_string_type Type> void any_string::comparator::operator()(any_string* self, scdt<Type> token) { if (other.flag == Type) { res = self->get(token) == other.get(token); } // else: res = false (default) }
The last "tricky" function is set:
template<any_string_type Type, class V> void any_string::set(V&& value) { scdt<Type> token; if (flag == Type) { // type matches, use operator=(); // calls automatically either copy or move assignment operator // ("perfect forwarding") get(token) = std::forward<V>(value); } else { // destroy old value destroy(); // get the "real" string type from Type typedef typename std::remove_reference<decltype(get(token))>::type str_type; // use placement new and call constructor "by hand"; // calls automatically either copy or move constructor // ("perfect forwarding") new (&get(token)) str_type(std::forward<V>(value)); flag = Type; } }
Thanks to the "perfect forwarding rule" of C++, one single implementation covers all possible cases (const and non-cosnt lvalue reference, movable rvalue, etc.).
The same is true for the template constructor, operator= and operator== of any_string:
template<class Str> any_string::any_string(Str&& str) : flag(nothing_t) { static_assert(scdt_map<Str>::type != any_string_type::nothing, "T is not a valid string literal"); set<scdt_map<Str>::type>(std::forward<Str>(str)); } template<class Str> any_string& any_string::operator=(Str&& str) { static_assert(scdt_map<Str>::type != any_string_type::nothing, "T is not a valid string literal"); set<scdt_map<Str>::type>(std::forward<Str>(str)); return *this; } template<class Str> bool any_string::operator==(const Str& str) { static_assert(scdt_map<Str>::type != any_string_type::nothing, "T is not a valid string literal"); if (flag == scdt_map<Str>::type) { scdt<scdt_map<Str>::type> token; return get(token) == str; } else { return false; } }
They cover any kind of string literal and lvalue/rvalue string references. The static_assertion also gives a human-readable error message if Str isn't a string at all.
The implementations of the remaining member functions are no-brainers:
any_string::any_string(any_string&& other) : flag(nothing_t) { invoke(mover(other)); } any_string::any_string(const any_string& other) : flag(nothing_t) { invoke(setter(other)); } any_string& any_string::operator=(any_string&& other) { invoke(mover(other)); return *this; } any_string& any_string::operator=(const any_string& other) { invoke(setter(other)); return *this; } bool any_string::operator==(const any_string& other) { comparator cmp(other); invoke(cmp); return cmp.res; }
The last missing part of our implementation is any_string_cast. It provides an explicit interface to access the "real" type of an any_string object. Again, before we start implementing any_string_cast, we need a few lines metaprogramming utility:
// map T to the corresponding any_string_type values template<typename T> struct any_string_cast_trait : if_else<std::is_same<T, std::string>, any_string_type::u8, if_else<std::is_same<T, std::u16string>, any_string_type::u16, if_else<std::is_same<T, std::u32string>, any_string_type::u32, wrap<any_string_type::nothing>>>> { }; // match references template<typename T> struct any_string_cast_trait<T&> : any_string_cast_trait<T> { }; // match const references template<typename T> struct any_string_cast_trait<const T&> : any_string_cast_trait<T> { };
Since any_string_cast is a friend of any_string it has access to any_string::get. The any_string_cast_trait utility class makes any_string_cast a no-brainer (if we wouldn't care about error handling, it would be a one-liner), because it only needs to pass any_string_cast_trait
template<typename T> T any_string_cast(any_string& astr) { typedef any_string_cast_trait<T> trait; static_assert(trait::type != any_string_type::nothing, "T must be one of std::string, " "std::u16string or std::u32string"); // type checking if (astr.flag != trait::type) throw std::bad_cast(); return astr.get(scdt<trait::type>()); }
That's it. Of course, the invoke trick doesn't reduce the runtime overhead of the union. But it encapsulates the switch/case statement and utilizes static call dispatching to minimize implementation overhead. Adding a new type to any_string (e.g. std::wstring or QString) would require only a few lines of code.
any_string in use:
any_string a1 = "hello world"; any_string a2 = "hello world"; assert(a1 == a2); assert(a1 == "hello world"); assert(a1 == std::string("hello world")); assert(a1 != u"hello world"); assert(a1 != U"hello world"); a2 = u"hello world"; assert(a1 != a2); cout << any_string_cast<std::string&>(a1) << endl;If you're interested in the full source code: download here (google sites link). Compile with g++ -std=c++0x any_string.cpp (GCC >= 4.6 is required).