Wednesday, January 18, 2012

Opt for option!

In a perfect world, each function could safely assume that all of its arguments are valid. But this is the real world and things go wrong all the time, especially when a function has to parse user-supplied input. A good example of such a function is X to_int(string const& str). What is the correct type of X?

Well, some might argue "int of course and the function should throw an exception on error!". I would not recommend it. Throwing an exception is really the very last thing you should do. An exception is a gigantic hammer that smashes your stack to smithereens. If your user didn't read the documentation and calls your function without a try block, the exception will kill the whole program. Furthermore, exceptions are slow. All major compilers are optimized to have little overhead for entering a try block, but throwing an exception, unwinding the stack and catching the exception are expensive. You should not throw an exception unless there is nothing else you can do.
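
To make the objection concrete, here is a minimal sketch of the throwing variant. The names to_int_or_throw and parse_or_zero are made up for illustration; std::stoi is the standard C++11 function and really does throw std::invalid_argument or std::out_of_range on bad input.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical throwing variant: returns the parsed value or throws.
int to_int_or_throw(std::string const& str) {
    std::size_t pos = 0;
    int result = std::stoi(str, &pos);  // throws on non-numeric or out-of-range input
    if (pos != str.size()) {
        throw std::invalid_argument("trailing characters after number");
    }
    return result;
}

// Every caller has to remember the try block; without it, a typo in the
// input terminates the program.
int parse_or_zero(std::string const& str) {
    try {
        return to_int_or_throw(str);
    } catch (std::exception const&) {
        return 0;
    }
}

void example_usage() {
    assert(to_int_or_throw("42") == 42);
    assert(parse_or_zero("try again") == 0);
}
```

The wrapper is exactly the kind of boilerplate each call site ends up carrying if it wants to survive invalid input.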

Use a bool pointer. Some libraries, e.g., the Qt library, use a bool pointer as function argument. Honestly, I don't like this approach. It forces you to declare additional variables and always returns an object, even when the function has nothing meaningful to return. It's ok for integers, but what if your function returns a vector or string? Creating empty objects is a waste of time.
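
A rough sketch of what the bool-pointer style looks like for to_int. The signature below is an assumption made for this example and is not copied from Qt; parsing is done with std::strtol.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical bool-pointer variant: the status goes out through 'ok',
// the value comes back as the return value.
int to_int(std::string const& str, bool* ok) {
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(str.c_str(), &end, 10);
    bool success = end != str.c_str() && *end == '\0'   // digits found, no trailing junk
                   && errno != ERANGE
                   && value >= INT_MIN && value <= INT_MAX;
    if (ok) *ok = success;
    return success ? static_cast<int>(value) : 0;  // a value is returned even on failure
}

void example_usage() {
    bool ok = false;                  // the extra variable the caller has to declare
    int x = to_int("42", &ok);
    assert(ok && x == 42);
    int y = to_int("try again", &ok);
    assert(!ok && y == 0);            // y is a dummy value that must be ignored
}
```

For an int the dummy return value is cheap; for a vector or string the same pattern would construct an empty object on every failure.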

Return a pair. Some STL functions return a pair with a boolean and the result. The boolean indicates whether the function was successful. Again, creating empty objects is a waste of time. Thus, this approach isn't very efficient. But that's not the real issue here:

auto x = to_int("try again");
if (x.first) do_something(x.second);
Is this code correct? The answer is "I don't know". Remember, you can assign a bool to an int and you can use integers in if-statements. You have to read the documentation of to_int to see if x.first is the bool or the int.
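
The ambiguity can be sketched with two hypothetical conventions for the same function. Both functions below are stand-ins invented for this example; they pretend the string "0" was parsed successfully.

```cpp
#include <cassert>
#include <utility>

// Convention A (hypothetical): the flag comes first.
std::pair<bool, int> to_int_flag_first() { return {true, 0}; }

// Convention B (hypothetical): the value comes first.
std::pair<int, bool> to_int_value_first() { return {0, true}; }

int calls = 0;
void do_something(int) { ++calls; }

void example_usage() {
    auto a = to_int_flag_first();
    if (a.first) do_something(a.second);  // correct: first is the success flag

    auto b = to_int_value_first();
    if (b.first) do_something(b.second);  // compiles too, but tests the value 0
                                          // and would pass a bool as the int
    assert(calls == 1);                   // only the first call went through
}
```

The calling code is character-for-character identical in both cases and the compiler accepts both, so only the documentation can tell you which one is right.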

Return a pointer. Safety issues aside, you should use the stack for doing work and the heap for dynamically growing containers. Your stack is your friend. It is fast, automatically destroys variables as they go out of scope, and did I mention fast? Allocating small objects on the heap has a significant performance impact.
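
For comparison, a pointer-returning to_int might look like the sketch below. Using std::unique_ptr here is an assumption made to keep the example leak-free; the point is the heap allocation for every successfully parsed int.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical pointer-returning variant: nullptr signals a parse error.
std::unique_ptr<int> to_int(std::string const& str) {
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(str.c_str(), &end, 10);
    bool success = end != str.c_str() && *end == '\0'
                   && errno != ERANGE
                   && value >= INT_MIN && value <= INT_MAX;
    if (!success) return nullptr;
    // One heap allocation just to hand back a single int.
    return std::unique_ptr<int>(new int(static_cast<int>(value)));
}

void example_usage() {
    auto x = to_int("42");
    assert(x && *x == 42);
    assert(!to_int("try again"));
}
```

The calling code already reads like the option version further down, but each success pays for an allocation and a deallocation.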

Return an option. If you're familiar with Haskell or Scala, you'll know Maybe or Option. In short, a function doesn't return a value. It returns maybe a value. If your string actually is an integer, the function returns an integer. Otherwise, it returns nothing. libcppa does have an option class. I guess you'll know what this code is supposed to do:
auto x = to_int("try again");
if (x) do_something(*x);
So, what is the type of x here? It is "option<int>". You can write if (x.valid()) do_something(x.get()); instead if you prefer a more verbose style. Option supports default values: do_something(x.get_or_else(0));. If you're a user of the boost library, maybe you'll know boost::optional. In general, I would recommend boost to everyone, but boost::optional is slow. Just have a look at the implementation. cppa::option uses a union to store the value. If the option is empty, the object in the union won't get constructed. You'll have a slight overhead of returning "empty memory" but you don't pay for creating empty objects.
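
The union idea can be sketched in a few lines. This is not the libcppa source, only a simplified illustration: the member names valid, get and get_or_else are taken from the text above, while everything else (the m_ prefixed members, the deleted assignment, the strtol-based to_int) is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <new>
#include <string>
#include <utility>

template <typename T>
class option {
public:
    option() : m_valid(false) {}  // empty: m_value is never constructed
    option(T value) : m_valid(true) { new (&m_value) T(std::move(value)); }
    option(option const& other) : m_valid(other.m_valid) {
        if (m_valid) new (&m_value) T(other.m_value);
    }
    option& operator=(option const&) = delete;  // left out to keep the sketch short
    ~option() { if (m_valid) m_value.~T(); }    // destroy only what was constructed

    bool valid() const { return m_valid; }
    explicit operator bool() const { return m_valid; }
    T& get() { return m_value; }
    T& operator*() { return m_value; }
    T get_or_else(T fallback) const {
        if (m_valid) return m_value;
        return fallback;
    }

private:
    bool m_valid;
    union { T m_value; };  // unrestricted union (C++11): storage without construction
};

// Hypothetical to_int built on the sketch above.
option<int> to_int(std::string const& str) {
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(str.c_str(), &end, 10);
    bool success = end != str.c_str() && *end == '\0'
                   && errno != ERANGE
                   && value >= INT_MIN && value <= INT_MAX;
    if (!success) return option<int>();
    return option<int>(static_cast<int>(value));
}

void example_usage() {
    auto x = to_int("42");
    if (x) assert(*x == 42);
    assert(to_int("try again").get_or_else(0) == 0);
}
```

Because the value lives in a union member, the empty case only sets the flag; no T is default-constructed and none is destroyed.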

These are some performance results for cppa::option compared to returning an int, returning a pair, and returning a boost::optional.
return type             100,000,000 values   100,000,000 empties
int                     0.488s               -
std::pair<bool, int>    2.418s               2.304s
cppa::option<int>       2.776s               1.598s
boost::optional<int>    7.419s               2.987s
The boost implementation clearly falls short. This is because the boost implementation doesn't use the stack. That's not because the boost developers don't know how to write efficient code. It's because unrestricted unions are a C++11 feature. cppa::option has a slight overhead compared to a pair for returning values but is even faster than a pair for returning empties, because the memory is uninitialized in this case. If you're already a user of libcppa: use option. If you're not (yet) a user: copy & paste the source code and use it. :)
Of course, it's C++11 only. Honestly, I really don't know why this isn't part of the STL. It should be! It's general, fast, safe and really improves the readability of your source code.
