Simon Ask Consulting

C++ can have lazy lambda enumerators too!

Expressive languages such as Ruby and Python have made it painful for many of us to type out a full for-loop in C and C++. With lambda expressions in C++0x and a slightly horrible amount of template code, we no longer have to!

The Goal

Note! C++0x is very new and not even finished yet, so you need to make sure to use a compiler that doesn't care about that, such as GCC 4.5 or above.

In Ruby, you can loop over the elements in the array foo, and print them out on the screen, like this:

foo.each { |x| puts(x) }

This calls a method each on the object foo with a code block (i.e. lambda function) as the only argument. This is how Ruby works.

Wouldn't it be neat if we could do the same in C++? Here's our goal:

Array<int> foo;

// ... add some elements to foo

foo.each([](int x) { printf("%d\n", x); });

Furthermore, chains of transformations to an array or list can be done like this:

foo.select([](int x) { return x > 2; })
   .map([](int x) { return x + 100; })
   .each([](int x) { printf("%d\n", x); });

If you are familiar with Ruby, the above should be immediately readable to you. If not, it's simple: First we select the elements of foo for which x > 2 is true, then we add 100 to each of them, and print all the results. It doesn't have to get much more complicated than this before a simple for-loop becomes a horrible mess to maintain.

With C++0x and lambda expressions, we can do this! I will explain how, but first let's introduce the concept of lazy enumerators.

In Programming, Lazy means … Something, can't be bothered to type it out.

An enumerator is an object representing one step of an iteration, much like an iterator, but unlike iterators, enumerators have knowledge of when to stop. This makes them easier to use in some people's minds, which is why they are very popular among C# and Java folks.

An enumerator for an Array-like data type (e.g. std::vector, but really any container type that stores its elements in one contiguous chunk of memory) might look like this:

template <typename T>
class ArrayEnumerator {
public:
    ArrayEnumerator(T* current, T* end) : _current(current-1), _end(end) {}
    // ... other simple constructors
    bool move_next() { ++_current; return !at_end(); }
    bool at_end() const { return _current >= _end; }
    T& current() { return *_current; }
private:
    T* _current;
    T* _end;
};

Note that the initial state of the enumerator is before the first element. This is different from iterators, and comes from the way we normally use enumerators, as you can see below.

With this class, we can iterate over the elements of an array like this (C++0x syntax):

auto e = foo.get_enumerator();
while (e.move_next()) {
    auto x = e.current();
    // do something with x
}

Note that I use the auto keyword of C++0x to avoid having to type out the full templated typenames.

More Than Iteration

We can use the concept above to create enumerators that do more than simply walk through a chunk of memory. What if we created an enumerator that took another enumerator as its input, and perhaps a functor-like object (such as a lambda) to do something with the input?

Here's an example of an enumerator that can be used with lambda expressions in C++0x to loop through only the elements of a collection that match the criteria given by the functor:

template <typename SourceEnumerator, typename Functor>
class SelectEnumerator {
public:
    typedef typename SourceEnumerator::ValueType ValueType;

    SelectEnumerator(const SourceEnumerator& e, Functor func) :
        _enumerator(e), _func(func) {}
    // ... other simple constructors

    bool move_next() {
        // call move_next on source until it stops,
        // but skip all elements for which _func(x) is false.
        do {
            if (!_enumerator.move_next())
                return false;
        } while (!_func(current()));
        return true;
    }

    ValueType current() { return _enumerator.current(); }
    bool at_end() const { return _enumerator.at_end(); }
private:
    Functor _func;
    SourceEnumerator _enumerator;
};

The above could be used somewhat like this:

auto e = SelectEnumerator(foo.get_enumerator(),
     [](T x) { return x > something; });
while (e.move_next()) {
    // do something with e.current().
}

Note! The above would not compile, because C++ disallows template type inference in constructors. Blame Bjarne. The workaround is to create a function returning the type-inferred type, as we will do in the following.

Being Lazy

The beauty of the SelectEnumerator above is that no call to the functor is being made before it is actually needed. Everything is calculated in-place, and there is no need to allocate space for a new array containing the whole set of the result. This is what it means to be "lazy".

Being lazy also makes it possible to do neat things that are normally found on the list of cool stuff you can do with functional languages, such as creating an infinite list of integers, and then applying a SelectEnumerator to recognize the prime numbers among them, thus giving you an infinite list of primes (or rather, a prime generator that looks like a list. Now I made it sound lame, but it totally isn't).

More advanced enumerators can also be lazy, such as the MapEnumerator, which takes each element and applies a tranformation defined by a functor, on-demand and with zero extra space requirements.

MapEnumerator is slightly more complicated, since it can potentially change the type of the elements, depending on the functor.

template <typename SourceEnumerator,
    typename TargetType,
    typename Functor>
class MapEnumerator {
public:
    // ... simple constructors here

    TargetType current() const { return _func(_enumerator.current()); }

    // move_next(), at_end() simply call their equivalents on _enumerator
private:
    Functor _func;
    SourceEnumerator _enumerator;
};

Mixin' it all in

If you, like me, are not a big fan of the STL, you can mix these methods into your own collection classes. I did this for my own FTL, to provide convenient methods for Array, List, Map, Set, etc.

But writing a set of wrapper functions for each type of collection class is tedious. And this is where we can make use of the horror that is the C++ Template System, and a bit of hackery, to make a base class for all of our collection classes, which implements methods like each, select, reject, inject, count, any, all, and more. Conveniently, we can also use this as a base class for all the enumerator classes too, enabling those sexy chains of transformations promised in the first paragraph.

But this is where it gets slightly hairy. The following is a condensed version of the base class Enumerable.

template <typename TargetClass,
    typename Enumerator = TargetClass>
class Enumerable {
public:
    SelectEnumerator<...> select(Functor func) const;
    RejectEnumerator<...> reject(Functor func) const;
    MapEnumerator<...> map(Functor func) const;
    void each(Functor func) const;

    template <typename TargetType>
    TargetType inject(TargetType base, Functor func) const;

    // ... and many more
private:

    // Dirty hack to access the subclass's base enumerator.
    // This could also be done as a virtual method, but that
    // would add a vtable to all container classes, which would
    // be bad. The downside is that it probably breaks multiple
    // inheritance, but what doesn't these days.

    Enumerator get_enumerator() const {
        return static_cast<const TargetClass*>(this)->get_enumerator();
    }
};

The idea here is that each method (select, reject, etc.) takes a Functor (which is a template type, omitted for brevity) and returns the correct enumerator object. Each new enumerator uses the enumerator of TargetType as its input.

The Full Code

I strongly recommend you look at the fully working implementation, since much was left out for brevity here.