Note: See the updates at the bottom of this article.
First, a little teaser (in the D programming language):
// testInputVisitor.d // Requires DMD compiler v2.059 or up import std.algorithm; import std.range; import std.stdio; import libInputVisitor; struct Foo { string[] data = ["hello", "world"]; void visit(InputVisitor!(Foo, string) v) { v.yield("a"); v.yield("b"); foreach(str; data) v.yield(str); } void visit(InputVisitor!(Foo, int) v) { v.yield(1); v.yield(2); v.yield(3); } } void main() { Foo foo; // Prints: a, b, hello, world foreach(item; foo.inputVisitor!string) writeln(item); // Prints: 1, 2, 3 foreach(item; foo.inputVisitor!int) writeln(item); // It's a range! Prints: 10, 30 auto myRange = foo.inputVisitor!int; foreach(item; myRange.filter!( x => x!=2 )().map!( x => x*10 )()) writeln(item); }
Rewinding back to the start, here's our original problem: We want to iterate over a collection.
Ever since the D1 days, D has had foreach's parter, opApply, as the standard way to do this:
// testOpApply.d import std.stdio; struct Foo { string[] data = ["hello", "world"]; int opApply(int delegate(string) dg) { int result=0; result = dg("a"); if(result) return result; result = dg("b"); if(result) return result; foreach(str; data) { result = dg(str); if(result) break; } return result; } } void main() { Foo foo; // Prints: a, b, hello, world foreach(item; foo) writeln(item); }
The logic within opApply is mostly natural and straightforward, but it is complicated a bit by dealing with that result variable.
Later on, D2 came along and introduced ranges - D's clean, powerful answer to C++/STL's iterators. The concept of ranges and their motivations and benefits are well explained in Andrei Alexandrescu's classic article On Iteration (and further information is in this article), so I won't go into that.
Here's an input range version of the above examples:
// testInputRange.d import std.stdio; struct Foo { string[] data = ["hello", "world"]; @property auto inputRange() { struct FooRange { private Foo foo; private size_t index=0; @property bool empty() { return index >= foo.data.length + 2; } @property string front() { if(index == 0) return "a"; if(index == 1) return "b"; return foo.data[index-2]; } void popFront() { index++; } } return FooRange(this); } } void main() { Foo foo; // Prints: a, b, hello, world foreach(item; foo.inputRange) writeln(item); }
Unfortunately, that's more complex than the old opApply style, but it provides some very good benefits, particularly when used together with Phobos's std.algorithm and std.range.
But if all you need is basic foreach-style input range capabilities, that can be a lot of boilerplate. Plus, it doesn't lend itself to complex internal logic quite as naturally as opApply does. And opApply itself could stand to be easier to write.
If you've looked carefully, you may have noticed that opApply is essentially a poor man's coroutine. That's why the flow of logic is more natural in opApply than in the range version. For those unfamiliar with coroutines, see this old-but-excellent set of slides and the accompanying video. Here's the upshot: Coroutines are like a combination of traditional iterators and the visitor pattern - combining the best of both worlds.
D offers coroutines via Fiber. Looking at the docs, one of the two main ways to use Fiber is by subclassing it. Hmm...a class...easy coroutine iteration...What if we designed a Fiber subclass that was also a range?
As it turns out, fiber coroutines and input ranges make a fantastic combination that combines the power and flexibility of an input range with opApply's natural flow of logic, but without the boilerplate of opApply's messy result variable. You've already seen in the first example how easy it is to use this InputVisitor, as I call it. The source is very short, and available in the file libInputVisitor.d.
There is one small enhancement we can make to the original example, though. Compared to the opApply version, the foreach when using InputVisitor is a little more verbose:
foreach(item; foo.inputVisitor!string)
We can trim that down by adding one member to the Foo struct:
... struct Foo { ... @property auto visitor() { return this.inputVisitor!string; } ... } void main() { Foo foo; // Prints: a, b, hello, world foreach(item; foo.visitor) writeln(item); ... }
Another member can also be added to handle the int version.
The downside of this is that it doesn't lend itself very well - if even at all - to any ranges more powerful than the basic input range. Execution of a function doesn't flow in reverse, so bidirectional ranges are out. And certainly no random access. Forward range might be possible in theory, but it would depend on being able to clone a halted fiber - I don't know whether that's possible.
Beware, I haven't done any benchmarks on this, so I have no idea how much of an overhead this InputVisitor might have over the opApply or input range versions. Could be a lot, could be minimal - I don't know.
UPDATE (2012-05-01): Changed libInputVisitor license to WTFPL. For something this trivial, even the super-permissive zlib/libpng license is overkill.
UPDATE (2012-05-10): This D newsgroup thread revealed that the InputVisitor approach does in fact have a non-trivial performance overhead. Oh well :(
But Adam Ruppe pointed out a nice trick for improving the syntax of the opApply version. His approach, used like mixin(yield("someVar"));, isn't quite as nice looking as v.yield(someVar); and doesn't result in an actual range (you can only foreach over it). But when you just need a basic generator, I think it provides the best balance between syntax and speed.