Chewing on iterators

September 28, 2008 21:53

One thing that is really giving me goose bumps about C#3 is its swag of auto generated code. Don’t get me wrong, the readability we get as a result is pretty much awesome. But auto generated code is an abstraction, and every abstraction leaks, as we were told[1].

Iterators are a good example of a complicated code generation. Without getting into gory details, a method with a yield operator gives birth to an auto generated class that is actually a state machine with logic of MoveNext and Current. Not only does it propel a more concise code, but also makes a certain impression upon your colleagues and family members.

Iterators have been introduced to help you with complicated data structures. Say, binary trees, where yield return works wonders:

private IEnumerable<T> Traverse(Node<T> node)
{
  if (node != null) {
    //yield return node.Item; /*uncomment to get pre-order traversal*/
    foreach (T item in Traverse(node.Left)) { yield return item; }
    //yield return node.Item; /*uncomment to get in-order traversal*/
    foreach (T item in Traverse(node.Right)) { yield return item; }
    //yield return node.Item; /*uncomment to get post-order traversal*/
  }
}

Though it’s worth mentioning that if the tree is big enough, recursion might become a bottleneck. We spend O(h) time to traverse a node that is h levels deep, so with n nodes in the tree the overall complexity is O(n*h). And well, as troubles never come singly, so recursion brings up another problem: garbage collection.

So, here’s my Hans-Christian-Andersen-inspired fairy tale. Handsome and brief «yield return» turns to an ugly-named auto generated class, whose instances live on the heap. Every time someone calls on Traverse from the example above, a tiny innocent 12-bytes-weight tithe of an iterator gets created just to be killed by a wicked stepmother GC. But, dear kids, there’s always next time: with recursive approach the iterator is considered to be alive and can’t be reclaimed until the recursion gets back.

For the grown-ups out there: as we are interested in reclaiming the memory as soon as possible, is there any way for doing that with iterators? Short answer is – yes, go for value types[2].

 

If you look at List<T> and Dictionary<K,V> you’d notice that their iterators are implemented as public structs. It could sound a bit strange – since they implement IEnumerator so will be boxed when get casted to an interface, right?  Well, not really. It’s correct if some code does the casting, but if you use those struct iterators in a foreach loop, the C# compiler wouldn’t cast. Instead, it uses so called duck typing: any type that implements Current and MoveNext will match.
 
I’m not going to rant whether value type iterators are better or not. Putting less pressure on GC is always a good thing, but typically apps waste their time in db calls and cross-tier interactions rather than in tight loops. To my regret, tight loops and traversals of binary trees is something not so common in software development. So you’d probably better off optimizing your app on a different level.

But there is another …um, feature with value type and reference type iterators. Most people, like myself, writing vernacular C# with iterators in foreach loops only, wouldn’t probably even think about such a scenario. It has a simple explanation and the code below can be easily fixed (boxing, huh?), but still worth mentioning: if used second time, a «yield return based» enumerator will start from the last reached point but a «generic list based» one will start from the very beginning:

static IEnumerable<int> MyYield
{
    get
    {
        yield return 1;yield return 2;yield return 3;yield return 4;yield return 5;
    }
}
 
static List<int> MyList
{
    get { return new List<int> { 1, 2, 3, 4, 5 }; }
}
 
public static void Main(string[] args)
{
    Console.WriteLine("Double iterating with a list enumerator");
    var e1 = MyList.GetEnumerator();
    DoSomething(e1);
    DoSomething(e1);
 
    Console.WriteLine("Double iterating with a yield enumerator");
    var e2 = MyYield.GetEnumerator();
    DoSomething(e2);
    DoSomething(e2);
}
 
static void DoSomething(IEnumerator<int> e)
{
    int i = 0;
    while (e.MoveNext())
    {
        Console.WriteLine(e.Current);
        //just stop at some point, don't go through all items
        if (++i > 1)
            break;
    }
}

 


Footnotes:
[1] – The Law of Leaky Abstractions (first time I mistyped «Law» as «Jaw» that probably doesn’t spoil the title).
[2] – In fact, value type iterators wouldn’t help much with recursion. They decrease the pressure on GC but bring up another issue: stack overflow. Local value types live on the stack, so StackOverflowException is something you can face earlier with struct iterators. Good news is that every recursive algorithm can be rewritten as a non-recursive one (nobody said it will be easy though!). Instead of the «implicit» call stack you’d need to use an explicit stack with push and pop operations. But as a general rule, try to avoid recursion especially if you cannot say how deep it could go.


 

 


Comments are closed