iterator - Everything2.com

by Xamot

Sat Nov 13 1999 at 14:43:42

Something used in programming. A variable to keep track of which iteration you are on when continously repeating a loop. The iterator is often used to know when to exit the loop, but not always.

An index to keep track of which element in an array, that is being processed, is an example. Another would be a pointer in to a data structure that gets moved on each pass through the loop to process the whole data structure or just a range of it.

I like it!

(idea)

by wharfinger

Mon Jan 28 2002 at 22:31:29

The word "iterator" means different things in different contexts and programming languages, but it's always got something to do with visiting each one of a set of objects, where "object" doesn't necessarily mean "class instance": Just take "object" to mean "some instance of some data type, somewhere".

In C++, you usually hear the term used with regard to the container classes belonging to the Standard Template Library (commonly called "the STL").

In STL terms, an "iterator" has "pointer semantics": It looks and smells a lot like a pointer. Sometimes it is a pointer, when that's appropriate, but other times it isn't (as for the ones that are classes, some implementations will inherit some or all of them from a common base class, and others won't; this template stuff is generic programming, not OOP, and inheritance is sort of moot). The purpose is to provide an abstraction of pointer behavior. The container you've got may be an associative array (std::map), a one-dimensional dynamic array (std::vector), or a linked list (std::list). Whatever it is, the library guarantees that you can get an iterator to the first element in the list by calling the container's begin() function, and it guarantees that you can call its end() function to get a "NULL" value: If you've got an iterator belonging to a given instance foo of an STL container, all invalid iterators belonging to foo will be equal to foo.end().

In theory, STL iterators do what you'd expect a pointer to do, but in practice they tend to have limits, which we'll discuss later. Their interface uses pointer operators: You dereference an iterator to get the thing it points to; you increment it with the ++ operator (pre or post), and so on.

The upshot of all this is that you can write generic algorithm code that neither knows nor cares what kind of container it operates on, because it interacts with the container through this "iterator" abstraction. You can write the following code to traverse a series of elements contained in any given STL container class (and when you stop to think about it, this will work just fine on old-fashioned int foo[], too):

//  If you don't understand C++ templates reasonably well, 
//  this may be somewhat perplexing.
template< class iter_T >
void dump( iter_T from, iter_T to )
{
    for ( /**/; from != to; ++from )
        cout << *from;  //  ...or whatever floats your boat, really.
}

And here's how you'd use it:

    std::vector< int >  foo;
    std::list< double > bar;

    //  ...put something in the containers...

    //  Spray it at the user:
    dump( foo.begin(), foo.end() );
    dump( bar.begin(), bar.end() );

The std::vector iterators are almost certainly pointers in your implementation. Other iterators aren't; for a linked list class template, you might have something like this:

    //  T is the contained type, element_T is the list-element type.
    //  This is right off the top of my head.
    template < class T, class element_T >
    class iterator {
    public:
        ...
        //  preincrement
        iterator & operator++() {
            //  No sanity checking. That's the user's problem.
            _el = _el->next;
            return *this;
        }

        T & operator*() {
            return _el->data;
        }
        ...

    private:
        element_T * _el;
    }

Some iterators are "bidirectional"; some aren't. If you've got a singly-linked list, you can't go home again, so to speak: It's convenient to walk the list in the forward direction, but walking a singly-linked list backwards would be monstrously inefficient. To get from the fortieth element to the thirty-ninth, you'd have to start over at the first element and walk all the way back up to number thirty-nine. That's loony. You can do it by hand if you like, but the STL won't help you do crazy things like that. You're supposed to be able to assume that any supported operation is reasonably efficient.

Some iterators allow random access, and some don't. Naturally, an iterator belonging to a linked list or a tree won't do that: Again, that operation is too inefficient to take seriously. On the other hand, std::vector's iterators do provide random access:

    //  Stupid, but legal.
    std::vector< int >    foo( 100 );

    int n = *(foo.begin() + 23);    //  Pointer syntax, remember?

That's because it's easy and cheap to figure an offset in an array, which is what std::vector really is. You can see we're not talking about real theoretically pure, industrial strength abstraction here. In C++, reality is always acknowledged. LISP aggressively disregards reality, but that's a different language.

Some other languages have things they call iterators, too. One that springs to mind is Ruby, an object oriented scripting language native to the romantic faraway island of Japan, which K&R C programmers still call "Cipangu".

Ruby's "iterators" don't look much like their namesakes in C++. They're a little spoonful of syntactic sugar involving closures. They look like this:

    #   Print 0 through 2
    #   times() is a member function of Ruby's number class (FixNum, 
    #   for smaller numbers). In Ruby, number and string literals are 
    #   class instances.
    3.times { |x| puts( x ); }

    #   step() is another one: Count from this up to A in steps of 
    #   size B. This will print 0, 3, 6, 9, and 12
    0.step( 12, 3 ) { |x| puts( x ); }

    #   IO is the stream abstraction class. Its static member function
    #   foreach() reads a file as lines of text, and yields up each 
    #   line in turn to the block of code that you give it.
    i = 0;
    IO.foreach( "inputfile" ) { |line|  #   weird way to pass args, huh?
        print( i, ": ", line );
        #   Infuriatingly, Ruby doesn't support ++ properly.
        i = i + 1;
    }

They're handy, but they're really just a convenient notation for calling a callback. You could write essentially the same code in any language that supports closures. Take for example the following implementation of the number class's times() iterator:

    #   Ruby
    def FixNum.times()
        i = 0;

        while ( i < self )
            #   yield is used in the sense of "render up", not 
            #   "defer to". A yield statement calls the block of 
            #   code that the user provides, and passes its 
            #   argument(s) to the block.
            yield i;
        end
    end

    //  The same in JavaScript (we're adding a member 
    //  function; JavaScript objects if in need of greater 
    //  clarity)
    Number.prototype.times = function( callback ) {
        for ( var i = 0; i < this; ++i )
            callback( i );
    }

    //  Use it like so (JS doesn't like calling member functions
    //  on integer literals, though it's happy to do the same on 
    //  a string):
    var foo = "urk: ";
    Number( 3 ).times( function( n ) { print( foo + n ); } );

So, okay, big whoop. There's nothing special or magical about Ruby iterators, except for the fact that the designer of the language loves the damn things to distraction. They're ubiquitous in the built-in classes, and in practice that makes the language very pleasant to work with. It's a bit like the ubiquity of the C idioms favored in K&R. It's not really inherently part of the language, it's just the way everybody who uses the language learns to use it. Good manners dictate that I keep my mouth shut about Larry Wall's influence on the Perl community.

I like it!

2 C!s

(idea)

by haggai

Fri Nov 15 2002 at 14:19:19

In (new versions of) Python, iterator syntax is really just a very simple protocol. It's surprisingly simple, but I've recently found out how flexible that makes it. New iterators replace the lame, patchy, previous implementation, which had you use the random access indexing method __getitem__ (properly intended for the aΓ] syntax) to dupe the for loop, without ever really supporting random access.

An iterable object (which should really be called a preiterator, IMHO) is one which supports the __iter__ method. All this does is return an iterator. This is important. In one typical scenario, the iterable object is some sort of container, and the iterator (probably created transiently for the purposes of a for loop) just points into it. In another (e.g. xrange) the preiterator stores the parameters of some simple enumeration (say the start, stop, and skip of an arithmetic sequence), and the iterator implements some algorithm to traverse the specified values. This is sometimes a great improvement on generating the whole list...

An iterator, now, is just an object with methods __iter__ and next. On __iter__, the object is supposed to return itself. Although strange, this sometimes makes sense. This makes an unstepped iterator somewhat like an iterable object. More on this below. Besides, anything uniterable is un__iter__able (not to be confused with inutterable). The next method does all the work (such as it is): it returns the next value (so, really, it returns a value and steps). On a list, say, this could work by returning the value pointed to by some internal pointer, and then incrementing it. If next is called past the terminating element, the iterator raises a StopIteration exception.

You can use for to loop over any iterator, as it just does __iter__, and then nexts the iterator until it catches a StopIteration. So it's just
for x in MyIterableObject: whatever(x). Similarly, if you want to handle iterators yourself (you'd normally just hold on to preiterators, but maybe you want to fiddle with your own for-like loop, or write stuff to perform operations like concating and multiplying (pre)iterators) you just try nexting your iterators whenever they're needed, and when they raise StopIteration, you move on (for example, to the next iterator, if you're concatenating).

Want to restart an iterator? Just take the __iter__ from the preiterator which gave you it...

If you've somehow lost the original preiterator (probaby because you're a nitwit and didn't heed advice to iter your iterator only at the very moment you need to start iterating) you can still substitute your unstepped iterator, if you need to pass something to a function which starts by taking the __iter__ result...

One clever way to create iterators is using python's new generators. To my mind, these are really only useful if you're using an iteration from some external (e.g. C++) function or class wrapped in python, or if you wrote some complicated iteration loop in a function before iterators came along, and you want to start using an iterator.

I like it!

1 C!

C++: no compilers available	for loop	Why C++ rules	Cipangu
C++: brackets to make an object not a function	Storing a doubly-linked list using just a single pointer field	list comprehension	Closure
One equals point nine repeating	Iter	Generator	lambda
Python	C++ input iterator	bounds checking	Pay attention, this will be important later
Babysitting a gigantic five year old	generic programming	refactoring	method
Pointy sub	deque	Learning to program	"i" is a meaningful variable name