WebCrawler

The time is November 1993. NCSA Mosaic, the first graphical web browser to reach a wide audience, is released to an unsuspecting world. The World Wide Web is still very much in its infancy, and the only real guide to what's out there is the five-month-old NCSA What's New archive.

Brian Pinkerton, a CSE student at the University of Washington, was one of the early fans. In January 1994 he began developing a search engine for the WWW in his spare time. He called it WebCrawler, and while he was developing it, it existed only as a desktop application. In April he released it to the world as an Internet-based search engine, and by the end of the year it had served more than one million queries from its home on the University's public Web site. Not bad, considering that NCSA Mosaic hadn't yet had time to beta test version 2.0.

WebCrawler was the first spidering search tool for the Web. Unlike the manually organized indexes at NCSA What's New and Yahoo!, it employed several software robots that "crawled" the Web using a breadth-first search algorithm, following hyperlinks from one server to another. Its distinctive strength was that, when following the hyperlinks on a given page, it gave precedence to pages on other servers, so that at least one page was indexed on as many servers as possible.
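To make the server-precedence idea concrete, here is a minimal sketch in Python. It is not Pinkerton's code; it assumes a hypothetical in-memory link graph (LINKS) standing in for the live Web, and it implements the precedence rule with two FIFO queues: links to servers that have no indexed page yet are always visited before ordinary breadth-first links. The names crawl, host and LINKS are illustrative only.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph standing in for the live Web:
# each URL maps to the URLs its page links to.
LINKS = {
    "http://a.example/": ["http://a.example/1", "http://b.example/", "http://a.example/2"],
    "http://a.example/1": ["http://c.example/"],
    "http://a.example/2": [],
    "http://b.example/": ["http://b.example/x", "http://c.example/"],
    "http://b.example/x": [],
    "http://c.example/": ["http://a.example/"],
}


def host(url: str) -> str:
    """Return the server part of a URL, e.g. 'a.example'."""
    return urlparse(url).netloc


def crawl(seed: str, links: dict[str, list[str]], limit: int = 100) -> list[str]:
    """Breadth-first crawl that visits pages on not-yet-indexed servers first."""
    new_hosts: deque[str] = deque([seed])  # links to servers with nothing indexed yet
    same_hosts: deque[str] = deque()       # all other links, in plain BFS order
    queued = {seed}
    indexed_hosts: set[str] = set()
    order: list[str] = []

    while (new_hosts or same_hosts) and len(order) < limit:
        if new_hosts:
            url = new_hosts.popleft()
            if host(url) in indexed_hosts:
                # Another page on this server was indexed while this one waited,
                # so it loses its priority and rejoins the ordinary BFS queue.
                same_hosts.append(url)
                continue
        else:
            url = same_hosts.popleft()

        order.append(url)  # "index" the page
        indexed_hosts.add(host(url))

        for link in links.get(url, []):
            if link in queued:
                continue
            queued.add(link)
            if host(link) in indexed_hosts:
                same_hosts.append(link)
            else:
                new_hosts.append(link)
    return order


print(crawl("http://a.example/", LINKS))
# ['http://a.example/', 'http://b.example/', 'http://c.example/',
#  'http://a.example/1', 'http://a.example/2', 'http://b.example/x']
```

A plain breadth-first crawl of the same graph would visit http://a.example/1 before ever reaching c.example; with the two queues, all three servers are represented within the first three pages fetched, which is the effect the paragraph above describes.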

Pages were indexed in a database as they were found. When WebCrawler received a query, it would pull up the matches in its database, sort them by relevance (based on where the query words appeared in the HTML and how often they occurred), and then send the robots out to follow the links on those results. Any pages linked from the initial query results were assumed to be relevant and were immediately ranked and returned with the query. This continued until either no new pages were found or a time limit was reached. The result was a search engine with extraordinary breadth that was, for reasonably specific queries, far more effective than a directory.
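The query-time process described above can be sketched as follows. This is a rough illustration under stated assumptions, not WebCrawler's actual ranking: the weights TITLE_WEIGHT and BODY_WEIGHT are hypothetical stand-ins for "where the query words were in the HTML", the dictionaries INDEX and WEB are hypothetical stand-ins for the database and the live Web, and the level-by-level link expansion is an assumed reading of how the robots were dispatched.

```python
import re
import time

# Hypothetical weights: a query word in the page title counts three times
# as much as one in the body. The real weights are not given in the source.
TITLE_WEIGHT = 3
BODY_WEIGHT = 1


def words(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(page: dict, terms: list[str]) -> int:
    """Relevance from where the query words occur and how often."""
    title, body = words(page["title"]), words(page["body"])
    return sum(TITLE_WEIGHT * title.count(t) + BODY_WEIGHT * body.count(t) for t in terms)


def search(query: str, index: list[str], web: dict[str, dict],
           time_limit: float = 1.0) -> list[tuple[str, int]]:
    terms = words(query)
    deadline = time.monotonic() + time_limit

    # 1. Pull the matching pages out of the existing index and rank them.
    results = {url: score(web[url], terms) for url in index if score(web[url], terms) > 0}

    # 2. Send "robots" along the links on the results. Linked pages are
    #    assumed relevant, so they are ranked and kept even with score 0.
    #    Stop when no new pages turn up or the time limit is reached.
    frontier = list(results)
    while frontier and time.monotonic() < deadline:
        next_frontier = []
        for url in frontier:
            for link in web[url]["links"]:
                if link in results or link not in web:  # already seen, or dead link
                    continue
                results[link] = score(web[link], terms)  # "fetch" and rank the page
                next_frontier.append(link)
        frontier = next_frontier

    # Highest score first; ties broken by URL so the order is stable.
    return sorted(results.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical miniature Web; only the first two pages are already indexed.
WEB = {
    "http://x.example/se": {"title": "Web search", "body": "search engines crawl the web",
                            "links": ["http://y.example/spiders"]},
    "http://x.example/food": {"title": "Recipes", "body": "bread and soup", "links": []},
    "http://y.example/spiders": {"title": "Spiders", "body": "robots follow links",
                                 "links": ["http://z.example/"]},
    "http://z.example/": {"title": "Home", "body": "welcome", "links": ["http://x.example/se"]},
}
INDEX = ["http://x.example/se", "http://x.example/food"]

print(search("web search", INDEX, WEB))
# [('http://x.example/se', 8), ('http://y.example/spiders', 0), ('http://z.example/', 0)]
```

The two pages reached only by following links score zero on the words themselves, yet they are still returned, mirroring the assumption that anything linked from a good match is relevant. That assumption is what gave the approach its breadth, and also why its answers depended on how much crawling fit inside the time limit.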

In December 1994, WebCrawler took on two sponsors to help keep it up and running. In June 1995 it was purchased by AOL, which at the time had not yet made the World Wide Web part of its offerings.

WebCrawler's prestige began to fade in December 1995. That month AltaVista went live, offering public access to a full index of the Web, some sixteen million pages. WebCrawler's technique of crawling for new results at query time may have produced more current results, but AltaVista was indexing constantly, and WebCrawler couldn't match the speed of its database. (AltaVista remained unchallenged as the king of search engines until the arrival of Google in late 1998.)

In 1997 AOL sold WebCrawler to Excite, and in 2001, when Excite went bankrupt, it was sold again to Infospace. Infospace keeps WebCrawler online at http://www.webcrawler.com for those who still like to use it.

Primary factual sources:

Brian Pinkerton's home page
http://www.thinkpink.com/bp/bio.html

"Search Engine Players and Brief History"
http://www.searchengineworld.com/engine/players.htm

