MSN Search Beta
or
The Microsoft Unavailability Server0
Today, Microsoft announced the Beta for its new MSN Search service. This time it's their own crawler, their own search engine, running on their own sheet metal. No more pass-thru to Google, no more outsourcing to Yahoo!. They brag of 4.5 billion pages in their index, plus a Google-like cache, natural language query capability, a new "search near me" technology, and free access to some of the subscriber-only MSN Encarta content.
Google has already boosted its usual 4+ billion page index to a staggering 8 billion pages, according to the Wall Street Journal.
The address for the Beta is http://beta.search.msn.com/, but I wouldn't bother going there just now.
You see, the search form comes up fine (and is as uncluttered as Google's), but no matter what you search for, as of the time of this posting, you are likely to get:
"This site is temporarily unavailable, please check back soon."
Heh. Anybody who touched the crappy and hopeless MS Site Server, as I did, can't help but feel a bit vindicated.1
Here is part of the letter I wrote to the Journal; I really, really hope they'll print it:
At first blush, this is understandable and even forgivable, given the timing of Microsoft's announcement, and the traffic generated by articles like yours. After all it's only a Beta, right?
Wrong.
A critical factor in the popularity of Google and other engines, besides the nifty features, is that they are always there. Beta or not, Microsoft has to show it can deliver the same smooth operation and 24/7 reliability. My guess is, they have plenty of bandwidth and computing power deployed for this Beta.
I suspect a glitch in operations, or in the algorithms someplace, both areas where Google excels.2 Exhibit 1: Not only was the service unavailable, each time it took over 15 seconds to tell me so. That's an awful lot of wasted computing power on their part, just to tell me to come back later! It may be a while before MSN Search again experiences enough massive, real-world traffic surges to know if they've fixed this problem. Meanwhile, they've lost the chance to make at least a million first impressions.
It's not a good omen. If the MSN Search Beta can't handle the strain of its own publicity, how will they convince their customers that the production system will have enough robustness to handle the next Shakira hit, not to mention the next major, international breaking news event?
Microsoft spokesmonkeys haven't yet acknowledged or responded to this problem, as far as I know. Undoubtedly, they will thaw and clone the ole "marathon, not a sprint" meme. After all, Microsoft's videogames and hand-held-phone software are still both unprofitable. Heck, I remember back in about 1995 or so when MSN was announced as the eventual replacement for the Internet. Apparently, MSN finally made its first full-year profit in the year ended June 30, 2004.
That's one year of profit out of eight, then back to using red ink with the new MSN Search Beta.
So, Google, Ask, and Y! will likely stay at the top of the heap for a while. But lately I've been unable to stop playing with the intriguing Vivísimo clustering technology at http://vivisimo.com/ . For finding that hot open source theme, plugin, or JavaScript in the middle of a development frenzy, the Vivísimo Open Source Cluster, http://vivisimo.com/projects/opensource, is now my first stop.
For hard-core lookup, there's the under-used, delightfully Lo-Tek Langenberg meta-search launch point at http://www.langenberg.com/. It's got just about all the phone books, dictionaries, translators, thesauri, shipping companies, and mapping sites, all on one page, plus hidden gems like the Xerox and Fuzzums language identifier/guessers, and the precious WestEgg cliche finder!
And, don't forget the once great, former patron of yours truly, http://altavista.com/, the first major engine that indexed all common words, so that you could search for "To Be Or Not To Be".3
I imagine some at MSN Search are asking that right now...
Sources, Footnotes, and an Endnote:
"Microsoft, Late to Search Party, Seeks to Capture Google's Turf" by ROBERT A. GUTH and KEVIN J. DELANEY, The Wall Street Journal, Nov. 11, 2004, Page One.
"Microsoft Web Searcher Isn't as Good as Google, But in Time It Could Be" by WALTER S. MOSSBERG, The Wall Street Journal, Nov. 11, 2004, Page B1.
0. OK, I'm kidding a bit with the sub-title, but since when has it been bad form on E2 to have fun at Microsoft's expense? Sheesh. Yes, I know, Betas are supposed to fail; but I can't help noticing that historically, most of Microsoft's betas are called "Version 3". Suddenly, just because they're trying to make an original play in the search engine space, it's an exciting new venture that should be forgiven its parent company's past history, and a rough Beta? As if! They're a Google copy cat, and if they want to play in this space, it's hardball time. Bring It On. Hours after posting this writeup, we're long past the West Coast peak, and only just starting to see the Europe peak; yet the site is still mostly "service unavailable" after a long, expensive timeout...
Sure, I might have a cookie remaining that directs me to the same server over at MSN Search. I could clear said cookie, but I refuse to, on the principle that using cookies for load-balancing is a total kludge, even in a Beta. If they do that, they deserve far more grief than I have time to give them.
1. Sure, by the time posterity reads this, the Beta will be fixed, and they won't get the "unavailable" message. But I felt it necessary to document this for posterity. You see, when the world's largest software company launches a major, public Beta, with publicity, in a very competitive arena, search engines, it's not just a Beta. Or put another way, it's a Beta that ought to have higher expectations than what you'd expect from a skilled hobbyist, a basement entrepreneur, or an unknown startup company.
You expect the obvious things, like search results, to work, and to scale!
Particularly galling is the long delay before the "unavailable" message. Any world class data center operation worthy of the name would have a way to predict when search results servers are becoming overloaded, and proactively redirect traffic to a "service unavailable" message; the last thing you want to do is tie up resources on already stressed servers processing a query that is pretty much guaranteed to time-out anyhow. Anyone can design a search engine that is "in theory" competitive with, or even better than, Google's. I expect several outfits have already done so. So what? MSN Search needs to show the world, and investors, the sort of raw engineering know-how to make it all work In Real Life.
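To make the "fail fast" point concrete, here is a minimal sketch of the kind of load shedding I mean. I have no idea what MSN Search actually runs; the LoadShedder class, the max_in_flight limit, and the search_fn callback are all names I made up for illustration. The idea is simply that once a server already has as many queries in flight as it can handle, a new query gets the "unavailable" answer immediately instead of being queued up to time out 15 seconds later.

```python
import threading

UNAVAILABLE = "This site is temporarily unavailable, please check back soon."


class LoadShedder:
    """Hypothetical front-end guard: reject new queries once the back end is full."""

    def __init__(self, max_in_flight):
        # max_in_flight is an assumed tuning knob, not anything MSN has published.
        self.max_in_flight = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def _try_admit(self):
        # Admit the query only if there is spare capacity right now.
        with self._lock:
            if self._in_flight >= self.max_in_flight:
                return False
            self._in_flight += 1
            return True

    def _release(self):
        with self._lock:
            self._in_flight -= 1

    def handle(self, query, search_fn):
        # Fail fast: an overloaded server answers 503 without touching the index.
        if not self._try_admit():
            return 503, UNAVAILABLE
        try:
            return 200, search_fn(query)
        finally:
            # Free the slot whether the search succeeded or raised.
            self._release()
```

A real data center would presumably make this decision at the load balancer, using measured latency or queue depth rather than a fixed counter, but the principle is the same: the rejection should cost almost nothing.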
In my opinion and experience (which goes well beyond Site Server, see my full disclosure below), Microsoft server products, other than basic file sharing or basic web page serving, have a track record of failing to scale well on the level of large enterprise services or server farms. Does anyone use, or even remember, their load balancing offering? I thought not. When I worked for large scale sites, we used to joke that MS server products scale just fine, as long as you have plenty of sheet metal. (By this we meant CPUs; the joke made more sense before rackmounted pizza box Wintel systems were widely available; or maybe the joke still works, since AFAIK, *nix is still today the most popular OS for pizza boxes!)
Implied was that you had to have an architecture that minimizes the need for access to a single data repository, which IRL is nearly impossible. For dynamically created web pages, the DB (or more accurately, the data repository) is always the main gating factor, for both queries and database updates. In my experience, Microsoft middleware components and SQL Servers, etc. produced nothing like the performance/flexibility of e.g. FastCGI and MySQL working in a well-configured cluster.
So, when a Beta such as MSN's new Search falls on its ass, it's an historical event. What's more, it's a transient historical event, one that Microsoft may well deny or downplay. It can only be attested to by those who experienced it first hand. That *really is* noding for the ages, IMO. If I'm wrong, tell me why.
2. Google will only admit to having "over 10,000" computers involved in its search service. Analysts think the true number is much higher.
Full disclosure: my hands-on experience is with clusters of 20 MS servers, and my anecdotes are from time served at various data centers with "merely" 100's of MS servers, and generally about twice as many *nix boxen, where I met plenty of people who knew this stuff cold and were happy to talk.
Anyways... When Google changed its policies recently, we saw how good Google engineers are at major, near-zero-downtime restructuring of their index or revision of their search algorithms; once the policies were understood, most of the complaints I heard from knowledgeable Google customers were objections to the new policies, rather than complaints that the service was malfunctioning. It looks like MSN Search has a ways to go before they can promise their investors similar agility.
I have little doubt MSN Search will fix the availability problem...eventually. In the time it took me to research and write this piece, post it, then correct the typos (wrong order, I know; I plead breaking news), and then revise it due to the shockingly Microsoft-sympathetic feedback from various noders, MSN Search showed no improvement. Which is to say, 1 (search string: "Shakira") of the 8 or so searches I tried actually returned results. Maybe they didn't even implement any load balancing software for the Beta? Hard to imagine, but it would explain the problems.
Or, maybe they've decided this is just a feature / interface Beta, not a beta of their operational infrastructure. If this is the case, I feel that decision was a mistake. The right features and interface would indeed give them an edge over their competition, but only for as long as it took their competition to adapt by including the same features in their search offering. But if Microsoft can't even deliver availability, the best features and presentation on the planet (which, being who they are, they probably think they already have...) won't help.
3. Sigh. They all do it now. Copy cats.
Endnote
I've gotten a lot of feedback on this node, all of it helpful, and I've made adjustments by adding footnotes. One noder asked: the footnotes are good, so why not incorporate them into the body? Normally that's good advice, but the format I chose for this piece was to quote the letter I wrote to the WSJ, then add commentary. I had to get that letter off quickly, because e-mail letters shortly after the article goes to print are more likely to be read than those that come in later. Once that letter was sent, I couldn't very well alter it, could I? Hence, footnotes. Why are there footnotes added to the non-quoted, main body text of the piece? Whimsy, or vanity I suppose. E2 is a geekly place, and on good days, we're scholarly too. I know you'll understand.
And thanks for the feedback.
Update:
As of server time Tuesday, November 16, 2004 at 0:43:56, I am STILL getting:
This site is temporarily unavailable, please check back soon.
I tried 2 times in a 5 min span, and got the aforementioned "site unavailable..." message both times. I'm beginning to wonder if I will ever finish a complete browsing session with the MSN Search Beta.
If anyone else has been able to finish a complete MSN Search Beta browsing "session" (which I suppose would be a search, followed by, say, a side-trip into the help system, then some search refinements, followed by clicking on several of the search results links -- I'd be more specific, but I've only actually managed to see their search results interface ONCE!), by all means, document your experience on this node!