Human vs Machine: What’s Better In Search?

The next few months should be interesting to watch: On Monday, Wikia Search goes online, giving us another powerful player in the next wave of the search engine wars.

For the last few years, Google with its (mostly) machine-based search algorithms has been the dominant player in the search market, producing more or less the best results by exploiting the inherent value of hyperlinks: If website authors or bloggers link to another website, so the idea goes, they endorse that website, i.e. they consider it relevant in one way or another.
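To make that link-as-endorsement idea concrete, here is a minimal sketch of a PageRank-style power iteration over a toy link graph. The site names and parameter values are made up for illustration; this is of course not Google's actual ranking code, which layers many more signals on top, but the core intuition is the same: links are votes.

```python
# Simplified PageRank-style power iteration: a page's score is fed by the
# pages that link to it, so incoming links act as weighted endorsements.
# The link graph and the constants below are made up for illustration only.

DAMPING = 0.85      # probability of following a link vs. jumping to a random page
ITERATIONS = 50

# hypothetical link graph: page -> pages it links to
links = {
    "blog.example": ["news.example", "shop.example"],
    "news.example": ["blog.example"],
    "shop.example": ["news.example"],
}

pages = list(links)
rank = {page: 1.0 / len(pages) for page in pages}   # start with equal scores

for _ in range(ITERATIONS):
    new_rank = {page: (1.0 - DAMPING) / len(pages) for page in pages}
    for page, outlinks in links.items():
        share = DAMPING * rank[page] / len(outlinks)  # split the endorsement evenly
        for target in outlinks:
            new_rank[target] += share                 # each link passes value along
    rank = new_rank

# pages with more (and better-ranked) incoming links end up with higher scores
for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```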

Now the humans are pushing their way back into search: In 2007, Jason Calacanis’ Mahalo introduced a completely human-powered search engine that produces great results but covers only a relatively small number of search terms. (For terms that aren’t listed, Mahalo falls back to a Google search.) Robert Scoble already suspects that Mahalo, Techmeme and Facebook (i.e. search based on your social graph) will kick Google’s butt.

On Monday, Jimmy Wales’ Wikia Search will go into public beta. Wikia Search aims to make its search algorithms open and transparent, so that, unlike the black box that is Google, it won’t be as easily manipulated by SEO efforts.

What those projects have in common is, as Tim O’Reilly points out, that “both are trying to re-draw the boundary between human and machine.” How this hybrid works out will determine both the quality of our search results (and thereby the way we perceive a great many things around us) and our defense against spam.

By the way, even Google doesn’t rely on machines alone, but has to intervene manually for some search terms:

(…) there is a small percentage of Google pages that dramatically demonstrate human intervention by the search quality team. As it turns out, a search for “O’Reilly” produces one of those special pages. Driven by PageRank and other algorithms, my company, O’Reilly Media, used to occupy most of the top spots, with a few for Bill O’Reilly, the conservative pundit. It took human intervention to get O’Reilly Auto Parts, a Fortune-500 company, onto the first page of search results. There’s a special split-screen format for cases like this.

So why is this necessary if the algorithm is so powerful? Cory Doctorow writes:

The idea of a ranking algorithm is that it produces “good results” — returns the best, most relevant results based on the user’s search terms. We have a notion that the traditional search engine algorithm is “neutral” — that it lacks an editorial bias and simply works to fulfill some mathematical destiny, embodying some Platonic ideal of “relevance.” Compare this to an “inorganic” paid search result of the sort that Altavista used to sell. But ranking algorithms are editorial: they embody the biases, hopes, beliefs and hypotheses of the programmers who write and design them.

So where Google puts its money on math-fu and Mahalo on editorial filters, Wikia Search bets on transparency and a Wikipedia-inspired community model as an open alternative to the Google black box. Which hybrid will bring us the best results and decide which information we get to see? 2008 will be the year that tells us. Let the battle begin!

3 Comments

It will be interesting to see how it all plays out. Many pundits have conceded an 80% share perpetually to Google, but drawing definitive conclusions about a 9-year-old company in a 15-year-old industry ignores history. Innovation has consistently proven its ability to topple a dominant market position, as General Motors, AT&T, IBM, Xerox, AOL, and Yahoo, to name but a few, can attest.

Surveys consistently conclude that most Internet users cannot find credible and comprehensive information on the Internet. Yahoo recently published its own survey suggesting that 85% of searches fail to produce the desired information, and coined the term “search engine fatigue.” And while many Internet users claim to be satisfied with search results, Pew Internet has called them “trusting and naive.”

Algorithms, search personalization, artificial intelligence, and the semantic web are all buzzwords that describe Orwellian efforts to eliminate the need for human beings to use their own intelligence and judgment to find, critically evaluate and utilize information. Human intelligence and judgment have an enviable track record that has lasted a fair bit more than 15 years. And I’ll always take the underdog with a solid track record over the neophyte riding the crest of popular wisdom.