Tuesday, 10 January 2012

Why is Google so Valuable?


Day 12: The Importance of Filtering

If you think about what Google is, it is essentially an editorial filter for the web. They organize information and return relevant results, and as a result they make a boatload of cash selling ads on commercial keywords.

More data usually beats better algorithms
Most people think Google's success is due to their brilliant algorithms, especially PageRank. In reality, the two big innovations that Larry and Sergey introduced in 1998, which really took search to the next level, were:
  1. The recognition that hyperlinks were an important measure of popularity -- a link to a webpage counts as a vote for it.
  2. The use of anchortext (the text of hyperlinks) in the web index, giving it a weight close to that of the page title.
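To see why anchortext matters, here is a toy sketch of an index that credits a page with the words other sites use when linking to it. It assumes whitespace tokenization and made-up field weights; `FIELD_WEIGHTS`, `build_index` and `search` are hypothetical names for illustration, not how Google actually weighted its fields.

```python
from collections import defaultdict

# Hypothetical field weights (illustrative assumptions): anchor text is
# weighted close to the title, and the page body counts least.
FIELD_WEIGHTS = {"title": 3.0, "anchor": 2.5, "body": 1.0}

def build_index(pages, links):
    """Build a term -> {url: score} index.

    pages: {url: {"title": str, "body": str}}
    links: list of (source_url, target_url, anchor_text) tuples
    """
    index = defaultdict(lambda: defaultdict(float))

    def add(url, text, field):
        for term in text.lower().split():
            index[term][url] += FIELD_WEIGHTS[field]

    # First-generation signal: the page's own text.
    for url, page in pages.items():
        add(url, page["title"], "title")
        add(url, page["body"], "body")

    # Extra data: anchor text describes the page the link points *to*.
    for _source, target, anchor in links:
        if target in pages:
            add(target, anchor, "anchor")
    return index

def search(index, query):
    """Sum per-term scores and return URLs, best match first."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for url, score in index.get(term, {}).items():
            scores[url] += score
    return sorted(scores, key=scores.get, reverse=True)

pages = {
    "a.com": {"title": "Home", "body": "welcome to our site"},
    "b.com": {"title": "Search tips", "body": "how to search the web"},
}
links = [
    ("x.com", "a.com", "best search engine"),
    ("y.com", "a.com", "search engine"),
]
idx = build_index(pages, links)
# a.com's own text never says "search engine"; it ranks first on anchor text alone.
print(search(idx, "search engine"))  # ['a.com', 'b.com']
```

With the anchor-text loop removed, a.com would not match the query at all, which is exactly the gap a first-generation engine had.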
First-generation search engines had used only the text of the web pages themselves. Adding these two data sets, hyperlinks and anchortext, took Google's search to the next level. The PageRank algorithm itself is a minor detail: any halfway decent algorithm that exploited this additional data would have produced roughly comparable results.
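To make the "minor detail" point concrete, here is a sketch comparing naive in-link vote counting with a textbook power-iteration PageRank (damping 0.85, the value usually quoted) on a hypothetical four-page graph. The graph and function names are assumptions for illustration only.

```python
def in_link_votes(links):
    """Count each page's inbound links: one link = one vote."""
    votes = {}
    for source, targets in links.items():
        votes.setdefault(source, 0)
        for t in set(targets):
            votes[t] = votes.get(t, 0) + 1
    return votes

def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
votes = in_link_votes(graph)
ranks = pagerank(graph)
print(votes)  # {'a': 1, 'b': 1, 'c': 3, 'd': 0}
print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b', 'd']
```

Both methods put c on top. PageRank's refinement is weighting votes by the voter's own rank, which separates a from b here (a's single vote comes from the highly ranked c), but most of the signal was already in the raw link data.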
The same principle also holds true for another area of great success for Google: the AdWords keyword auction model. Overture had already proved that the model of having advertisers bid for keywords could work. Overture ranked advertisers for a given keyword based purely on their bids. Google added some additional data: the clickthrough rate (CTR) on each advertiser's ad. Thus, to a first approximation, Google ranks advertisers by the product of their bid and their CTR (this was true in the first version of AdWords; they now take more factors into account). This simple change made Google's ad marketplace much more efficient than Overture's. Notice that the algorithm itself is quite simple; it is the addition of the new data that made the difference.
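Here is a minimal sketch of the two ranking rules, assuming made-up advertisers, bids and CTRs. Since the advertiser pays per click, bid × CTR estimates expected revenue per impression. Actual pricing (what each winner pays) and Google's later extra factors are deliberately not modeled.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    advertiser: str
    bid: float  # maximum cost per click, in dollars (hypothetical values)
    ctr: float  # estimated clickthrough rate, between 0 and 1

def rank_by_bid(ads):
    """Overture-style: order purely by bid."""
    return sorted(ads, key=lambda ad: ad.bid, reverse=True)

def rank_by_bid_times_ctr(ads):
    """First-approximation AdWords-style: order by bid * CTR,
    i.e. expected revenue per impression."""
    return sorted(ads, key=lambda ad: ad.bid * ad.ctr, reverse=True)

ads = [
    Ad("alpha", bid=2.00, ctr=0.01),  # $0.02 expected per impression
    Ad("beta", bid=1.00, ctr=0.05),   # $0.05
    Ad("gamma", bid=1.50, ctr=0.02),  # $0.03
]
print([ad.advertiser for ad in rank_by_bid(ads)])            # ['alpha', 'gamma', 'beta']
print([ad.advertiser for ad in rank_by_bid_times_ctr(ads)])  # ['beta', 'gamma', 'alpha']
```

The highest bidder, alpha, drops to last: its ad is rarely clicked, so showing it first would earn the least. The sort key changed by a single multiplication; the gain comes entirely from the CTR data.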
To sum up, if you have limited resources, add more data rather than fine-tuning the weights on your fancy machine-learning algorithm. Of course, you have to be judicious in choosing which data to add to your data set.
