Hadoop at Yahoo!

Started by hatRed, 22 July 2009, 02:38:42 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

hatRed

source : http://en.wikipedia.org/wiki/Hadoop

On February 19, 2008, Yahoo! launched what it claimed was the world's largest Hadoop production application. The Yahoo! Search Webmap is a Hadoop application that runs on a more than 10,000 core Linux cluster and produces data that is now used in every Yahoo! Web search query.[10]

There are multiple Hadoop clusters at Yahoo!, each occupying a single datacenter (or fraction thereof). No HDFS filesystems or Map/Reduce jobs are split across multiple datacenters; instead each datacenter has a separate filesystem and workload. The cluster servers run Linux, and are configured on boot using Kickstart. Every machine bootstraps the Linux image, including the Hadoop distribution. Cluster configuration is also aided through a program called ZooKeeper. Work that the clusters perform is known to include the index calculations for the Yahoo! search engine.

On June 10, 2009, Yahoo! released its own distribution of Hadoop. [11]
i'm just a mammal with troubled soul