GPS device + data feeds + social = awesome service

One of the most interesting market directions in recent months, to my mind, is the way the concept of a location service is evolving. People are using location as a vector to bring information that matters directly to them. A great example of this is Dash.net.

Dash makes a GPS device that leverages the activity of its user base and the wider local data pools on the Internet to create a more personalized driving experience. Ricky Montalvo and I interviewed them for the latest Developer Spotlight on YDN Theater:

Of particular note are the ways Dash pulls in external data sources from places like Yahoo! Pipes. Any GeoRSS feed can be used, directly from the device, to identify relevant locations near you or near where you’re going. They give the example of using a Surfline.com feed built with Pipes to identify surfing hot spots at any given moment. You can drive to Santa Cruz and then decide which beach to hit once you get there.
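To make the GeoRSS part concrete, here’s a minimal sketch in Python of pulling titles and coordinates out of a feed. The entries and coordinates below are made up for illustration; a real Pipes feed would be fetched over HTTP, but the parsing would look the same.

    import xml.etree.ElementTree as ET

    # A tiny GeoRSS snippet of the kind a Pipes-built feed might emit.
    # The entries and coordinates here are invented for illustration.
    SAMPLE_FEED = """\
    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:georss="http://www.georss.org/georss">
      <entry>
        <title>Steamer Lane -- clean 4-6 ft sets</title>
        <georss:point>36.9514 -122.0264</georss:point>
      </entry>
      <entry>
        <title>Pleasure Point -- small but glassy</title>
        <georss:point>36.9552 -121.9718</georss:point>
      </entry>
    </feed>
    """

    NS = {
        "atom": "http://www.w3.org/2005/Atom",
        "georss": "http://www.georss.org/georss",
    }

    def parse_georss(xml_text):
        """Yield (title, latitude, longitude) for each entry with a georss:point."""
        root = ET.fromstring(xml_text)
        for entry in root.findall("atom:entry", NS):
            title = entry.findtext("atom:title", default="", namespaces=NS)
            point = entry.findtext("georss:point", default=None, namespaces=NS)
            if point:
                lat, lon = (float(v) for v in point.split())
                yield title, lat, lon

    if __name__ == "__main__":
        for title, lat, lon in parse_georss(SAMPLE_FEED):
            print(f"{title}: ({lat}, {lon})")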

There are other neat ways to use the collaborative user data, such as the traffic feedback loop, which lets you choose the fastest route to a destination in real time. And the integration with the Yahoo! Local and Upcoming APIs makes for great discoveries while you’re out and about.
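The Local and Upcoming integrations are Dash’s own, but the “near you” part of features like these comes down to a distance calculation. Here’s a rough, hypothetical sketch of ranking points of interest by great-circle distance from the driver’s position; the event names and coordinates are invented for the example.

    from math import radians, sin, cos, asin, sqrt

    def haversine_miles(lat1, lon1, lat2, lon2):
        """Great-circle distance between two points, in miles."""
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 3956 * asin(sqrt(a))

    def nearest(current, places, limit=3):
        """Rank (name, lat, lon) tuples by distance from the current position."""
        lat, lon = current
        ranked = sorted(places, key=lambda p: haversine_miles(lat, lon, p[1], p[2]))
        return [(name, round(haversine_miles(lat, lon, plat, plon), 1))
                for name, plat, plon in ranked[:limit]]

    if __name__ == "__main__":
        # Hypothetical listings near Santa Cruz, CA.
        events = [
            ("Beach Boardwalk concert", 36.9643, -122.0178),
            ("Capitola art fair",       36.9722, -121.9533),
            ("Downtown farmers market", 36.9741, -122.0308),
        ]
        here = (36.9514, -122.0264)  # driver's current position
        for name, miles in nearest(here, events):
            print(f"{name}: {miles} mi away")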

You can also see an early demo of their product, which they showed at the Web 2.0 Summit in the fall:

The way they’ve opened up a hardware device to take advantage of both the information on the Internet and the behavior of their customers is really innovative, not to mention very useful. I think Dash is going to be one to watch.

Open source grid computing takes off

This has been fun to watch. The Hadoop team at Yahoo! is moving quickly to push the technology toward its potential, and they’ve now adopted it for one of the most important applications in the entire business: Yahoo! Search.

From the Hadoop Blog:

The Webmap build starts with every Web page crawled by Yahoo! and produces a database of all known Web pages and sites on the internet and a vast array of data about every page and site. This derived data feeds the Machine Learned Ranking algorithms at the heart of Yahoo! Search.

Some Webmap size data:

  • Number of links between pages in the index: roughly 1 trillion links
  • Size of output: over 300 TB, compressed!
  • Number of cores used to run a single Map-Reduce job: over 10,000
  • Raw disk used in the production cluster: over 5 Petabytes

I’m still trying to figure out what all this means, to be honest, but Jeremy Zawodny helps to break it down. In this interview, he gets some answers from Arnab Bhattacharjee (manager of the Yahoo! Webmap Team) and Sameer Paranjpye (manager of our Hadoop development):

The Hadoop project is opening up a really interesting discussion around computing scale. A few years ago I never would have imagined that the open source world would be contributing software solutions like this to the market. I don’t know why I had that perception, really. Perhaps all the positioning by enterprise software companies to discredit open source software started to sink in.

As Jeremy said, “It’s not just an experiment or research project. There’s real money on the line.”

For more background on what’s going on here, check out Mark Chu-Carroll’s article, “Databases are hammers; MapReduce is a screwdriver.”
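If the MapReduce model itself is unfamiliar, the core idea is compact enough to sketch in a few lines. The toy example below (in Python, purely illustrative, and nothing like the actual Webmap code, which runs on Hadoop) counts inbound links per page from a handful of made-up link records, which is the flavor of derived data the Webmap build produces at trillion-link scale.

    from collections import defaultdict
    from itertools import groupby
    from operator import itemgetter

    # Toy "crawl" records: (source_page, target_page) link pairs.
    # The real Webmap input is crawled page data at a vastly larger scale;
    # these records are made up for illustration.
    LINKS = [
        ("a.example.com", "b.example.com"),
        ("a.example.com", "c.example.com"),
        ("b.example.com", "c.example.com"),
        ("d.example.com", "c.example.com"),
    ]

    def map_phase(records):
        """Map: emit (target, 1) for every link pointing at a page."""
        for source, target in records:
            yield target, 1

    def shuffle(pairs):
        """Shuffle/sort: group intermediate pairs by key, as the framework would."""
        for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
            yield key, [count for _, count in group]

    def reduce_phase(grouped):
        """Reduce: sum the counts to get inbound links per page."""
        for page, counts in grouped:
            yield page, sum(counts)

    if __name__ == "__main__":
        for page, inlinks in reduce_phase(shuffle(map_phase(LINKS))):
            print(f"{page}\t{inlinks}")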

This story is going to get bigger, I’m certain.