Archive for the 'yahoo' Category

Openness, evil and reusability

I’ve stopped blogging over the last several weeks as I uprooted my family and moved to London to start my new job. But there have been some interesting things worth tracking recently I thought I might mention.

(Interestingly, Twitter usurped any blogging impulses I’ve had during the transition, but it’s time to get back into long-form dialog again now that we’re settling in here.)

First, I’m really pleased to see Yahoo!’s open strategy taking shape with things like SearchMonkey, Glue, and the forward-looking presentations done at Web 2.0 Expo. In my opinion, they are still underestimating the power of what Yahoo! could be doing by opening outwardly more, but the momentum is definitely in the right direction regardless of the distracting M&A discussions.

Second, I love where Umair Haque is going with his ‘Good vs Evil’ strategic thinking stuff. He’s getting into why the costs of evil are starting to outweigh the benefits in a globally networked and highly elastic economic landscape.

“As Starbucks and Wal-Mart are discovering, orthodox strategy was built for an industrial world - an equilibrium world of oligopolies, soulless “product”, and zombified “consumers”. But that’s not today’s world.”

Even better than his post, perhaps, is the comment stream which includes this insight from Mike Bonifer who compares today’s competitive landscape to the art of improvisation:

“What many do-gooders fail to acknowledge is that it is not enough to do good. One must also confront, then work artfully at marginalizing, out-witting, out-designing and out-performing the forces of evil that are afoot in the world. Forces like greed, hate, terror, racism, misunderstanding, obfuscation and fear. Heroism is only as strong as the calumny it overcomes.”

Third, I loved hearing the meaty thinking going on in the heads of Lucas Gonze and Jon Udell talking on IT Conversations. It’s as if they are both articulating Clay Shirky’s cognitive surplus view of the world through a music lens:

“Imagine that we lived in a world where all photography was the kind you see in magazines. In this world all photos are taken by professionals and all the people who got their pictures taken are models at the peak of their career. If you had your picture taken normally, you’d think you were hideously ugly. That is the musical world we grew up in, and it’s bogus. Things don’t have to be that way.”

Jon naturally moved the conversation to the problem of discoverability, which has become increasingly difficult to deal with as more and more data builds out across the network. He notes some of the challenges both as a consumer of interesting things and as someone who has something interesting to offer. He thinks the answer is a bit higher level than traditional syndication:

“There’s a way of publishing that allows something to flow on the network retaining its full fidelity and usability in other contexts.”

Lastly, the open data services space is getting really interesting now as context and relevance find their way into the mix. For example, the Dash GPS formally rolled out their open service. And then a Guardian colleague pointed me to the AMEE service (“Avoiding Mass Extinctions Engine”) which finds itself being used on Dopplr and the current Radiohead tour web site (click on ‘Carbon Calculator’).

There are tons of interesting developments unfolding, and I’m seeing all this stuff through fresh eyes again…one of the great benefits of changing jobs. I’ll do my best to keep the blogging energy up and to provide some analysis. Though I’m sure my perspective will shift a bit…to what, I really don’t know, yet.

Open source grid computing takes off

This has been fun to watch. The Hadoop team at Yahoo! is moving quickly to push the technology to reach its potential. They’ve now adopted it on one of the most important applications in the entire business, Yahoo! Search.

From the Hadoop Blog:

The Webmap build starts with every Web page crawled by Yahoo! and produces a database of all known Web pages and sites on the internet and a vast array of data about every page and site. This derived data feeds the Machine Learned Ranking algorithms at the heart of Yahoo! Search.

Some Webmap size data:

  • Number of links between pages in the index: roughly 1 trillion links
  • Size of output: over 300 TB, compressed!
  • Number of cores used to run a single Map-Reduce job: over 10,000
  • Raw disk used in the production cluster: over 5 Petabytes

I’m still trying to figure out what all this means, to be honest, but Jeremy Zawodny helps to break it down. In this interview, he gets some answers from Arnab Bhattacharjee (manager of the Yahoo! Webmap Team) and Sameer Paranjpye (manager of our Hadoop development):

The Hadoop project is opening up a really interesting discussion around computing scale. A few years ago I never would have imagined that the open source world would be contributing software solutions like this to the market. I don’t know why I had that perception, really. Perhaps all the positioning by enterprise software companies to discredit open source software started to sink in.

As Jeremy said, “It’s not just an experiment or research project. There’s real money on the line.”

For more background on what’s going on here, check out this article by Mark Chu-Carroll “Databases are hammers; MapReduce is a screwdriver”.
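To make the Map-Reduce idea a little more concrete for myself, here is a toy sketch in Python. It is not Hadoop and it has nothing to do with the real Webmap code; the function names and the three-page crawl are invented for illustration. It only shows the shape of the technique: a map step that emits key/value pairs, a shuffle that groups them by key, and a reduce step that combines each group. Here the job counts inbound links per page.

```python
from collections import defaultdict


def map_links(page, outlinks):
    """Map step: emit (target, 1) for every link found on a page."""
    for target in outlinks:
        yield target, 1


def reduce_counts(target, counts):
    """Reduce step: sum all the counts emitted for one target page."""
    return target, sum(counts)


def run_job(crawl):
    """Run the whole job in one process (a real cluster spreads this out)."""
    # Shuffle: group the intermediate values by key, as the framework would.
    grouped = defaultdict(list)
    for page, outlinks in crawl.items():
        for key, value in map_links(page, outlinks):
            grouped[key].append(value)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Hypothetical three-page crawl, invented for illustration.
crawl = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}
inbound = run_job(crawl)
print(inbound)
```

The point of the split is that neither step needs to see the whole data set, which is presumably what lets a job like this spread across thousands of cores.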

This story is going to get bigger, I’m certain.

A handy music playlist tool

I’ve been looking for a way to share playlists on my blog and elsewhere online for a long time. It’s been surprisingly hard to find a really convenient way to do it.

DRM and industry lockdown have been a big part of that, but there have also been too few technical ways to point to music files that are already publicly available. There are tons of legal MP3’s on the Internet that reside at readable URLs today.

Lucas Gonze and his team at Yahoo! solved this problem. They launched a source-agnostic embeddable media player. You can read more about it on YDN.

It’s fantastically simple. All you do is paste this reference to Yahoo!’s media player javascript code anywhere on your web page (I added it at the bottom of my blog templates):

<script type="text/javascript" src="http://mediaplayer.yahoo.com/js"></script>

Then you just add an HTML link somewhere on your web page to any MP3 file you want to see in your playlist.

That’s it. You’re already done. The link you just made will now include a small play button in front of it, and a mini media player will appear in the browser.

Here’s a short playlist I quickly put together to show how it works. The 4th track here is particularly relevant to my life:

Cut Chemist - The Garden
Young Einstein (Ugly Duckling) - Handcuts Soul Mix
They Might Be Giants- Birdhouse in Your Soul
LCD Soundsystem - Losing My Edge

The code for that playlist looks like this:

<a href="http://download.wbr.com/cutchemist/TheGarden.mp3"> Cut Chemist - The Garden </a>
<a href="http://www.uglyduckling.us/music/HandCutsSoulMix.mp3"> Young Einstein (Ugly Duckling) - Handcuts Soul Mix </a>
<a href="http://midwesternhousewives.com/mix/The%20Might%20Be%20Giants-%20Birdhouse%20in%20Your%20Soul.mp3"> They Might Be Giants- Birdhouse in Your Soul </a>
<a href="http://www.personal.psu.edu/users/s/m/smk291/muchies/LCD%20Soundsystem%20-%20Losing%20My%20Edge.mp3"> LCD Soundsystem - Losing My Edge </a>

They’ve included some other nice things in the code that give you some flexibility. You can create a shareable playlist file, and you can add cover art, for example.

What I like most, probably, is the architecture of the solution. Anyone who already links to MP3 files can just add the music player javascript code to their page templates, and it will work immediately. You don’t have to force-fit a heavily branded HTML badge into your web page. And since the links are all standard HTML hrefs, the content of the playlist is search engine friendly.
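I haven’t read the player’s source, so the following is only a guess at the general idea, sketched in Python rather than javascript: walk the page, keep every ordinary link whose href ends in .mp3, and use the link text as the track title. The class name, the helper function and the example.com URLs are all made up for illustration.

```python
from html.parser import HTMLParser


class Mp3LinkFinder(HTMLParser):
    """Collect (url, title) pairs for every <a> whose href ends in .mp3."""

    def __init__(self):
        super().__init__()
        self.tracks = []
        self._href = None   # href of the MP3 link currently open, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.lower().endswith(".mp3"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.tracks.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_tracks(html):
    """Return the playlist implied by the MP3 links in an HTML string."""
    finder = Mp3LinkFinder()
    finder.feed(html)
    finder.close()
    return finder.tracks


# Hypothetical page fragment; only the second link points at an MP3.
page = """
<p>Some <a href="http://example.com/about.html">ordinary link</a>.</p>
<a href="http://example.com/music/song.mp3"> Artist - Song </a>
"""
tracks = find_tracks(page)
print(tracks)
```

Because the playlist is nothing more than the links already on the page, the same markup serves readers, search engines and the player alike.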

It’s the first time I’ve seen a media player so closely aligned with the way the Internet works.

Lucas posts about the need to unlock how media files are referenced. He wants to take the complexity out of distribution and reduce the concept of music sharing and discoverability to the Internet’s roots with URLs as identifiers:

“Almost all online music businesses right now are in the distribution business, even if they see other functions like discovery or social connection as their main value, because they have no way to connect their discovery or social connection features with a reliable provisioning service from a third party. But provisioning is a commodity service which doesn’t give anybody an edge. They don’t want to import playlists from third parties because *that’s* where they are adding value.

Exporting playlists for others to provision, though, is a different story, and it makes much more sense from a business perspective. Let somebody else deal with provisioning. This is what it would mean for somebody like Launchcast or Pandora to publish XSPF with portable song identifiers that could be resolved by companies that specialize in provisioning.”

It seems Lucas is thinking about how to get music flowing around the Internet with the same efficiency that text has enjoyed. Very smart.
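For the curious, an exported XSPF playlist is just a small XML file. The sketch below builds one with Python’s standard library, using the playlist, trackList, track, location and title elements from the XSPF format. The build_xspf helper and the example.com track are my own inventions for illustration, not anything Yahoo!, Launchcast or Pandora publish.

```python
import xml.etree.ElementTree as ET

XSPF_NS = "http://xspf.org/ns/0/"


def build_xspf(tracks):
    """Serialize (location, title) pairs as an XSPF version 1 playlist."""
    # Register the XSPF namespace as the default so tags carry no prefix.
    ET.register_namespace("", XSPF_NS)
    playlist = ET.Element(f"{{{XSPF_NS}}}playlist", {"version": "1"})
    track_list = ET.SubElement(playlist, f"{{{XSPF_NS}}}trackList")
    for location, title in tracks:
        track = ET.SubElement(track_list, f"{{{XSPF_NS}}}track")
        ET.SubElement(track, f"{{{XSPF_NS}}}location").text = location
        ET.SubElement(track, f"{{{XSPF_NS}}}title").text = title
    return ET.tostring(playlist, encoding="unicode")


# Hypothetical single-track playlist.
xspf = build_xspf([("http://example.com/music/song.mp3", "Artist - Song")])
print(xspf)
```

This version points straight at a file with location. The portable identifiers Lucas describes would more likely travel in XSPF’s identifier element, leaving it to a provisioning service to resolve them to an actual file.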

The problem with being popular (part 2)

One of the more interesting sciences, in my mind, is how information relevance is determined, surfaced and then evolved.

In Fred Wilson’s recent Cautionary Techmeme Tale he argues that making news popular strips away its social context and therefore makes it meaningless. He found Techmeme more useful when its sources more closely resembled his network of friends:

“For years, I’ve been using curators to filter my web experience…Techmeme has been the killer social media curator for my world of tech blogs. Lore has it that it was created using Scoble’s OPML file. It doesn’t matter to me if that’s true or not, I love that story. Because my OPML file was unusable until I found Techmeme and after that I stopped reading feeds and started reading curated feeds.”

This feeds into a larger argument about why pop culture and the art of being or becoming popular can be a bad thing. Not long ago I was inspired by the movie “Good Night and Good Luck” to dive into this idea myself:

“The real problem with popularity-driven models is that they reduce both the breadth and depth of the sources, topics and viewpoints being expressed across a community. Popularity-driven models water down the value in those hard-to-find nuggets. They normalize coverage and create new power structures that interesting things have to fight through.”

This is exactly why personalization, recommendations and social media technologies really matter. They can solve this problem of creating conformist media consumption practices by creating relevance through networks of people rather than through networks of commercial institutions.

I haven’t used My Yahoo! as much as I’d like, but there is a simple function in it that I love which could ultimately create amazing benefits for people who want a human filter for the Internet. It’s called “Top Picks”.

“The Top Picks module automatically highlights stories from your page, based on the articles you have recently read on My Yahoo! The more stories you click on, the more you will see this module reflect your interests.”

Actually, the technology beneath it is not so ‘simple’ but the application of it here makes so much sense that it feels like it’s simple when you watch it work. It works by using implicit behaviors. I don’t have to tell it what I like. It learns.
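I have no idea how Top Picks is actually implemented, but the principle of learning from implicit behavior can be sketched in a few lines of Python. In this toy version every story carries a list of topics (my assumption), each click adds weight to its topics, and new stories are ranked by the weight of the topics they share with what I’ve already read. The function names and sample stories are invented for illustration.

```python
from collections import Counter


def learn_interests(clicked_stories):
    """Count how often each topic appears among the stories I clicked."""
    interests = Counter()
    for story in clicked_stories:
        interests.update(story["topics"])
    return interests


def top_picks(candidates, interests, limit=3):
    """Rank unread stories by how well their topics match my clicks."""
    def score(story):
        return sum(interests[topic] for topic in story["topics"])

    ranked = sorted(candidates, key=score, reverse=True)
    # Drop stories with no overlap at all instead of padding the list.
    return [story["title"] for story in ranked if score(story) > 0][:limit]


# Hypothetical click history and candidate stories.
clicks = [
    {"title": "Hadoop runs search", "topics": ["hadoop", "search"]},
    {"title": "Grid computing", "topics": ["hadoop"]},
]
candidates = [
    {"title": "Celebrity gossip", "topics": ["celebrity"]},
    {"title": "New Hadoop release", "topics": ["hadoop"]},
    {"title": "Search ranking tips", "topics": ["search"]},
]
picks = top_picks(candidates, learn_interests(clicks))
print(picks)
```

Nothing here asks me to state a preference. The only input is what I clicked, which is the whole appeal.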

If it could also show me what my social network is tapped into right now, then the experience would feel nearly complete.

Media researchers will note here that people need pop culture to feel connected to a greater whole. I believe that’s true, too. Television is an amazingly powerful community builder.

But I would gladly trade a powerful singular social voice tied together by networks of distribution ownership for a less unified but still loosely connected network of pop culture tied together by my personal activities and my social connections.

Investing in video at YDN

We’ve been playing around with video as a communications mechanism on Yahoo! Developer Network for a while now. Our casual attempts to generate interest in Yahoo! technologies through interviews, screencasts, tech talks, etc. have worked really well.

So, we hired a full time videographer/filmmaker named Ricky Montalvo and got him some decent gear to push the envelope a little further. And today we rolled out YDN Theater on the YDN web site to establish a home for all the work he has been producing.

The journey here started with a pretty lame but surprisingly successful screencast that Dan Theurer and I did to explain how browser-based authentication worked. It was blurry. We made mistakes. The subject matter was pretty abstract. And neither Dan nor I have particularly strong camera presence.

Regardless, it has been viewed over 19,000 times, so far.

We kept pushing with new types of videos such as partner showcases with people like Joyce Park, Adam Rifkin, and Leah Culver. We brought the camera to our various Hack Days and produced a particularly funny recap of the London event. And we recorded tech talks from our own staff at Yahoo! and presentations from guest speakers like Grady Booch, Joe Hewitt and David Weinberger.

By the time we found Ricky, we knew we were building a program that was going to be really interesting. Yet, at that point we had hardly spent any money on anything other than a few cheap cameras and some basic editing tools, including Camtasia.

The success to date, I think, has been in large part due to the fact that we haven’t tried to pimp out our videos with any professional plastic gloss or staged demos. We also try to have a little fun with them. Jeremy Zawodny is a really good interviewer. His unassuming yet pointed questions get people to say things they otherwise wouldn’t include in any planned script. And the fact that the videos are raw with few cuts or edits makes them feel real, too.

There are some good video program ideas floating around here that could be a lot of fun, but now we’re torn between how much time we want to spend building out the video offering and how much time we want to spend on all the other ways the team can evangelize Yahoo! technologies.

I’m not sure how to measure that decision just yet, but as long as people are consuming these shows we do with such enthusiasm we’ll probably tilt the scale in favor of doing more video whenever possible.