Interesting perspectives from Web 2.0 Expo

Today’s Web 2.0 Expo in San Francisco provided some really good brain food.

Clay Shirky’s keynote was excellent. He talked about architecting a new world for the “cognitive surplus” that’s emerging as people pull themselves out of the historical sitcom hangover and invest their energy online. Matt Jones and Tom Coates shared some neat ideas on design for personal informatics. And Twitter’s Alex Payne and Michael Migurski of Stamen Design presented lessons learned from the perspective of an API provider.

One little nugget I really liked was a minor point Migurski made when talking through the Oakland Crimespotting service. He noted that there are several standard formats commonly provided by most web services including HTML, JSON, serialized PHP, RSS and XML.

But we often forget about simple Excel spreadsheets.

He showed how the Oakland Crimespotting site offers downloadable Excel spreadsheets detailing recent activity from particular police beats, for example.
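To make the point concrete, here is a minimal sketch of serving the same records in both a developer format and a spreadsheet-friendly one. It is not how Crimespotting does it: the records and field names are made up for illustration, and CSV (which Excel opens directly) stands in for a native Excel file so the example needs only the Python standard library.

```python
import csv
import io
import json

# Hypothetical records; the field names are illustrative only and are
# not the actual Oakland Crimespotting schema.
REPORTS = [
    {"beat": "04X", "type": "Burglary", "date": "2008-04-22"},
    {"beat": "04X", "type": "Vandalism", "date": "2008-04-23"},
    {"beat": "12Y", "type": "Theft", "date": "2008-04-23"},
]


def reports_for_beat(reports, beat):
    """Return only the reports filed for one police beat."""
    return [r for r in reports if r["beat"] == beat]


def to_json(reports):
    """Serialize reports for mashup and visualization developers."""
    return json.dumps(reports, indent=2)


def to_csv(reports):
    """Serialize the same reports as CSV, which spreadsheet apps can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["beat", "type", "date"])
    writer.writeheader()
    writer.writerows(reports)
    return buffer.getvalue()


if __name__ == "__main__":
    beat_reports = reports_for_beat(REPORTS, "04X")
    print(to_json(beat_reports))
    print(to_csv(beat_reports))
```

The data behind both outputs is identical. Only the last step changes, which is why offering a spreadsheet download costs so little once the JSON is already there.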

One of the keys to opening up government data is making the case to the people who are best equipped to provide raw data that it needs to be posted directly to the Internet. Telling them they need to output JSON for data visualizations and mashups will do as much good as a slap in the face. Showing them a regularly updated Excel spreadsheet, findable on a web page, that they can email to their colleagues, friends and families is going to get them thinking differently and perhaps encourage their participation directly.

The crime data issue is going to be a big deal in the not too distant future, I’m sure. And as Mr. Coates and Mr. Jones noted in their talks on personal data design, it’s the details that really matter in this space. You can think about products and features all day, but the specifics that define how data is shared, how it becomes relevant and how it is presented will make or break the intent of any offering.

Designing Your API, Web 2.0 Expo 2008:

My new gig at the Guardian in London

At the end of April I will be joining the Guardian in London to build a new developer program there.

This is a fantastic opportunity in many ways. Perhaps what’s most appealing to me is the direction the Guardian is going — they are totally focused on building a great online business, and it all starts with great journalism. As Jeff Jarvis reported from a management meeting there about a year ago,

“Alan Rusbridger, editor-in-chief of the Guardian, told the staff of his newspaper that now ‘all journalists work for the digital platform’ and that they should regard ‘its demands as preeminent.’…They issued a set of principles to work by. And this was surrounded by much deserved — in my biased opinion — back-patting for good journalism and innovation and, from managing director Tim Brooks and company head Carolyn McCall, for business progress.”

In addition, being owned by a trust committed to preserving the core values of journalism provides a very powerful foundation for using the Internet to offer important services for developers around the world. From the Guardian Media Group web site:

“The Trust was created in 1936 to safeguard the journalistic freedom and liberal values of the Guardian. Its core purpose is to preserve the financial and editorial independence of the Guardian in perpetuity, while its subsidiary aims are to champion its principles and to promote freedom of the press in the UK and abroad.”

With its history of championing data freedom, the Guardian is a great environment for opening up data that matters to people. The Guardian’s Simon Waldman points out:

“Charles Arthur and his gang have been banging their ‘Free our data’ drum for two years now. This week, under the slightly optimistic headline: In sight of victory, they cover a report which proves their case that there is more value to be created by opening up publicly owned data than by giving government agencies control over it.”

This is also a great opportunity for me, personally. I lived in London a few years ago when I was with The Industry Standard and loved it. I met my wife and got married there and always planned to return someday. (I’m curious to see how fast my daughter’s accent changes…my wife has been trying in vain to get her to speak the ‘correct’ way. “Water is pronounced wottah, not waddr.”) And I’m looking forward to living in the same city as my brother Mitch again. Many pints to enjoy together, brother.

It’s also difficult to leave Yahoo! with all the exciting developments happening there. I came to Yahoo! in 2005 during the Flickr era when lots of people were innovating on different approaches to openness. Now, it seems, the drive toward openness is having a major impact on the company and the Internet as a whole. I’m glad I was able to at least participate in getting things moving in that direction.

At the same time, I’m really excited to be working with some outstanding people at the Guardian: some I already know, many I’ve recently met and many I’ve still only heard about. I can’t wait to find out what other ideas are cooking in addition to our plans to open up data and services for developers.

Meantime, anyone interested in buying a nice little house in San Francisco’s Potrero Hill, please drop me a line. Oh, and let me know if you have any interest in looking after our dog (terrier/beagle mix) while he’s in the pet immigration waiting period (about 6 months).

I’ll continue to use this blog to comment on what’s going on in the online media market as I see it. I may also twitter the inane details of our move across the pond for our friends and family. So, stay tuned as this new adventure unfolds.

Creating leverage at the data layer

There’s a reason that the world fully embraced HTTP but not Gopher or Telnet or even FTP. That’s because the power of the Internet is best expressed through the concept of a network, lots of interlinked pieces that make up something bigger rather than tunnels and holes that end in a destination.

The World Wide Web captured people’s imaginations, and then everything changed.

I was reminded of this while reading a recent interview with Tim Berners-Lee (via TechCrunch). He talked a bit about the power of linking data:

“Web 2.0 is a stovepipe system. It’s a set of stovepipes where each site has got its data and it’s not sharing it. What people are sometimes calling a Web 3.0 vision where you’ve got lots of different data out there on the Web and you’ve got lots of different applications, but they’re independent. A given application can use different data. An application can run on a desktop or in my browser, it’s my agent. It can access all the data, which I can use and everything’s much more seamless and much more powerful because you get this integration. The same application has access to data from all over the place…

Data is different from documents. When you write a document, if you write a blog, you write a poem, it is the power of the spoken word. And even if the website adds a lot of decoration, the really important thing is the spoken words. And it is one brain to another through these words.”

Data is what matters. It’s a point of interest in a larger context. It’s a vector and a launchpad to other paths. It’s the vehicle for leverage for a business on the Internet.

What’s the business strategy at the data layer?

I have mixed views on where the value is on social networks and the apps therein, but they are all showing where the opportunity is for services that have actually useful data. Social networks are a good user interface for distributed data, much like web browsers became a good interface for distributed documents.

But it’s not the data consumption experience that drives value, in my mind.

Value on the Internet is being created in the way data is shared and linked to more data. That value comes as a result of the simplicity and ease of access, in the completeness and timeliness, and by the readability of that data.

It’s not about posting data to a domain and figuring out how to get people there to consume it. It’s about being the best data source or the best data aggregator no matter how people make use of it in the end.

Where’s the money?

Like most Internet service models, there’s always the practice of giving away the good stuff for free and then upselling paid services or piggybacking revenue-generating services on the distribution of the free stuff. Chris Anderson’s Wired article on the future of business presents the case well:

“The most common of the economies built around free is the three-party system. Here a third party pays to participate in a market created by a free exchange between the first two parties…what the Web represents is the extension of the media business model to industries of all sorts. This is not simply the notion that advertising will pay for everything. There are dozens of ways that media companies make money around free content, from selling information about consumers to brand licensing, “value-added” subscriptions, and direct ecommerce. Now an entire ecosystem of Web companies is growing up around the same set of models.”

Yet these markets and technologies are still in very early stages. There’s lots of room for someone to create an open advertising marketplace for information, a marketplace where access to data can be obtained in exchange for ad inventory, for example.

Data providers and aggregators have a huge opportunity in this world if they can become authoritative or essential for some type of useful information. With that leverage they could have the social networks, behavioral data services and ad networks all competing to piggyback on their data out across the Internet to all the sites using or contributing to that data.

Regardless of the specific revenue method, the businesses that become a dependency in the Web of data of the future will also find untethered growth opportunities. The cost of that type of business is one of scale, a much more interesting place to be than one that must fight for attention.

I’ve never really liked the “walled garden” metaphor and its negative implications. I much prefer to think in terms of designing for growth.

Frank Lloyd Wright designed buildings that were engaged with the environments in which they lived. Similarly, the best services on the World Wide Web are those that contribute to the whole rather than compete with it, ones that leverage the strengths in the network rather than operate in isolation. Their existence makes the Web better as a whole.


What would happen if the Internet knew where you were?

Tom Coates took the stage today at ETech to announce the developer availability of Fire Eagle.

Fire Eagle is a location storage service. You tell Fire Eagle where you are, and then Fire Eagle can act as your location broker for other services that might want your location information.

It’s like PayPal for your location.
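The broker idea can be sketched in a few lines. This is a toy model of the concept only, not the Fire Eagle API: the class, method names and app names below are all hypothetical. The user stores a location once, grants access per application, and each application asks the broker rather than the user.

```python
class LocationBroker:
    """Toy location broker: users store a location and decide which
    applications may read it. Not the real Fire Eagle API."""

    def __init__(self):
        self._locations = {}  # user -> (latitude, longitude)
        self._grants = {}     # user -> set of authorized app names

    def update(self, user, lat, lon):
        """The user (or a device acting for them) reports a location."""
        self._locations[user] = (lat, lon)

    def grant(self, user, app):
        """The user authorizes an application to read their location."""
        self._grants.setdefault(user, set()).add(app)

    def revoke(self, user, app):
        """The user withdraws an application's access."""
        self._grants.get(user, set()).discard(app)

    def lookup(self, user, app):
        """Return the user's location only if the app is authorized."""
        if app not in self._grants.get(user, set()):
            return None
        return self._locations.get(user)


if __name__ == "__main__":
    broker = LocationBroker()
    broker.update("tom", 37.77, -122.42)
    broker.grant("tom", "travel_app")
    print(broker.lookup("tom", "travel_app"))  # authorized app
    print(broker.lookup("tom", "ad_app"))      # never granted access
```

The design choice worth noticing is that permission lives with the broker, not with each application, so revoking access is one call rather than a hunt through every service you ever signed up for.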

When I asked Tom to explain what Fire Eagle was he replied, “What got me excited about Fire Eagle was the idea that the Internet might be really interesting if it knew where I was.” The video of his ETech presentation is here:

People have been talking about how the advertising model would change in a location-aware world for years. There are countless scenarios for improving the way marketers can talk to people if they know where they are at a particular moment in time.

Social networking is an obvious winner, as well. If services like Facebook or MySpace knew where your friends were that would certainly create some interesting new ways to interact with people.

Everyday tasks could change dramatically, too.

Let’s say you need gas for the car. You pull up the handy local gas ticker on your phone which shows the nearest stations and compares prices.

Then maybe you decide to go for a coffee…Are any of your friends out and about? Ping Fire Eagle. You learn that an ex-girlfriend is at the local cafe around the corner, so you go to Starbucks instead.

Now, not everyone has a GPS or wifi-enabled device. And developers will require a little time before they uncover the best uses for this kind of interaction model. However, there are already a few partners working on neat integrations, like Dopplr, for example. And Erica Sadun already built an iPhone hack that will automatically ping Fire Eagle with your location.

Online media today is less about hosting web sites that push out HTML pages every day. Its real power is derived from treating media as a service or rather about helping data find data. Fire Eagle is a great model of this world.

Fire Eagle has to be one of the most promising applications to come along in a while, in my opinion.

A big congrats to the Fire Eagle team!

GPS device + data feeds + social = awesome service

One of the most interesting market directions in recent months in my mind is the way the concept of a location service is evolving. People are using location as a vector to bring information that matters directly to them. A great example of this is Dash.net.

Dash is a GPS device that leverages the activity of its user base and the wider local data pools on the Internet to create a more personalized driving experience. Ricky Montalvo and I interviewed them for the latest Developer Spotlight on YDN Theater:

Of particular note are the ways that Dash pulls in external data sources from places like Yahoo! Pipes. Any geoRSS feed can be used to identify relevant locations near you or near where you’re going directly from the device. They give the example of using a Surfline.com feed built with Pipes to identify surfing hot spots at any given moment. You can drive to Santa Cruz and then decide which beach to hit once you get there.
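Here is a rough sketch of what "find feed items near me" might look like in code. It is not Dash's implementation. It assumes a GeoRSS-Simple feed in which each item carries a `georss:point` element holding "latitude longitude", and the feed contents, coordinates and function names are made up for illustration.

```python
import math
import xml.etree.ElementTree as ET

GEORSS_NS = {"georss": "http://www.georss.org/georss"}

# Hypothetical feed; titles and coordinates are approximate and illustrative.
SAMPLE_FEED = """<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Hypothetical surf spots</title>
    <item><title>Steamer Lane</title><georss:point>36.9514 -122.0263</georss:point></item>
    <item><title>Pleasure Point</title><georss:point>36.9550 -121.9700</georss:point></item>
    <item><title>Ocean Beach</title><georss:point>37.7594 -122.5107</georss:point></item>
  </channel>
</rss>"""


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearby_items(feed_xml, lat, lon, radius_km):
    """Return (title, distance_km) for feed items within radius_km,
    nearest first. Items without a georss:point are skipped."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        point = item.find("georss:point", GEORSS_NS)
        if point is None or not point.text:
            continue
        item_lat, item_lon = (float(v) for v in point.text.split())
        distance = haversine_km(lat, lon, item_lat, item_lon)
        if distance <= radius_km:
            results.append((item.findtext("title"), distance))
    return sorted(results, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Roughly downtown Santa Cruz, searching within 10 km.
    for title, distance in nearby_items(SAMPLE_FEED, 36.9741, -122.0308, 10):
        print(f"{title}: {distance:.1f} km")
```

Because the filtering only needs a latitude and longitude per item, any feed that publishes GeoRSS points can plug in without the device knowing anything else about the source.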

There are other neat ways to use the collaborative user data such as the traffic feedback loop so that you can choose the fastest route to a destination in real time. And the integration with the Yahoo! Local and the Upcoming APIs make for great discoveries while you’re out and about.

You can also see an early demo of their product which they showed at Web 2.0 Summit in the fall:

The way they’ve opened up a hardware device to take advantage of both the information on the Internet and the behaviors of its customers is really innovative, not to mention very useful, too. I think Dash is going to be one to watch.

Open source grid computing takes off

This has been fun to watch. The Hadoop team at Yahoo! is moving quickly to push the technology to reach its potential. They’ve now adopted it on one of the most important applications in the entire business, Yahoo! Search.

From the Hadoop Blog:

The Webmap build starts with every Web page crawled by Yahoo! and produces a database of all known Web pages and sites on the internet and a vast array of data about every page and site. This derived data feeds the Machine Learned Ranking algorithms at the heart of Yahoo! Search.

Some Webmap size data:

  • Number of links between pages in the index: roughly 1 trillion links
  • Size of output: over 300 TB, compressed!
  • Number of cores used to run a single Map-Reduce job: over 10,000
  • Raw disk used in the production cluster: over 5 Petabytes
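For a feel of the programming model behind those numbers, here is a toy, single-process sketch of the map, shuffle and reduce steps, counting inbound links per page. It does not use Hadoop or its API. The page graph and function names are invented for illustration, and a real job would spread each phase across many machines.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical crawl output: page -> pages it links to.
PAGES = {
    "a.example/": ["b.example/", "c.example/"],
    "b.example/": ["c.example/"],
    "c.example/": ["a.example/", "b.example/"],
}


def map_links(page, outlinks):
    """Map step: emit (target, 1) for every link found on a page."""
    for target in outlinks:
        yield (target, 1)


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the values collected for one key."""
    return key, sum(values)


def inbound_link_counts(pages):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = chain.from_iterable(
        map_links(page, links) for page, links in pages.items()
    )
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    for page, count in sorted(inbound_link_counts(PAGES).items()):
        print(page, count)
```

The map and reduce functions never share state, which is what lets a framework run thousands of copies of them in parallel over the link data.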

I’m still trying to figure out what all this means, to be honest, but Jeremy Zawodny helps to break it down. In this interview, he gets some answers from Arnab Bhattacharjee (manager of the Yahoo! Webmap Team) and Sameer Paranjpye (manager of our Hadoop development):

The Hadoop project is opening up a really interesting discussion around computing scale. A few years ago I never would have imagined that the open source world would be contributing software solutions like this to the market. I don’t know why I had that perception, really. Perhaps all the positioning by enterprise software companies to discredit open source software started to sink in.

As Jeremy said, “It’s not just an experiment or research project. There’s real money on the line.”

For more background on what’s going on here, check out this article by Mark Chu-Carroll, “Databases are hammers; MapReduce is a screwdriver.”

This story is going to get bigger, I’m certain.

Targeting ads at the edge, literally

Esther Dyson wrote about a really interesting area of the advertising market in an article for The Wall Street Journal.

She’s talking about user behavior data arbiters, companies that capture what users are doing on the Internet through ISPs and sell that data to advertisers.

These companies put tracking software between the ISP and a user’s HTTP requests. They then build dynamic and anonymous profiles for each user. NebuAd, Project Rialto, Phorm, Frontporch and Adzilla are among several companies competing for space on ISPs’ servers. And there’s no shortage of ad networks who will make use of that data to improve performance.

Esther gives an example:

“Take user number 12345, who was searching for cars yesterday, and show him a Porsche ad. It doesn’t matter if he’s on Yahoo! or MySpace today — he’s the same number as yesterday. As an advertiser, would you prefer to reach someone reading a car review featured on Yahoo! or someone who visited two car-dealer sites yesterday?”

Behavioral and demographic targeting is going to become increasingly important this year as marketers shift budgets away from blanket branding campaigns toward direct response marketing. Over the next few years advertisers plan to spend more on behavioral, search, geographic, and demographic targeting, in that order, according to Forrester. AdWeek has been following this trend:

“According to the Forrester Research report, marketer moves into areas like word of mouth, blogging and social networking will withstand tightened budgets. In contrast, marketers are likely to decrease spending in traditional media and even online vehicles geared to building brand awareness.”

We tried behavioral targeting campaigns back at InfoWorld.com with mild success using Tacoda. The main problem was traffic volume. Though performance was better than broad content-targeted campaigns, the target segments were too small to sell in meaningful ways. The idea of an open exchange for auctioning inventory might have helped, but at the time we had to sell what we called “laser targeting” in packages that started to look more like machine gun fire.

This “edge targeting” market, for lack of a better term, is very compelling. It captures data from a user’s entire online experience rather than just one web site. When you know what a person is doing right now you can make much more intelligent assumptions about their intent and, therefore, the kinds of things they might be more interested in seeing.

It’s important to emphasize that edge targeting doesn’t need to know anything personally identifiable about a person. ISPs legally can’t watch what known individuals are doing online, and they can’t share anything they know about a person with an advertiser. AdWeek discusses the issue of advertising data optimization in a report titled “The New Gold Standard”:

“As it stands now, consumers don’t have much control over their information. Direct marketing firms routinely buy and sell personal data offline, and online, ad networks, search engines and advertisers collect reams of information such as purchasing behavior and Web usage. Google, for instance, keeps consumers’ search histories for up to two years, not allowing them the option of erasing it.

Legalities, however, preclude ad networks from collecting personally identifiable information such as names and addresses. Ad networks also allow users to opt out of being tracked.”

Though a person is only identified as a number in edge targeting, that number is showing very specific intent. That intent, if profiled properly, is significantly more accurate than a single search query at a search engine.

I suspect this is going to be a very important space to watch in the coming years.

Local news is going the wrong way

Google’s new Local News offering misses the point entirely.

As Chris Tolles points out, Topix.net and others have been doing exactly this for years. Aggregating information at the hyperlocal level isn’t just about geotagging information sources. Chris explains why they added forums:

“…there wasn’t enough coverage by the mainstream or the blogosphere…the real opportunity was to become a place for people to publish commentary and stories.”

He shouldn’t worry about Google, though. He should worry more about startups like Outside.in who upped the ante by adding a slightly more social and definitely more organic experience to the idea of aggregating local information.

Yet information aggregation still only dances around the real issue.

People want to know what and who are around them right now.

The first service that really nails how we identify and surface the things that matter to us when and where we want to know about them is going to break ground in a way we’ve never seen before on the Internet.

We’re getting closer and closer to being able to connect the 4 W’s: Who, What, Where and When. But those things aren’t yet connecting to expose value to people.

I think a lot of people are still too focused on how to aggregate and present data to people. They expect people to do the work of knowing what they’re looking for, diving into a web page to find it and then consuming what they’ve worked to find.

There’s a better way. When services start mixing and syndicating useful data from the 4 W vectors then we’ll start seeing information come to people instead.

And there’s no doubt that big money will flow with it.

Dave Winer intuitively noted, “Advertising will get more and more targeted until it disappears, because perfectly targeted advertising is just information. And that’s good!”

I like that vision, but there’s more to it.

When someone connects the way information surfaces for people and the transactions that become possible as a result, a big new world is going to emerge.

How to launch an online platform (part II)

The MySpace guys won the latest launch party battle. About 200 people met at the new MySpace house last night in San Francisco to see what the company was going to do to compete with Facebook on the developer front.

They had a fully catered event including an open bar with some good whiskey. The schwag bag included the Flip digital video camera (wow!). There was a small handful of very basic demos on the floor from the usual suspects (Slide, iLike, Flixster, etc.). And the presentation was short and sweet so we could get back to socializing.

Nicely executed.

The party wasn’t without flaw, mind you.

First, the date. Why throw a launch party on the same day as the biggest political event in our time, Super Tuesday? The headlines were on everything but the MySpace launch. The right people knew what was going on, but the impact was severely muted. I was somewhat eager to leave to find out what was happening out there in the real world.

Second, the presentation. You have to appreciate them keeping it super short. Once the drinks start flowing, it gets very hard to keep people quiet for more than a few minutes. But I think most everyone there was actually very interested in hearing something meaty or a future vision or something. Bullets on a PowerPoint slide rarely impress.

Neither of those things really mattered, in the end. The party served its purpose.

It also occurred to me afterward that it would have been a shame if the co-founders and executive team weren’t there. But they were very much in this and made themselves accessible to chat. This isn’t a sideshow move for MySpace. It matters to them.

Contrast this with the standard formula followed by the Bebo guys, and you can see why MySpace does so well in social networking. They embody it as a company.

Now, whether or not they can raise the bar on app quality or improve on distribution for apps is yet to be seen. By giving developers a month to get their submissions in for the end-user rollout they are resetting the playing field. That’s great. But I’m not sure whether the MySpace user experience will encourage the sharing of apps as fluidly as the Facebook UE. I don’t use it enough to know, to be honest.

As far as the platform itself goes, I’m curious about the impact the REST API will have. I’ve wondered how the social networks would make themselves more relevant in the context of the world outside the domain.

Will the REST API be used more by services that want to expose more data within MySpace or by services that want to leverage the MySpace data in their own environments outside myspace.com? I suspect the latter will matter more over time but that won’t mean anything until people adopt the apps.

Overall, good show. This should help bring back some of the MySpace cool that was lost the last year or so.

Preview of the del.icio.us publisher api

I just posted a short screencast on the YDN blog of the cool new publisher api coming from del.icio.us soon. I’ve also embedded the video below. Lots of interesting possibilities with this new service, for sure.
