
Local community data reporting

EveryBlock has taken a very data-intensive approach to local news reporting. As founder Adrian Holovaty explains:

“An overall goal of EveryBlock is to point you to news near your block. We’ve been working hard to do a good job of this so far by accumulating public records, cataloging newspaper stories and pulling together various other geographic information from the Web.”

This generally takes the form of raw data points placed on maps. They recently rolled out a variation on the theme that uses topic-specific data, which adds more context to the local news reporting idea.

“A week or so ago, 15 people were arrested on bribery charges as part of a federal probe into corruption in Chicago city government. We’ve analyzed U.S. Attorney Patrick J. Fitzgerald’s complaint documents and cataloged the specific addresses mentioned within. On the project’s front page, you can view every location we found, along with a relevant excerpt from the complaint. You can sort this data in various ways, including a list and map of all the alleged bribe locations.”

This kind of value is otherwise largely missing from the experience. Rather than offering a mostly pure research tool, the site now adds insight and perspective through an editorial view of the data. In this case, the data tells a story that might otherwise seem a little distant until you see how the issue may be a very real one right in your backyard, so to speak.

But it occurred to me that the community is probably even better able to capture and share this level of useful insight. It would be really neat to see EveryBlock open up the reporting and mapping process so that anyone interested in exposing the trends in their neighborhood or elsewhere would have a platform to do so.

Similar to the way Swivel lets you collect data in spreadsheet form, visualize it and then share it the way Flickr and YouTube let you share photos and videos, EveryBlock could provide an environment where individuals report on the issues in their neighborhood that matter to them. The wider community could then benefit from the work of a few, and suddenly you have a really powerful local news vehicle.

This isn’t necessarily in contrast to the approach Outside.in has taken by aggregating shared information from around the web, but it certainly puts some structure around it in a way that may be necessary.

Managing a community is a very different problem than aggregating and presenting useful local data. But I wonder if it’s a necessary next step to get both of these fledgling but very forward-thinking local media services closer to critical mass.

Interesting perspectives from Web 2.0 Expo

Today’s Web 2.0 Expo in San Francisco provided some really good brain food.

Clay Shirky’s keynote was excellent. He talked about architecting a new world for the “cognitive surplus” that’s emerging as people pull themselves out of the historical sitcom hangover and invest their energy online. Matt Jones and Tom Coates shared some neat ideas on design for personal informatics. And Twitter’s Alex Payne and Michael Migurski of Stamen Design presented lessons learned from the perspective of an API provider.

One little nugget I really liked was a minor point Migurski made when talking through the Oakland Crimespotting service. He noted that most web services commonly provide several standard formats, including HTML, JSON, serialized PHP, RSS and XML.

But we often forget about simple Excel spreadsheets.

He showed how the Oakland Crimespotting site offers downloadable Excel spreadsheets detailing recent activity from particular police beats, for example.

One of the keys to opening up government data is convincing the people best equipped to provide raw data that it needs to be posted directly to the Internet. Telling them they need to output JSON for data visualizations and mashups will do about as much good as a slap in the face. Showing them a regularly updated Excel spreadsheet, findable on a web page, that they can email to their colleagues, friends and family is going to get them thinking differently and perhaps encourage them to participate directly.
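To make Migurski’s point concrete, here is a minimal sketch of one small dataset served two ways: as JSON for mashup developers and as CSV, a plain format that Excel and other spreadsheet programs open directly. The record fields (beat, date, offense), the sample rows and the function names are all hypothetical, and CSV stands in for a native Excel file; this is an illustration, not how Oakland Crimespotting is actually built.

```python
import csv
import io
import json

# Hypothetical sample records; the fields are illustrative, not Crimespotting's real schema.
RECORDS = [
    {"beat": "04X", "date": "2008-04-20", "offense": "Burglary"},
    {"beat": "04X", "date": "2008-04-21", "offense": "Robbery"},
    {"beat": "12Y", "date": "2008-04-21", "offense": "Vandalism"},
]

# Column order for the spreadsheet export.
FIELDS = ["beat", "date", "offense"]


def filter_by_beat(records, beat):
    """Return only the records reported in the given police beat."""
    return [r for r in records if r["beat"] == beat]


def to_json(records):
    """Serialize records as JSON, the format mashup developers tend to want."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialize records as CSV with a header row, which a spreadsheet can open as-is."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    beat_records = filter_by_beat(RECORDS, "04X")
    print(to_json(beat_records))
    print(to_csv(beat_records))
```

Both serializers take the same filtered list, so offering one more format for a less technical audience costs a single small function rather than a change to the underlying data.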

The crime data issue is going to be a big deal in the not too distant future, I’m sure. And as Mr. Coates and Mr. Jones noted in their talks on personal data design, it’s the details that really matter in this space. You can think about products and features all day, but the specifics that define how data is shared, how it becomes relevant and how it is presented will make or break the intent of any offering.

Designing Your API, Web 2.0 Expo 2008:

Creating leverage at the data layer

The world fully embraced HTTP, but not Gopher or Telnet or even FTP, for a reason: the power of the Internet is best expressed through the concept of a network, lots of interlinked pieces that make up something bigger, rather than tunnels and holes that end in a destination.

The World Wide Web captured people’s imaginations, and then everything changed.

I was reminded of this while reading a recent interview with Tim Berners-Lee (via TechCrunch). He talked a bit about the power of linking data:

“Web 2.0 is a stovepipe system. It’s a set of stovepipes where each site has got its data and it’s not sharing it. What people are sometimes calling a Web 3.0 vision where you’ve got lots of different data out there on the Web and you’ve got lots of different applications, but they’re independent. A given application can use different data. An application can run on a desktop or in my browser, it’s my agent. It can access all the data, which I can use and everything’s much more seamless and much more powerful because you get this integration. The same application has access to data from all over the place…

Data is different from documents. When you write a document, if you write a blog, you write a poem, it is the power of the spoken word. And even if the website adds a lot of decoration, the really important thing is the spoken words. And it is one brain to another through these words.”

Data is what matters. It’s a point of interest in a larger context. It’s a vector and a launchpad to other paths. It’s the vehicle for leverage for a business on the Internet.

What’s the business strategy at the data layer?

I have mixed views on where the value lies in social networks and the apps within them, but they all show where the opportunity is for services that have genuinely useful data. Social networks are a good user interface for distributed data, much like web browsers became a good interface for distributed documents.

But it’s not the data consumption experience that drives value, in my mind.

Value on the Internet is being created in the way data is shared and linked to more data. That value comes from the simplicity and ease of access to that data, from its completeness and timeliness, and from its readability.

It’s not about posting data to a domain and figuring out how to get people there to consume it. It’s about being the best data source or the best data aggregator no matter how people make use of it in the end.

Where’s the money?

As with most Internet service models, the usual practice is to give away the good stuff for free and then upsell paid services or piggyback revenue-generating services on the distribution of the free stuff. Chris Anderson’s Wired article on the future of business presents the case well:

“The most common of the economies built around free is the three-party system. Here a third party pays to participate in a market created by a free exchange between the first two parties…what the Web represents is the extension of the media business model to industries of all sorts. This is not simply the notion that advertising will pay for everything. There are dozens of ways that media companies make money around free content, from selling information about consumers to brand licensing, “value-added” subscriptions, and direct ecommerce. Now an entire ecosystem of Web companies is growing up around the same set of models.”

Yet these markets and technologies are still in very early stages. There’s lots of room for someone to create an open advertising marketplace for information, a marketplace where access to data can be obtained in exchange for ad inventory, for example.

Data providers and aggregators have a huge opportunity in this world if they can become authoritative or essential for some type of useful information. With that leverage, they could have social networks, behavioral data services and ad networks all competing to piggyback on their data as it travels across the Internet to every site using or contributing to it.

Regardless of the specific revenue method, the businesses that become a dependency in the future Web of data will also find untethered growth opportunities. The main cost of that type of business is scale, which is a much more interesting problem to have than fighting for attention.

I’ve never really liked the “walled garden” metaphor and its negative implications. I much prefer to think in terms of designing for growth.

Frank Lloyd Wright designed buildings that were engaged with the environments in which they lived. Similarly, the best services on the World Wide Web are those that contribute to the whole rather than compete with it, ones that leverage the strengths in the network rather than operate in isolation. Their existence makes the Web better as a whole.


Interactive journalism: An amazing homicide mashup

I had the pleasure of interviewing Sean Connelly and Katy Newton for YDN Theater recently with YDN videographer Ricky Montalvo. They created the amazing (and award-winning) crime data mashup Not Just A Number in partnership with The Oakland Tribune.

After getting tired of watching the homicide count for 2006 climb higher and higher, they decided to humanize the issue and talk to the families of the victims directly. They wanted to expose the story beneath the number and give the community a platform on which to make the issue real.

Statistics can tell effective stories, but death and loss reach emotional depths beyond the power of any numerical exploration.

Sean and Katy posted recordings of the families talking about the sons, daughters, sisters and brothers they lost. They integrated family photos, message boards, articles and more with the interactive homicide map on the site to round out the experience, making it much more human than the traditional crime data mashup.


I also asked them if they had trouble getting data to make the site, and they said the Oakland Tribune staff were very supportive. There weren’t any usable open data sets coming out of the city, so they had to collect and enter everything themselves.

This, of course, is a very manual process. Given the challenge of getting the data, Sean and Katy didn’t see how the idea could possibly scale beyond the city of Oakland.

Somebody needs to take that on as a challenge.

I’m hopeful that efforts like Not Just A Number and the Open Government Data organization will be able to show why it’s important for our government to open up access to the many data repositories it holds. And if the government won’t do it, then it should be the job of journalists and media companies to surface government data so that people can use it in meaningful ways.

This is a great example of how the Internet can empower people who otherwise have no voice or audience despite having profound stories to tell.

Building markets out of data

I’m intrigued by the various ways people view ‘value’. There seem to be 2 camps: 1) people who view the world in terms of competition for finite resources and 2) people who see ways to create new forms of value and to grow the entire pie.

Umair Haque talks about choices companies make that push them into one of those 2 camps. He often argues that the market needs more builders than winners. He clarifies his position in his post The Economics of Evil:

“When you’re evil, your ability to co-create value implodes: because you make moves which are focused on shifting costs and extracting value, rather than creating it. …when you’re evil, the only game you want to - or can play - is domination.”

I really like the idea that the future of the media business lies in the way we build value for all constituencies rather than the way we extract value from various parts of a system. It’s not about how you secure market share, control distribution, mitigate risk or reduce costs. It’s about how you enable the creation of value for all.

He goes on to explain how media companies often make the mistake of focusing on data ownership:

“Data isn’t the value. In fact, data’s a commodity…What is valuable are the things that create data: markets, networks, and communities.

Google isn’t revolutionizing media because it “owns the data”. Rather, it’s because Google uses markets and networks to massively amplify the flow of data relative to competitors.”

I would add that what matters is not just the creation of valuable data but also the way people interface with existing data. Scott Karp’s excellent post on the guidelines for transforming media companies shares a similar view:

“The most successful media companies will be those that learn how to build networks and harness network effects. This requires a mindset that completely contradicts traditional media business practices. Remember, Google doesn’t own the web. It doesn’t control the web. Google harnesses the power of the web by analyzing how websites link to each other.”