Archive for the 'data' Category

The useful convergence of data

I have only one prediction for 2008. I think we’re finally about to see the useful combination of the 4 W’s - Who, What, Where, and When.

Marc Davis has done some interesting research in this area at Yahoo!, and Bradley Horowitz articulated how he sees the future of this space unfolding in a BBC article in June ‘07:

“We do a great job as a culture of “when”. Using GMT I can say this particular moment in time and we have a great consensus about what that means…We also do a very good job of “where” - with GPS we have latitude and longitude and can specify a precise location on the planet…The remaining two Ws - we are not doing a great job of.”

I’d argue that the social networks are now really honing in on “who”, and despite having few open standards for “what” data (other than UPC) there is no shortage of “what” data amongst all the “what” providers. Every product vendor has their own version of a product identifier or serial number (such as Amazon’s ASIN, for example).

We’ve seen a lot of online services solving problems in these areas either by isolating specific pieces of data or combining the data in specific ways. But nobody has yet integrated all 4 in a meaningful way.


Jeff Jarvis’ insightful post on social airlines starts to show how these concepts might form in all kinds of markets. When you’re traveling it makes a lot of sense to tap into “who” data to create compelling experiences that will benefit everyone:

  • At the simplest level, we could connect while in the air to set up shared cab rides once we land, saving passengers a fortune.
  • We can ask our fellow passengers who live in or frequently visit a destination for their recommendations for restaurants, things to do, ways to get around.
  • We can play games.
  • What if you chose to fly on one airline vs. another because you knew and liked the people better? What if the airline’s brand became its passengers?
  • Imagine if on this onboard social network, you could find people you want to meet - people in the same business going to the same conference, people of similar interests, future husbands and wives - and you can rendezvous in the lounge.
  • The airline can set up an auction marketplace for at least some of the seats: What’s it worth for you to fly to Berlin next Wednesday?

Carrying the theme to retail markets, you can imagine that you will walk into H&M and discover that one of your first-degree contacts recently bought the same shirt you were about to purchase. You buy a different one instead. Or people who usually buy the same hair conditioner as you at the Walgreen’s you’re in now are switching to a different hair conditioner this month. Though this wouldn’t help someone like me who has no hair to condition.

Similarly, you can imagine that marketing messages could actually become useful in addition to being relevant. If CostCo would tell me which of the products I often buy are on sale as I’m shopping, or which of the products I’m likely to need given what they know about how much I buy of what and when, then my loyalty there is going to shoot through the roof. They may even be able to identify that I’m likely buying milk elsewhere and give me a one-time coupon for CostCo milk.

Bradley sees it playing out on the phone, too:

“On my phone I see prices for a can of soup in my neighbourhood. It resolves not only that particular can of soup but knows who I am, where I am and where I live and helps me make an intelligent decision about whether or not it is a fair price.

It has to be transparent and it has to be easy because I am not going to invest a lot of effort or time to save 13 cents.”

It may be unrealistic to expect that this trend will explode in 2008, but I expect it to at least appear in a number of places and inspire future implementations as a result. What I’m sure we will see in 2008 is dramatic growth in the behind-the-scenes work that will make this happen, such as the development and customization of CRM-like systems.

Lots of companies have danced around these ideas for years, but I think the ideas and the technologies are finally ready to create something real, something very powerful.

Photo: SophieMuc

The Internet’s secret sauce: surfacing coincidence

What is it that makes my favorite online services so compelling? I’m talking about the whole family of services that includes Dopplr, Wesabe, Twitter, Flickr, and del.icio.us among others.

I find it interesting that people don’t generally refer to any of these as “web sites”. They are “services”.

I was fortunate enough to spend some time with Dopplr’s Matt Biddulph and Matt Jones last week while in London where they described the architecture of what they’ve built in terms of connected data keys. The job of Dopplr, Mr. Jones said, was to “surface coincidence”.

I think that term slipped out accidentally, but I love it. What does it mean to “surface coincidence”?

It starts by enabling people to manufacture the circumstances by which coincidence becomes at least meaningful if not actually useful. Or, as Jon Udell put it years ago now when comparing Internet data signals to cellular biology:

“It looks like serendipity, and in a way it is, but it’s manufactured serendipity.”

All these services allow me to manage fragments of my life without requiring burdensome tasks. They all let me take my data wherever I want. They all enhance my data by connecting it to more data. They all make my data relevant in the context of a larger community.

When my life fragments are managed by an intelligent service, then that service can make observations about my data on my behalf.

Dopplr can show me when a distant friend will be near and vice versa. Twitter can show me what my friends are doing right now. Wesabe can show me what others have learned about saving money at the places where I spend my money. Among many other things Flickr can show me how to look differently at the things I see when I take photos. And del.icio.us can show me things that my friends are reading every day.

There are many many behaviors both implicit and explicit that could be managed using this formula or what is starting to look like a successful formula, anyhow. Someone could capture, manage and enhance the things that I find funny, the things I hate, the things at home I’m trying to get rid of, the things I accomplished at work today, the political issues I support, etc.

But just collecting, managing and enhancing my life fragments isn’t enough. And I think what Matt Jones said is a really important part of how you make data come to life.

You can make information accessible and even fun. You can make the vast pool feel manageable and usable. You can make people feel connected.

And when you can create meaning in people’s lives, you create deep loyalty. That loyalty can be the foundation of larger businesses powered by advertising or subscriptions or affiliate networks or whatever.

The result of surfacing coincidence is a meaningful action. And those actions are where business value is created.

Wikipedia defines coincidence as follows:

“Coincidence is the noteworthy alignment of two or more events or circumstances without obvious causal connection.”

This is, of course, similar and related to the definition of serendipity:

“Serendipity is the effect by which one accidentally discovers something fortunate, especially while looking for something else entirely.”

You might say that this is a criteria against which any new online service should be measured. Though it’s probably so core to getting things right that every other consideration in building a new online service needs to support it.

It’s probably THE criteria.

Making government more useful through data

A very interesting working group formed recently to drive better transparency in government through data. The Open Government Data organization has a simple aim:

“The group is offering a set of fundamental principles for open government data. By embracing the eight principles, governments of the world can become more effective, transparent, and relevant to our lives.”

They proposed that data will be considered open if it complies with the following qualifications:

1. Complete
2. Primary
3. Timely
4. Accessible
5. Machine processable
6. Non-discriminatory
7. Non-proprietary
8. License-free

This is a promising approach to driving high impact changes in the way government serves its people. Giving everyone greater access to relevant information that they already own is a noble pursuit.

I’ve explored this a little myself in some investigations of access to crime data [1, 2].

It’s no surprise that Adrian Holovaty of the Chicago Crime mashup fame (and now Every Block) is one of the founding members. Of course, there’s no better advocate for the free flow of information than Lawrence Lessig. And Tim O’Reilly will be a strong foundational force here. I’d love to see Jon Udell join, too, as his work has inspired a lot of people (myself included) to think differently about exposing and sharing data like this.

Good luck, guys!

Oakland Trib’s Not-Just-A-Number improves on crime data visualization

OJR’s Jim Wayne dives into Oakland Tribune’s “Not Just A Number” web site. The service won the Service Journalism Award from ONA for an amazingly powerful view of crime data.

The basic premise was to create a data visualization for Oakland homicide crime data that made the victims and, more importantly, the people in their lives real participants in the story rather than pure statistics (or just plain ignored entirely).

It’s a very powerful site and a model for all local newspapers to follow. It’s disappointing but no surprise the media creates these kinds of community services before local governments do. At least we’re getting more access to crime data.

Wayne also points to a crime data visualization from the Los Angeles Times called The Homicide Map that I wasn’t aware of.

They have a nice map mashup that takes a more statistical approach, yet they also include things like images of the victims.

Unfortunately, as Oakland Tribune producers Katy Newton and Sean Connelley point out, a mug shot is not a fair image to use for a violent crime victim in a statistical map. But I’m glad to see them exposing data that needs to be shared.

732 homicides in Los Angeles so far in 2007! Unbelievable.

Data dynamics: How the rules of sharing are changing

Today it’s easy to store and share my pictures, my favorite URLs, my thoughts and lots of other things online. There are a range of data repositories that allow me to do this kind of thing in different ways.

What still needs work is how I give trusted services access to much more private data — things like my current location, my spending behavior, access to my friends and family, etc.

To date, most services follow the premise that the looser the controls, the more fluidly data will travel. And that’s all that mattered when it was still hard to get data flowing.

Data flow is no longer an issue. Perhaps data flow has actually become too easy now. And therein lies the problem.

Clearly, blogging, RSS and feed readers drove a lot of the early thinking about syndication. Blogging enabled people to post content in a publicly accessible data repository somewhere for anyone to pull out without any privacy or permissioning controls. The further your content then syndicated, the better.

Wikis and community sites like Slashdot created a slightly more complex read/write dynamic against the central content repository that lots of people could access together. The permissioning model was essentially hierarchical where controls were kept in the hands of a smaller community.

Then Flickr broke ground with a new approach. They applied a user-centric friends and family relationship model to permissioning access to personal photos. Flickr opened up what was once considered private data and defaulted it to a public read-only permission status. But each individual still has a great deal of control over the data he or she contributes.

Similarly, del.icio.us made it possible to store and publicly address what had previously been private data. The nice twist here was the easy-to-understand URLs that allowed machines to consume, interpret and redistribute data stored in del.icio.us.

Where services like Facebook and Wesabe are now breaking ground again is in identifying a security model around highly sensitive data. Contact lists are very personal, but there aren’t many data sets more personal than my purchases and spending patterns.

Neat things can happen when I give machines access to my data, both the things I explicitly ‘own’ and my implicit behaviors. I want machines to act on my behalf and make my data more useful to me in a range of different contexts.

For example, I like the fact that Facebook slurps up my Twitter activity and shares it with my friends in the Facebook network. I don’t want to change my ’status’ on every service that shows status messages. Similarly, I like that Last.fm captures my listening behavior from iTunes and then uses that data to give back personal recommendations on a badge posted to my blog.

Allowing machines to automatically act on personal data on my bahalf is the right direction for things to go. But important questions need to be resolved.

For example, what happens to my data in all the places I’ve allowed it to appear when I change it? How do permissions pass from one service to another? How do I guarantee that a permission type I grant in one service means the same thing in another service? How do changes propagate? How does consent get revoked?

And even trickier than all that will be the methods for enforcing protection of privacy and penalties for breaking those permissions.

Until trust is measurable with explicit consentual triggers, loosely coupled networks that act on the data I wish to protect are going to struggle to talk to each other. Standards need to enable common sharing tactics. Responsibility needs to be clearly defined. And policies need to be enforceable.

Empowering a person to invest in storing and sharing the more sensitive data he or she owns is going to require a lot more than traditional read/write controls. But given the pace of change right now I suspect the answers will happen as the people behind these services work things out together before the industry taskforces, legal entities and blogosphere sort it out for them.