Socially linked data

The semantic web folks, including Sir Tim Berners-Lee, have been saying for years that the Internet could become significantly more compelling by cooking more intelligence into the way things link around the network.

The movement is gaining momentum these days, but the solution doesn’t look quite like what the visionaries expected. It’s starting to look more human.

Photo: spcbrass

The more obvious journey toward a linked data world starts with releasing data publicly on the Internet.

Many startups have proven that opening data creates opportunity. And now the trend has turned into a movement within government in the US, the UK and many other countries.

Sir Tim Berners-Lee drove home this message in his 2009 TED talk, where he got the audience to shout “Raw data now!”:

“Before you make a beautiful web site, first give us the unadulterated data. You have no idea the number of excuses people come up with to hang on to their data and not give it to you even though you’ve paid for it as a taxpayer.”

Openness makes you more relevant. It creates opportunity. It’s a way into people’s hearts and minds. It’s empowering. It’s not hard to do. And once it starts happening it becomes apparent that it mustn’t and often can’t stop happening.

Forward-thinking investors and politicians even understand that openness will fuel the new economies of the future.

We recently held a hack day-style event at the Guardian for the Cabinet Office, where the benefits of open data in government were demonstrated in the form of a postcode newspaper built by Tom Taylor, Gavin Bell and Dan Catt:

Newspaper Club Postcode Paper

“It’s a prototype of a service for people moving into a new area. It gathers information about your area, such as local services, environmental information and crime statistics.”

Opening data is making government matter more to people. That’s great, but it’s just the beginning.

After openness, the next step is to work on making data discoverable. The basic unit for creating discoverability for content on a network is the link.

Now, the hyperlink of today simply says, “there’s a thing called X which you can find over there at address Y.”

The linked data idea is basically to put more data in and around links to things in a specific structure that matches our language:

subject -> predicate -> object

Linked data by T.J. VanSlyke

This makes a lot of sense. Rather than forcing us to derive meaning, explicit relationship data can eliminate vast amounts of noise around the information that we care about.
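The subject, predicate, object structure can be sketched in a few lines of Python. This is only an illustration of the idea, not RDF itself: the `Triple` type, the `objects_of` helper and every `example.org` address are made up for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One explicit statement: subject -> predicate -> object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers, invented for illustration only.
triples = [
    Triple("http://example.org/people/alice", "wrote",
           "http://example.org/posts/linked-data"),
    Triple("http://example.org/people/alice", "knows",
           "http://example.org/people/bob"),
    Triple("http://example.org/people/bob", "wrote",
           "http://example.org/posts/open-data"),
]


def objects_of(subject, predicate, store):
    """Return every object linked from `subject` by `predicate`."""
    return [t.obj for t in store
            if t.subject == subject and t.predicate == predicate]
```

Because the relationship is stated explicitly, asking what Alice wrote is a simple lookup with `objects_of("http://example.org/people/alice", "wrote", triples)` rather than an exercise in guessing from the surrounding text.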

However, there are other ways to add meaning into the network, too. We can also create and derive meaning across a network of linked data with short messages, as we’ve seen happening organically via Twitter.

What do we often write when we post to Twitter?

@friend said or saw or did this interesting thing over here

The subject is a link to a person. The predicate is the verb connecting the person and the object. And the object is a link to a document on the Internet.
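As a rough sketch of that reading, a short message can be pulled apart into a loose triple with a regular expression. The pattern, the `parse_short_message` name and the sample message are all assumptions made for illustration; real messages are far messier than this.

```python
import re

# Assumed shape: "@person <free text> <link>". Anything else is rejected.
MESSAGE_PATTERN = re.compile(r"^@(\w+)\s+(.*?)\s+(https?://\S+)\s*$")


def parse_short_message(text):
    """Return (subject, predicate, object) or None if the shape doesn't fit."""
    match = MESSAGE_PATTERN.match(text)
    if match is None:
        return None
    person, verb_phrase, link = match.groups()
    # The "predicate" is just the free text between the person and the link.
    return ("@" + person, verb_phrase, link)
```

A message such as `"@friend saw this interesting thing http://example.org/article"` yields a person, a loosely worded predicate and a document link, while a message with no link at all yields `None`.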

Twitter is already a massive linked data cloud.

It’s not organized and structured like the links in HTML or the semantic triple format RDF. Rather, it is verbose connectivity, a human-readable statement pointing to things and loosely defining what the links mean.

So, now it starts to look like we have some opposing philosophies around linked data. And neither is a good enough answer to Tim Berners-Lee’s vision.

Short messages lack standard ways of explicitly declaring meaning within links. They are often transient ideas that have no links at all. They create a ton of noise. Subjectivity rules. Short messages can’t identify or map to collections of specific data points within a data set. The variety of ways links are expressed is vast and unmanageable.

The semantic web vision seems like a far away place if it’s dependent on whether or not an individual happens to create a semantic link.

But a structural overhaul isn’t a much better answer. In many ways, RDF means we will have to rewrite the entire web to support the new standard. The standard is complicated. Trillions of links will have to obtain context that they don’t have today. Documents will compete for position within the linked data chain. We will forever be reidentifying meaning in content as language changes and evolves. Big software will be required to create and manage links.

The issue isn’t about one model versus another. As people found with tags and taxonomies, the two are better when both exist together.

But there’s another approach to the linked data problem being pioneered by companies like MetaWeb, which runs an open data service called Freebase, and Zemanta, which analyzes text and recommends related links.

The approach here sits comfortably in the middle and interoperates with the extremes. They focus on being completely clear about what a thing is and then helping to facilitate better links.

For example, Freebase has a single ID for everything. There is one ID and one URL that represents Abraham Lincoln:

They know that the Wikipedia, New York Times and Congressional Biography web sites, which are all very authoritative on politicians, each have a single URL representing everything they know about Abraham Lincoln, too.

So, Freebase maintains a database (in addition to the web site that users can see) that links the authoritative Abraham Lincoln pages on the Internet together.

This network of data resources on Abraham Lincoln becomes richer and more powerful than any single resource about Abraham Lincoln. There is some duplication among them, but each resource is also unique. We know facts about his life, books that were written about him, how people were and still are connected to him, etc.
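The idea of one identifier tying several authoritative pages together can be sketched as a small lookup table. This is not Freebase’s actual data model or API; the `ex:abraham-lincoln` identifier and the `.example` addresses are placeholders invented for the sketch.

```python
# Hypothetical mapping: one canonical ID, several pages about the same thing.
CANONICAL = {
    "ex:abraham-lincoln": [
        "https://encyclopedia.example/abraham-lincoln",
        "https://newspaper.example/topics/abraham-lincoln",
        "https://biographies.example/abraham-lincoln",
    ],
}

# Reverse index so any known page leads back to the canonical ID.
URL_TO_ID = {url: topic_id
             for topic_id, urls in CANONICAL.items()
             for url in urls}


def related_pages(url):
    """Given one page about a thing, return the other pages about it."""
    topic_id = URL_TO_ID.get(url)
    if topic_id is None:
        return []  # We don't know what this page is about.
    return [other for other in CANONICAL[topic_id] if other != url]
```

Starting from any one of the three pages, `related_pages` returns the other two, which is the sense in which the network becomes richer than any single resource.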

Of course, explicit relationships become more critical when a word with multiple meanings enters the ecosystem. For example, consider Apple, which is a computing company, a record company, a town and a fruit.
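A minimal sketch of the Apple problem, assuming a made-up catalogue in which each meaning has its own identifier. The `ex:` identifiers and the `kind` field are hypothetical, not taken from any real service.

```python
# Hypothetical catalogue: one human-readable label, several distinct things.
entities = [
    {"id": "ex:apple-computing", "label": "Apple", "kind": "computing company"},
    {"id": "ex:apple-records", "label": "Apple", "kind": "record company"},
    {"id": "ex:apple-town", "label": "Apple", "kind": "town"},
    {"id": "ex:apple-fruit", "label": "Apple", "kind": "fruit"},
]


def candidates(label):
    """Every thing that shares this label, regardless of meaning."""
    return [e["id"] for e in entities if e["label"].lower() == label.lower()]


def resolve(label, kind):
    """Pick the single thing with this label and kind, or None if unclear."""
    matches = [e["id"] for e in entities
               if e["label"].lower() == label.lower() and e["kind"] == kind]
    return matches[0] if len(matches) == 1 else None
```

The bare word matches four different things; only the explicit extra fact, here the `kind`, narrows it to one.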

Once the links in a network are known, then the real magic starts to happen when you mix in the social capabilities of the network.

Because of the relationships inherent in the links, new apps can be built that tell more interesting and relevant stories because they can aggregate data together that is connected.

You can imagine a whole world of forensic historians begging for more linked data. Researchers spend years mapping together events, geographic locations, relationships between people and other facts to understand the past. For example, a company called Six to Start has been working on using Google Maps for interactive historical fiction:

“The Six to Start team decided to literally “map” Cumming’s story, using the small annotation boxes for snippets of text and then illustrating movement of the main character with a blue line. As users click through bits of the story, the blue line traces the protagonist’s trajectory, and the result is a story that is at once text-based but includes a temporal dimension—we watch in real time as movement takes place—as well as an information dimension as the Google tool is, in a sense, hacked for storytelling.”

Similarly, we will eventually have a bridge of links into the physical world. This will happen with devices that have sensors that broadcast and receive short messages. OpenStreetMap will get closer and closer to providing a data-driven representation of the physical world, built collectively by people with GPS devices carefully uploading details of their neighborhoods. You can then imagine that games developers will make the real world itself into a gaming platform based on linked data.

We’ve gotten a taste of this kind of thing with Foursquare. “Foursquare gives you and your friends new ways of exploring your city. Earn points and unlock badges for discovering new things.”

And there’s a fun photo sharing game called Noticings: “Noticings are interesting things that you stumble across when out and about. You play Noticings by uploading your photos to Flickr, tagged with ‘noticings’ and geotagged with where they were taken.”

It’s conceivable that all these forces and some creative engineers will eventually shrink time and space into a massive network of connected things.

But long before some quasi-Matrix-like world exists there will be many dotcom casualties who have benefitted from the existence of friction in finding information. When those challenges go away, so will the business models.

Search, for example, is an amazingly powerful and efficient middleman linking documents off the back of the old school hyperlink, but its utility may fade when the source of a piece of information can hear and respond directly to social signals asking for it somewhere in the world.

It’s all pointing to a frictionless information network, sometimes organized, sometimes totally chaotic.

It wasn’t long ago that I worried the semantic web had already failed, but I’ve begun to wonder whether Tim Berners-Lee’s larger vision is in fact going to happen, just in a slightly different way than most people thought it would.

Now that linked data is happening at a more grassroots level, in addition to the standards-driven approach, I’m starting to believe that a world of linked data is actually possible, and perhaps closer than it might appear.

Again, his TED talk has some simple but important ideas that perhaps need to be revisited:

Paraphrasing: “Data is about our lives – a relationship with a friend, the name of a person in a photograph, the hotel I want to stay in on my holiday. Scientists study problems and collect vast amounts of data. They are understanding economies, disease and how the world works.

A lot of the knowledge of the human race is in databases sitting on computers. Linking documents has been fun, but linking data is going to be much bigger.”


The Internet’s secret sauce: surfacing coincidence

What is it that makes my favorite online services so compelling? I’m talking about the whole family of services that includes Dopplr, Wesabe, Twitter, Flickr, and among others.

I find it interesting that people don’t generally refer to any of these as “web sites”. They are “services”.

I was fortunate enough to spend some time with Dopplr’s Matt Biddulph and Matt Jones last week while in London where they described the architecture of what they’ve built in terms of connected data keys. The job of Dopplr, Mr. Jones said, was to “surface coincidence”.

I think that term slipped out accidentally, but I love it. What does it mean to “surface coincidence”?

It starts by enabling people to manufacture the circumstances by which coincidence becomes at least meaningful if not actually useful. Or, as Jon Udell put it years ago now when comparing Internet data signals to cellular biology:

“It looks like serendipity, and in a way it is, but it’s manufactured serendipity.”

All these services allow me to manage fragments of my life without requiring burdensome tasks. They all let me take my data wherever I want. They all enhance my data by connecting it to more data. They all make my data relevant in the context of a larger community.

When my life fragments are managed by an intelligent service, then that service can make observations about my data on my behalf.

Dopplr can show me when a distant friend will be near and vice versa. Twitter can show me what my friends are doing right now. Wesabe can show me what others have learned about saving money at the places where I spend my money. Among many other things Flickr can show me how to look differently at the things I see when I take photos. And can show me things that my friends are reading every day.
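The first of those observations, noticing that a friend will be in the same place at the same time, can be sketched as an overlap check between trips. This is not how Dopplr is actually built; the `Trip` type, the `coincidences` function and the sample trips are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Trip:
    traveler: str
    city: str
    start: date
    end: date


def coincidences(my_trips, friend_trips):
    """Find friends who are in the same city as me on overlapping dates."""
    hits = []
    for mine in my_trips:
        for theirs in friend_trips:
            same_city = mine.city == theirs.city
            # Two date ranges overlap when each starts before the other ends.
            overlap = mine.start <= theirs.end and theirs.start <= mine.end
            if same_city and overlap:
                hits.append((theirs.traveler, mine.city,
                             max(mine.start, theirs.start),
                             min(mine.end, theirs.end)))
    return hits


# Hypothetical trips, invented for the example.
my_trips = [Trip("me", "London", date(2008, 3, 10), date(2008, 3, 14))]
friend_trips = [
    Trip("alice", "London", date(2008, 3, 13), date(2008, 3, 20)),
    Trip("bob", "Berlin", date(2008, 3, 11), date(2008, 3, 12)),
    Trip("carol", "London", date(2008, 4, 1), date(2008, 4, 3)),
]
```

With this data only Alice is surfaced: Bob is travelling at the right time but to the wrong city, and Carol reaches London after I have left. Nobody had to search for the overlap; it falls out of data each person recorded for their own reasons.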

There are many, many behaviors, both implicit and explicit, that could be managed using this formula, or what is starting to look like a successful formula, anyhow. Someone could capture, manage and enhance the things that I find funny, the things I hate, the things at home I’m trying to get rid of, the things I accomplished at work today, the political issues I support, etc.

But just collecting, managing and enhancing my life fragments isn’t enough. And I think what Matt Jones said is a really important part of how you make data come to life.

You can make information accessible and even fun. You can make the vast pool feel manageable and usable. You can make people feel connected.

And when you can create meaning in people’s lives, you create deep loyalty. That loyalty can be the foundation of larger businesses powered by advertising or subscriptions or affiliate networks or whatever.

The result of surfacing coincidence is a meaningful action. And those actions are where business value is created.

Wikipedia defines coincidence as follows:

“Coincidence is the noteworthy alignment of two or more events or circumstances without obvious causal connection.”

This is, of course, similar and related to the definition of serendipity:

“Serendipity is the effect by which one accidentally discovers something fortunate, especially while looking for something else entirely.”

You might say that this is a criterion against which any new online service should be measured. Though it’s probably so core to getting things right that every other consideration in building a new online service needs to support it.

It’s probably THE criterion.

Announcing baby with Twitter

I get Twitter now.

Until last week it seemed a bit silly to me, perhaps overhyped. But after using it to share updates of my son’s birth with friends and family members distributed across several time zones in near real-time, I’ve become a new fan of this fantastic tool.

Whereas I may have used email to announce his arrival before Twitter (something I also did after the fact), I was able to Twitter the experience of my son’s arrival throughout the day using my phone to simply send a little bit of info at a time via SMS.

Email would have been way too cumbersome for nearly live storytelling like this. Plus, the self-selective nature of it allowed some people to follow my posts who I probably wouldn’t have thought to email.

Flickr served a similar role for my daughter’s birth nearly 3 years ago, and it was invaluable to me again this time now that my mother and mother-in-law are both Flickr users finally. The photo-hungry grandparent is insatiable when it comes to newborns.

But Twitter adds a really nice new dimension to the way we share bits of our daily experience.

It was great knowing that my little brother in London and my older brother in Los Angeles were getting text messages on their phones as this major life event unfolded for me. Twitter made it feel like they were part of the experience, like bystanders, even if the details were as boring as where we ate dinner or what was on the TV in the hospital waiting room (Fresh Choice and Maury Povich, in case you’re interested).

Big sis checks out her new baby brother

Somehow I think the inability to share those inane details with the people we care about is exactly what makes people feel isolated in this modern distributed world. Well, maybe the world doesn’t need more meaningless data out there, but it certainly needs better ways to get the right data to the right people at the right time.

Twitter does just that.