Socially linked data

The semantic web folks, including Sir Tim Berners-Lee, have been saying for years that the Internet could become significantly more compelling by cooking more intelligence into the way things link around the network.

The movement is getting some legs to it these days, but the solution doesn’t look quite like what the visionaries expected it to look like. It’s starting to look more human.

Photo: spcbrass

The more obvious journey toward a linked data world starts with releasing data publicly on the Internet.

Many startups have proven that opening data creates opportunity. And now the trend has turned into a movement within government in the US, the UK and many other countries.

Sir Tim Berners-Lee drove home this message at his 2009 TED talk where he got the audience to shout “Raw data now!”:

“Before you make a beautiful web site, first give us the unadulterated data. You have no idea the number of excuses people come up with to hang on to their data and not give it to you, even though you’ve paid for it as a taxpayer.”

Openness makes you more relevant. It creates opportunity. It’s a way into people’s hearts and minds. It’s empowering. It’s not hard to do. And once it starts happening it becomes apparent that it mustn’t and often can’t stop happening.

Forward-thinking investors and politicians even understand that openness is fuel for the new economies of the future.

We recently held a hack day of sorts at the Guardian for the Cabinet Office, where the benefits of open data in government crystallized in the form of a postcode newspaper built by Tom Taylor, Gavin Bell and Dan Catt:

Newspaper Club Postcode Paper

“It’s a prototype of a service for people moving into a new area. It gathers information about your area, such as local services, environmental information and crime statistics.”

Opening data is making government matter more to people. That’s great, but it’s just the beginning.

After openness, the next step is to work on making data discoverable. The basic unit for creating discoverability for content on a network is the link.

Now, the hyperlink of today simply says, “there’s a thing called X which you can find over there at address Y.”

The linked data idea is basically to put more data in and around links to things in a specific structure that matches our language:

subject -> predicate -> object

Linked data by T.J. VanSlyke

This makes a lot of sense. Rather than derive meaning, explicit relationship data can eliminate vast amounts of noise around information that we care about.
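To make the triple idea concrete, here’s a minimal sketch in plain Python. The entity names and predicates are invented for illustration; real systems like RDF use URIs for all three positions, but the shape of the data is the same:

```python
# A minimal subject-predicate-object triple store, sketched in plain Python.
# The entities and predicates below are invented for illustration.
triples = [
    ("Abraham Lincoln", "held-office", "President of the United States"),
    ("Abraham Lincoln", "born-in", "Kentucky"),
    ("Gettysburg Address", "delivered-by", "Abraham Lincoln"),
]

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# Everything asserted about Abraham Lincoln as a subject:
facts = query(subject="Abraham Lincoln")
```

Because the relationships are explicit, there’s no meaning left to derive: a query for a predicate like "delivered-by" returns exactly the connections that were asserted, nothing more.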

However, there are other ways to add meaning into the network, too. We can also create and derive meaning across a network of linked data with short messages, as we’ve seen happening organically via Twitter.

What do we often write when we post to Twitter?

@friend said or saw or did this interesting thing over here http://website.com/blah

The subject is a link to a person. The predicate is the verb connecting the person and the object. And the object is a link to a document on the Internet.

Twitter is already a massive linked data cloud.

It’s not organized and structured like the links in HTML or the semantic triple format RDF. Rather, it is verbose connectivity: a human-readable statement pointing to things and loosely defining what the links mean.
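You can see the loose triple hiding in a tweet with a few lines of code. This is only a sketch, and the pattern it assumes (a mention, a verb phrase, a URL) is far tidier than real tweets ever are:

```python
import re

def tweet_to_triple(tweet):
    """Extract a rough (subject, predicate, object) triple from a tweet of
    the form '@user <verb phrase> <url>'. A sketch only; real tweets are
    messier than this regex assumes."""
    match = re.match(r"@(\w+)\s+(.*?)\s+(https?://\S+)", tweet)
    if not match:
        return None
    subject, predicate, obj = match.groups()
    return (subject, predicate, obj)

triple = tweet_to_triple("@friend wrote this great piece http://website.com/blah")
# triple == ("friend", "wrote this great piece", "http://website.com/blah")
```

The subject is a handle, the object is a URL, and everything in between is a free-text predicate, which is exactly why the result is both human-readable and hard to standardize.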

So, now it starts to look like we have some opposing philosophies around linked data. And neither is a good enough answer to Tim Berners-Lee’s vision.

Short messages lack standard ways of explicitly declaring meaning within links. They are often transient ideas with no links at all. They create a ton of noise. Subjectivity rules. Short messages can’t identify or map to collections of specific data points within a data set. And the variety of ways links are expressed is vast and unmanageable.

The semantic web vision seems a far away place if it depends on whether or not an individual happens to create a semantic link.

But a structural overhaul isn’t a much better answer. In many ways, RDF means we would have to rewrite the entire web to support the new standard. The standard is complicated. Trillions of links would have to obtain context that they don’t have today. Documents would compete for position within the linked data chain. We would forever be re-identifying meaning in content as language changes and evolves. And big software would be required to create and manage links.

The issue isn’t about one model versus another. As people found with tags and taxonomies, the two are better when both exist together.

But there’s another approach to the linked data problem being pioneered by companies like MetaWeb, who run an open data service called Freebase, and Zemanta, who analyze text and recommend related links.

The approach here sits comfortably in the middle and interoperates with the extremes. They focus on being completely clear about what a thing is and then helping to facilitate better links.

For example, Freebase has a single ID for everything. There is one ID and one URL that represents Abraham Lincoln:
http://www.freebase.com/view/en/abraham_lincoln

They know that Wikipedia, The New York Times and the Congressional Biography web site, all very authoritative on politicians, each have a single URL representing everything they know about Abraham Lincoln, too.

So, Freebase maintains a database (in addition to the web site that users can see) that links the authoritative Abraham Lincoln pages on the Internet together.

This network of data resources on Abraham Lincoln becomes richer and more powerful than any single resource about him. There is some duplication among them, but each resource is also unique. We know facts about his life, books that have been written about him, how people were and still are connected to him, and so on.
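The Freebase-style approach can be sketched as one canonical ID per real-world thing, mapped to the authoritative pages that describe it. The Freebase URL below is the one from this post; the example.com URLs are stand-ins for the other sources:

```python
# One canonical ID per real-world thing, mapped to the authoritative pages
# that describe it. The example.com URLs are illustrative stand-ins.
same_as = {
    "/en/abraham_lincoln": [
        "http://www.freebase.com/view/en/abraham_lincoln",  # from the post
        "http://en.wikipedia.org/wiki/Abraham_Lincoln",
        "http://example.com/nyt/abraham-lincoln",       # stand-in for the NYT
        "http://example.com/bioguide/abraham-lincoln",  # stand-in for Congress
    ],
}

def pages_for(entity_id):
    """All known authoritative pages for a canonical entity ID."""
    return same_as.get(entity_id, [])
```

Any application that knows the canonical ID can reach every authoritative page at once, which is what makes the network of resources more powerful than any single one.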

Of course, explicit relationships become more critical when the context of a word with multiple meanings enters the ecosystem. For example, consider Apple, which is a computing company, a record company, a town, and a fruit.
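A crude sketch of how surrounding context might pick among the meanings of “Apple”. The candidate senses and keyword lists here are invented for illustration; real disambiguation systems are far more sophisticated:

```python
# Disambiguating "Apple" by scoring candidate senses against surrounding
# words. The senses and keyword lists are invented for illustration.
senses = {
    "apple_computer": {"iphone", "mac", "software", "computer"},
    "apple_records":  {"beatles", "label", "album", "records"},
    "apple_fruit":    {"pie", "orchard", "juice", "fruit"},
}

def disambiguate(context_words):
    """Pick the sense whose keyword set overlaps the context the most."""
    words = set(w.lower() for w in context_words)
    return max(senses, key=lambda s: len(senses[s] & words))

sense = disambiguate(["Apple", "released", "a", "new", "iPhone"])
# sense == "apple_computer"
```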

Once the links in a network are known, then the real magic starts to happen when you mix in the social capabilities of the network.

Because of the relationships inherent in the links, new apps can be built that tell more interesting and relevant stories because they can aggregate data together that is connected.

You can imagine a whole world of forensic historians begging for more linked data. Researchers spend years mapping together events, geographic locations, relationships between people and other facts to understand the past. For example, a company called Six to Start has been working on using Google Maps for interactive historical fiction:

“The Six to Start team decided to literally “map” Cumming’s story, using the small annotation boxes for snippets of text and then illustrating movement of the main character with a blue line. As users click through bits of the story, the blue line traces the protagonist’s trajectory, and the result is a story that is at once text-based but includes a temporal dimension—we watch in real time as movement takes place—as well as an information dimension as the Google tool is, in a sense, hacked for storytelling.”

Similarly, we will eventually have a bridge of links into the physical world. This will happen with devices that have sensors that broadcast and receive short messages. OpenStreetMap will get closer and closer to providing a data-driven representation of the physical world, built collectively by people with GPS devices carefully uploading details of their neighborhoods. You can then imagine games developers making the real world itself into a gaming platform based on linked data.

We’ve gotten a taste of this kind of thing with Foursquare: “Foursquare gives you and your friends new ways of exploring your city. Earn points and unlock badges for discovering new things.”

And there’s a fun photo sharing game called Noticin.gs: “Noticings are interesting things that you stumble across when out and about. You play Noticings by uploading your photos to Flickr, tagged with ‘noticings’ and geotagged with where they were taken.”

It’s conceivable that all these forces and some creative engineers will eventually shrink time and space into a massive network of connected things.

But long before some quasi-Matrix-like world exists, there will be many dotcom casualties among businesses that have benefited from the friction in finding information. When those challenges go away, so will the business models.

Search, for example, is an amazingly powerful and efficient middleman, linking documents off the back of the old-school hyperlink, but its utility may fade when the source of a piece of information can hear and respond directly to social signals asking for it somewhere in the world.

It’s all pointing to a frictionless information network, sometimes organized, sometimes totally chaotic.

It wasn’t long ago I worried the semantic web had already failed, but I’ve begun to wonder if in fact Tim Berners-Lee’s larger vision is going to happen just in a slightly different way than most people thought it would.

Now that linked data is happening at a more grassroots level, in addition to the standards-driven approach, I’m starting to believe that a world of linked data is actually possible, if not closer than it might appear.

Again, his TED talk has some simple but important ideas that perhaps need to be revisited:

Paraphrasing: “Data is about our lives – a relationship with a friend, the name of a person in a photograph, the hotel I want to stay in on my holiday. Scientists study problems and collect vast amounts of data. They are understanding economies, disease and how the world works.

A lot of the knowledge of the human race is in databases sitting on computers. Linking documents has been fun, but linking data is going to be much bigger.”


Local news is going the wrong way

Google’s new Local News offering misses the point entirely.

As Chris Tolles points out, Topix.net and others have been doing exactly this for years. Aggregating information at the hyperlocal level isn’t just about geotagging information sources. Chris explains why they added forums:

“…there wasn’t enough coverage by the mainstream or the blogosphere…the real opportunity was to become a place for people to publish commentary and stories.”

He shouldn’t worry about Google, though. He should worry more about startups like Outside.in who upped the ante by adding a slightly more social and definitely more organic experience to the idea of aggregating local information.

Yet information aggregation still only dances around the real issue.

People want to know what and who are around them right now.

The first service that really nails how we identify and surface the things that matter to us when and where we want to know about them is going to break ground in a way we’ve never seen before on the Internet.

We’re getting closer and closer to being able to connect the 4 W’s: Who, What, Where and When. But those things aren’t yet connecting to expose value to people.

I think a lot of people are still too focused on how to aggregate and present data to people. They expect people to do the work of knowing what they’re looking for, diving into a web page to find it and then consuming what they’ve worked to find.

There’s a better way. When services start mixing and syndicating useful data from the 4 W vectors then we’ll start seeing information come to people instead.

And there’s no doubt that big money will flow with it.

Dave Winer intuitively noted, “Advertising will get more and more targeted until it disappears, because perfectly targeted advertising is just information. And that’s good!”

I like that vision, but there’s more to it.

When someone connects the way information surfaces for people and the transactions that become possible as a result, a big new world is going to emerge.

The useful convergence of data

I have only one prediction for 2008. I think we’re finally about to see the useful combination of the 4 W’s – Who, What, Where, and When.

Marc Davis has done some interesting research in this area at Yahoo!, and Bradley Horowitz articulated how he sees the future of this space unfolding in a BBC article in June ’07:

“We do a great job as a culture of “when”. Using GMT I can say this particular moment in time and we have a great consensus about what that means…We also do a very good job of “where” – with GPS we have latitude and longitude and can specify a precise location on the planet…The remaining two Ws – we are not doing a great job of.”

I’d argue that the social networks are now really homing in on “who”, and despite having few open standards for “what” data (other than UPC), there is no shortage of “what” data among all the “what” providers. Every product vendor has its own version of a product identifier or serial number (Amazon’s ASIN, for example).

We’ve seen a lot of online services solving problems in these areas either by isolating specific pieces of data or combining the data in specific ways. But nobody has yet integrated all 4 in a meaningful way.
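One way to picture the integration: treat every event as a record carrying all four W’s, and let any combination of them be a query vector. A toy sketch, with all the names and data invented for illustration:

```python
from datetime import datetime

# Toy events tagged with all four W's: who, what, where (zip code), when.
# All of the data below is invented for illustration.
events = [
    {"who": "alice", "what": "bought:ASIN123", "where": "94107",
     "when": datetime(2008, 1, 5, 18, 30)},
    {"who": "bob",   "what": "bought:ASIN123", "where": "94107",
     "when": datetime(2008, 1, 5, 19, 0)},
    {"who": "carol", "what": "bought:ASIN999", "where": "10001",
     "when": datetime(2008, 1, 6, 9, 15)},
]

def match(**criteria):
    """Filter events on any combination of the four W's."""
    return [e for e in events
            if all(e[key] == value for key, value in criteria.items())]

nearby_buyers = match(what="bought:ASIN123", where="94107")
# two people bought the same product in the same zip code
```

The point of the sketch is that once all four W’s live on the same record, combinations that isolated services can’t answer (who bought what near me, and when) become simple filters.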


Jeff Jarvis’ insightful post on social airlines starts to show how these concepts might form in all kinds of markets. When you’re traveling it makes a lot of sense to tap into “who” data to create compelling experiences that will benefit everyone:

  • At the simplest level, we could connect while in the air to set up shared cab rides once we land, saving passengers a fortune.
  • We can ask our fellow passengers who live in or frequently visit a destination for their recommendations for restaurants, things to do, ways to get around.
  • We can play games.
  • What if you chose to fly on one airline vs. another because you knew and liked the people better? What if the airline’s brand became its passengers?
  • Imagine if on this onboard social network, you could find people you want to meet – people in the same business going to the same conference, people of similar interests, future husbands and wives – and you can rendezvous in the lounge.
  • The airline can set up an auction marketplace for at least some of the seats: What’s it worth for you to fly to Berlin next Wednesday?

Carrying the theme to retail markets, you can imagine that you will walk into H&M and discover that one of your first-degree contacts recently bought the same shirt you were about to purchase. You buy a different one instead. Or people who usually buy the same hair conditioner as you at the Walgreens you’re in now are switching to a different hair conditioner this month. Though this wouldn’t help someone like me who has no hair to condition.

Similarly, you can imagine that marketing messages could actually become useful in addition to being relevant. If Costco told me which of the products I often buy are on sale as I’m shopping, or which products I’m likely to need given what they know about how much I buy of what and when, then my loyalty there would shoot through the roof. They might even be able to identify that I’m likely buying milk elsewhere and give me a one-time coupon for Costco milk.

Bradley sees it playing out on the phone, too:

“On my phone I see prices for a can of soup in my neighbourhood. It resolves not only that particular can of soup but knows who I am, where I am and where I live and helps me make an intelligent decision about whether or not it is a fair price.

It has to be transparent and it has to be easy because I am not going to invest a lot of effort or time to save 13 cents.”

It may be unrealistic to expect that this trend will explode in 2008, but I expect it to at least appear in a number of places and inspire future implementations as a result. What I’m sure we will see in 2008 is dramatic growth in the behind-the-scenes work that will make this happen, such as the development and customization of CRM-like systems.

Lots of companies have danced around these ideas for years, but I think the ideas and the technologies are finally ready to create something real, something very powerful.

Photo: SophieMuc

The business of network effects

The Internet platform business has some unique challenges. It’s very tempting to adopt known models to make sense of it, like the PC business, for example, and think of the Internet platform like an operating system.

The similarities are hard to deny, and who wouldn’t want to control the operating system of the Internet?

In 2005, Jason Kottke proposed a vision for the “WebOS” where users could control their experience with tools that leveraged a combination of local storage and a local server, networked services and rich clients.

“Applications developed for this hypothetical platform have some powerful advantages. Because they run in a Web browser, these applications are cross platform, just like Web apps such as Gmail, Basecamp, and Salesforce.com. You don’t need to be on a specific machine with a specific OS…you just need a browser + local Web server to access your favorite data and apps.”

Prior to that post, Nick Carr offered a view on the role of the browser that surely resonated with the OS perspective for the Internet:

“Forget the traditional user interface. The looming battle in the information technology business is over control of the utility interface…Control over the utility interface will provide an IT vendor with the kind of power that Microsoft has long held through its control of the PC user interface.”

He also responded later to Kottke’s vision saying that the reliance on local web and storage services on a user’s PC may be unnecessary:

“Your personal desktop, residing entirely on a distant server, will be easily accessible from any device wherever you go. Personal computing will have broken free of the personal computer.”

But the client layer is merely a piece of a much larger puzzle, in my opinion.

Dare Obasanjo more recently broke down the different ideas of what “Cloud OS” might mean:

“I think it is a good idea for people to have a clear idea of what they are talking about when they throw around terms like “cloud OS” or “cloud platform” so we don’t end up with another useless term like SOA which means a different thing to each person who talks about it. Below are the three main ideas people often identify as a “Web OS”, “cloud OS” or “cloud platform” and examples of companies executing on that vision.”

He defines them as follows:

  1. WIMP Desktop Environment Implemented as a Rich Internet Application (The YouOS Strategy)
  2. Platform for Building Web-based Applications (The Amazon Strategy)
  3. Web-based Applications and APIs for Integrating with Them (The Google Strategy)

The OS metaphor has lots of powerful implications for business models, as we’ve seen on the PC. The operating system in a PC controls all the connections from the application user experience through the filesystem down through the computer hardware itself out to the interaction with peripheral services. Being the omniscient hub makes the operating system a very effective taxman for every service in the stack. And from there, the revenue streams become very easy to enable and enforce.

But the OS metaphor implies a command-and-control dynamic that doesn’t really work in a global network controlled only by protocols.

Internet software and media businesses don’t have an equivalent choke point. There’s no single processor or function or service that controls the Internet experience. There’s no one technology or one company that owns distribution.

There are lots of stacks that do have choke points on the Internet. And there are choke points that have tremendous value and leverage. Some are built purely and intentionally on top of a distribution point, such as the iPod on iTunes.

But no single distribution center touches all the points in any stack. The Internet business is fundamentally made of data vectors, not operational stacks.

Jeremy Zawodny shed light on this concept for me using building construction analogies.

He noted that my building contractor doesn’t exclusively buy Makita or DeWalt or Ryobi tools, though some tools make more sense in bundles. He buys the tool that is best for the job and what he needs.

My contractor doesn’t employ plumbers, roofers and electricians himself. Rather he maintains a network of favorite providers who will serve different needs on different jobs.

He provides value to me as an experienced distribution and aggregation point, but I am not exclusively tied to using him for everything I want to do with my house, either.

Similarly, the Internet market is a network of services. The trick to understanding what the business model looks like is figuring out how to open and connect services in ways that add value to the business.

In a prescient viewpoint from 2002 about the Internet platform business, Tim O’Reilly explained why a company with a large and valuable data store should open it up to the wider network:

“If they don’t ride the horse in the direction it’s going, it will run away from them. The companies that “grasp the nettle firmly” (as my English mother likes to say) will reap the benefits of greater control over their future than those who simply wait for events to overtake them.

There are a number of ways for a company to get benefits out of providing data to remote programmers:

Revenue. The brute force approach imposes costs both on the company whose data is being spidered and on the company doing the spidering. A simple API that makes the operation faster and more efficient is worth money. What’s more, it opens up whole new markets. Amazon-powered library catalogs anyone?

Branding. A company that provides data to remote programmers can request branding as a condition of the service.

Platform lock in. As Microsoft has demonstrated time and time again, a platform strategy beats an application strategy every time. Once you become part of the platform that other applications rely on, you are a key part of the computing infrastructure, and very difficult to dislodge. The companies that knowingly take their data assets and make them indispensable to developers will cement their role as a key part of the computing infrastructure.

Goodwill. Especially in the fast-moving high-tech industry, the “coolness” factor can make a huge difference both in attracting customers and in attracting the best staff.”

That doesn’t clearly translate into traditional business models necessarily, but if you look at key business breakthroughs in the past, the picture today becomes more clear.

  1. The first breakthrough business model was based around page views. The domain created an Apple-like controlled container. Exposure to eyeballs was sold by the thousands per domain. All the software and content was owned and operated by the domain owner, except the user’s browser. All you needed was to get and keep eyeballs on your domain.
  2. The second breakthrough business model emerged out of innovations in distribution. By building a powerful distribution center and direct connections with the user experience, advertising could be sold both where people began their online experiences and at the various independent domain stacks where they landed. Inventory begat spending, spending begat redistribution, redistribution begat inventory…it started to look a lot like network effects as it matured.
  3. The third breakthrough business model seems to be a riff on its predecessors and looks less and less like an operating system. The next breakthrough is network effects.

Network effects happen when the value of the entire network increases with each node added to the network. The telephone is the classic example, where every telephone becomes more valuable with each new phone in the network.

This is in contrast to TVs, which don’t care or even notice if more TVs plug in.

Recommendation engines are the ultimate network effect lubricant. The more people shop at Amazon, the better its recommendation engine gets…which, in turn, helps people buy more stuff at Amazon.
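The Amazon-style feedback loop can be sketched as simple purchase co-occurrence: every new basket sharpens the “people who bought X also bought Y” counts. The baskets below are invented, and real recommenders are vastly more sophisticated, but the mechanics of the network effect are the same:

```python
from collections import defaultdict
from itertools import permutations

# "People who bought X also bought Y" via co-occurrence counting.
# Each added basket improves the recommendations: a network effect.
# The baskets are invented for illustration.
baskets = [
    {"book_a", "book_b"},
    {"book_a", "book_b", "book_c"},
    {"book_a", "book_b"},
    {"book_a", "book_c"},
]

also_bought = defaultdict(lambda: defaultdict(int))
for basket in baskets:
    for x, y in permutations(basket, 2):
        also_bought[x][y] += 1

def recommend(item):
    """Items most often bought alongside `item`, strongest first."""
    counts = also_bought[item]
    return sorted(counts, key=counts.get, reverse=True)

recs = recommend("book_a")
# book_b co-occurs with book_a more often than book_c does
```

Every basket added to the pool changes the counts for everyone, which is exactly why the value of the whole network grows with each node.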

Network effects are built around unique and useful nodes with transparent and highly accessible connection points. Social networks are a good example because they use a person’s profile as a node and a person’s email address as a connection point.

Network effects can be built around other things like keyword-tagged URLs (del.icio.us), shared photos (flickr), songs played (last.fm), news items about locations (outside.in).

The contribution of each data point wherever that may happen makes the aggregate pool more valuable. And as long as there are obvious and open ways for those data points to talk to each other and other systems, then network effects are enabled.

Launching successful network effect businesses is no easy task. The value a participant can extract from the network must be higher than the cost of adding a node to the network. The network’s purpose and its output must be indispensable to the node creators.

Massively distributed network effects require some unique characteristics to form. Value not only has to build with each new node, but the value of each node needs to increase as it gets leveraged in other ways in the network.

For example, my email address has become an enabler around the Internet. Every site that requires a login is going to capture my email address. And as I build a relationship with those sites, my email address becomes increasingly important to me. Not only is having an email address adding value to the entire network of email addresses, but the value of my email address increases for me with each service that is able to leverage my investment in my email address.
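The email address as a cross-service connection point can be pictured as a join key. The service names and profile data here are invented for illustration:

```python
# The email address acting as a join key across otherwise unrelated
# services. Service names and profile data are invented for illustration.
services = {
    "photo_site":   {"me@example.com": {"photos": 214}},
    "music_site":   {"me@example.com": {"top_artist": "Radiohead"}},
    "address_book": {"me@example.com": {"contacts": 180}},
}

def profile_for(email):
    """Merge everything each service knows about one email address."""
    merged = {}
    for service, users in services.items():
        if email in users:
            merged[service] = users[email]
    return merged

me = profile_for("me@example.com")
# the more services key off the same address, the richer this merge gets
```

Each service that adopts the same key makes the key itself more valuable, which is the per-node value increase described above.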

Then the core services built around my email address start to increase in value, too.

For example, when I turned on my iPhone and discovered that my Yahoo! Address Book was automatically cooked right in without any manual importing, I suddenly realized that my Yahoo! Address Book has been a constant in my life ever since I got my first Yahoo! email address back in the ’90s. I haven’t kept it current, but it has followed me from job to job in a way that Outlook has never been able to do.

My Yahoo! Address Book is becoming more and more valuable to me. And my iPhone is more compelling because of my investment in my email address and my address book.

Now, if the network was an operating system, there would be taxes to pay. Apple would have to pay a tax for accessing my address book, and I would have to pay a tax to keep my address book at Yahoo!. Nobody wins in that scenario.

User data needs to be open and accessible in meaningful ways, and revenue needs to be built as a result of the effects of having open data rather than as a margin-based cost-control business.

But Dare Obasanjo insightfully exposes the flaw in reducing openness around identity to individual control alone:

“One of the bitter truths about “Web 2.0” is that your data isn’t all that interesting, our data on the other hand is very interesting…A lot of “Web 2.0” websites provide value to their users via wisdom of the crowds approaches such as tagging or recommendations which are simply not possible with a single user’s data set or with a small set of users.”

Clearly, one of the most successful revenue-driving opportunities in the networked economy is advertising. It makes sense that it would be since so many of the most powerful network effects are built on people’s profiles and their relationships with other people. No wonder advertisers can’t spend enough money online to reach their targets.

It will be interesting to see how some of the clever startups leveraging network effects such as Wesabe think about advertising.

Wesabe have built network effects around people’s spending behavior. As you track your finances and pull in your personal banking data, Wesabe makes loose connections between your transactions and other people who have made similar transactions. Each new person and each new transaction creates more value in the aggregate pool. You then discover other people who have advice about spending in ways that are highly relevant to you.
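The Wesabe-style loose connection can be sketched as overlap between people’s merchant lists. All of the names and merchant data below are invented for illustration:

```python
# Connecting people by overlapping spending behavior, Wesabe-style.
# All names and merchant lists are invented for illustration.
spending = {
    "me":    {"Netflix", "Safeway", "Shell"},
    "user2": {"Netflix", "Blockbuster", "Safeway"},
    "user3": {"Whole Foods", "Chevron"},
}

def similar_spenders(person, min_overlap=2):
    """People whose merchants overlap `person`'s by at least min_overlap."""
    mine = spending[person]
    return [other for other, theirs in spending.items()
            if other != person and len(mine & theirs) >= min_overlap]

peers = similar_spenders("me")
# user2 overlaps on Netflix and Safeway; user3 doesn't overlap at all
```

Advice from the people this surfaces is relevant precisely because the connection was made on purchasing behavior rather than stated interests.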

I’ve been a fan of Netflix for a long time now, but when Wesabe showed me that lots of Netflix customers were switching to Blockbuster, I had to investigate and before long decided to switch, too. Wesabe knew to advise me based on my purchasing behavior which is a much stronger indicator of my interests than my reading behavior.

Advertisers should be drooling at the prospects of reaching people on Wesabe. No doubt Netflix should encourage their loyal subscribers to use Wesabe, too.

The many explicit clues about my interests I leave around the Internet — my listening behavior at last.fm, my information needs I express in del.icio.us, my address book relationships, my purchasing behavior in Wesabe — are all incredibly fruitful data points that advertisers want access to.

And with managed distribution, a powerful ad platform could form around these explicit behaviors that can be loosely connected everywhere I go.
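A sketch of what that matching might look like: campaigns target explicit behavior signals rather than page content. The advertisers, signals, and users here are all invented for illustration:

```python
# Matching ads to explicit behavior signals rather than page content.
# Advertisers, signals, and users are invented for illustration.
user_signals = {"me": {"lapsed:netflix", "reads:movie-reviews"}}

campaigns = [
    {"advertiser": "netflix", "targets": {"lapsed:netflix"},
     "offer": "discount to re-subscribe"},
    {"advertiser": "gym", "targets": {"interest:fitness"},
     "offer": "free trial"},
]

def ads_for(user):
    """Campaigns whose target signals the user actually exhibits."""
    signals = user_signals.get(user, set())
    return [c for c in campaigns if c["targets"] <= signals]

offers = ads_for("me")
# only the Netflix campaign matches my signals
```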

Netflix could automatically find me while I’m reading a movie review on a friend’s blog or even at The New York Times and offer me a discount to re-subscribe. I’m sure they would love to pay lots of money for an ad that was so precisely targeted.

That blogger and The New York Times would be happy to share revenue back to the ad platform provider who enabled such precise targeting that resulted in higher payouts overall.

And I might actually come back to Netflix if I saw that ad. Who knows, I might even start paying more attention to ads if they started to find me rather than interrupt me.

This is why the Internet looks less and less like an operating system to me. Network effects look different to me in the way people participate in them and extract value from them, the way data and technologies connect to them, and the way markets and revenue streams build off of them.

Operating systems are about command-and-control distribution points, whereas network effects are about joining vectors to create leverage.

I know little about the mathematical nuances of chaos theory, but it offers some relevant philosophical approaches to understanding what network effects are about. Wikipedia addresses how chaos theory affects organizational development:

“Most of the focus on chaos theory is primarily rooted in the underlying patterns found in an otherwise chaotic environment, more specifically, concepts such as self-organization, bifurcation and self-similarity…

Self-organization, as opposed to natural or social selection, is a dynamic change within the organization where system changes are made by recalculating, re-inventing and modifying its structure in order to adapt, survive, grow and develop. Self-organization is the result of re-invention and creative adaptation due to the introduction of, or being in a constant state of, perturbed equilibrium.”

Yes, my PC is often in a state of ‘perturbed equilibrium’ but not because it wants to be.

Why Outside.in may have the local solution

The recent blog frenzy over hyperlocal media inspired me to have a look at Outside.in again.


It’s not just the high profile backers and the intense competitive set that make Outside.in worth a second look. There’s something very compelling in the way they are connecting data that seems like it matters.

My initial thought when it launched was that this idea had been done before too many times already. Topix.net appeared to be a dominant player in the local news space, not to mention similar but different kinds of local efforts at startups like Yelp and amongst all the big dotcoms.

And even from their strong position, Topix’s location-based news media aggregation model was kind of, I don’t know, uninteresting. I’m not impressed with local media coverage these days in general, so why would an aggregator of mediocre coverage be any more interesting than what I discover through my RSS reader?

But I think Outside.in starts to give some insight into how local media could be done right…how it could be more interesting and, more importantly, useful.

The light triggered for me when I read Jon Udell’s post on “the data finds the data”. He explains how data can be a vector through which otherwise unrelated people meet each other, a theme that continues to resonate for me.

Media brands have traditionally been good at connecting the masses to each other and to marketers. But the expectation of how directly people feel connected to other individuals by the media they share has changed.

Whereas the brand once provided a vector for connections, data has become the vehicle for people to meet people now. Zip code, for example, enables people to find people. So does marital status, date and time, school, music taste, work history. There are tons of data points that enable direct human-to-human discovery and interaction in ways that media brands could only accomplish in abstract ways in the past.

URLs can enable connections, too. Jon goes on to explain:

“On June 17 I bookmarked this item from Mike Caulfield… On June 19 I noticed that Jim Groom had responded to Mike’s post. Ten days later I noticed that Mike had become Jim’s new favorite blogger.

I don’t know whether Jim subscribes to my bookmark feed or not, but if he does, that would be the likely vector for this nice bit of manufactured serendipity. I’d been wanting to introduce Mike at KSC to Jim (and his innovative team) at UMW. It would be delightful to have accomplished that introduction by simply publishing a bookmark.”

Now, Outside.in allows me to post URLs much like one would do in Newsvine, Digg or any number of other collaborative citizen media services. But Outside.in leverages the zip code data point as the topical vector rather than a set of predetermined one-size-fits-all categories. It then allows miscellaneous tagging to be the subservient navigational pivot.

Suddenly, I feel like I can have a real impact on the site if I submit something. If there’s anything near a critical mass of people in the 94107 zip code on Outside.in then it’s likely my neighbors will be influenced by my posts.
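That indexing model — zip code as the primary vector, free-form tags as a secondary pivot within each zip — can be sketched in a few lines. All the names and data here are hypothetical, not Outside.in’s actual implementation:

```python
from collections import defaultdict

# Hypothetical in-memory index: zip code is the primary vector,
# free-form tags are a secondary pivot within each zip.
posts_by_zip = defaultdict(list)

def submit(url, zip_code, tags):
    """File a submitted URL under its zip code with optional tags."""
    posts_by_zip[zip_code].append({"url": url, "tags": set(tags)})

def neighborhood(zip_code, tag=None):
    """Everything posted for a zip, optionally narrowed by tag."""
    posts = posts_by_zip[zip_code]
    if tag is None:
        return posts
    return [p for p in posts if tag in p["tags"]]

submit("http://example.com/dogpatch-cafe", "94107", ["food"])
submit("http://example.com/permit-tips", "94107", ["remodeling"])
print(len(neighborhood("94107")))                  # 2
print(neighborhood("94107", tag="food")[0]["url"])
```

The point of the structure is that the zip code does the heavy lifting: every submission lands on a neighborhood page automatically, and tags only narrow from there.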

Fred Wilson of Union Square Ventures explains:

“They’ve built a platform that placebloggers can submit their content to. Their platform “tags” that content with a geocode — an address, zip code, or city — and that renders a new page for every location that has tagged content. If you visit outside.in/10010, you’ll find out what’s going on in the neighborhood around Union Square Ventures. If you visit outside.in/back_bay, you’ll see what’s going on in Boston’s Back Bay neighborhood.”

Again, the local online media model isn’t new. In fact, it’s old. CitySearch in the US and UpMyStreet in the UK proved years ago that a market does in fact exist in local media somewhere, somehow, but the market always feels fragile and susceptible to ghost town syndrome.

Umair Haque explains why local is so hard:

“Why doesn’t Craigslist choose small towns? Because there isn’t enough liquidity in the market. Let me put that another way. In cities, there are enough buyers and sellers to make markets work – whether of used stuff, new stuff, events, etc, etc.

In smaller towns, there just isn’t enough supply or demand.”

If they commit to building essentially micro media brands based exclusively on location, I suspect Outside.in will run itself into the ground spending money to establish critical mass in every neighborhood around the world.

Now that they have a nice micro media approach that seems to work they may need to start thinking about macro media. In order to reach the deep dark corners of the physical grid, they should connect people in larger contexts, too. Here’s an example of what I mean…

I’m remodeling the Potrero Hill shack we call a house right now. It’s all I talk about outside of work, actually. And I need to understand things like how to design a kitchen, ways to work through building permits, and who can supply materials and services locally for this job.

There must be kitchen design experts around the world I can learn from. Equally, I’m sure there is a guy around the corner from me who can give me some tips on local services. Will Architectural Digest or Home & Garden connect me to these different people? No. Will The San Francisco Chronicle connect us? No.

Craigslist won’t even connect us, because that site is so much about the transaction.

I need help from people who can connect on my interest vector as well as on the more local geographic vector. Without fluid connections on both vectors, I’m no better off than I was with my handy RSS reader and my favorite search engine.

Looking at how they’ve decided to structure their data, it seems Outside.in could pull this off and connect my global affinities with my local activities pretty easily.

This post is way too long already (sorry), but it’s worth pointing out some of the other interesting things they’re doing if you care to read on.

Outside.in is also building automatic semantic links with the contributors’ own blogs. By including my zip code in a blog post, Outside.in automatically drinks up that post and adds it into the pool. They even re-tag my post with the correct geodata and offer GeoRSS feeds back out to the world.

Here are the instructions:

“Any piece of content that is tagged with a zip code will be assigned to the corresponding area within outside.in’s system. You can include the zip code as either a tag or a category, depending on your blogging platform.”
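Ingesting posts under that convention amounts to scanning a blog’s feed for categories that look like zip codes. Here is a rough sketch of how an aggregator might do it — the feed content is invented, and the parsing approach is my own, not Outside.in’s:

```python
import re
import xml.etree.ElementTree as ET

# A toy RSS feed where one item carries a zip code as a <category>,
# per the tagging convention quoted above.
FEED = """<rss version="2.0"><channel>
  <item>
    <title>Kitchen remodel, week one</title>
    <link>http://example.com/kitchen-week-one</link>
    <category>94107</category>
    <category>remodeling</category>
  </item>
  <item>
    <title>Notes on RSS readers</title>
    <link>http://example.com/rss-notes</link>
    <category>software</category>
  </item>
</channel></rss>"""

ZIP = re.compile(r"^\d{5}$")

def geocoded_items(feed_xml):
    """Yield (zip, title, link) for items tagged with a 5-digit zip."""
    for item in ET.fromstring(feed_xml).iter("item"):
        categories = [c.text for c in item.findall("category")]
        zips = [c for c in categories if c and ZIP.match(c)]
        if zips:
            yield zips[0], item.findtext("title"), item.findtext("link")

for zip_code, title, link in geocoded_items(FEED):
    print(zip_code, title)
```

Only the first item qualifies; the second has no zip-shaped category and is ignored. Re-publishing the matched items with their geodata attached is what gets you the GeoRSS feeds back out.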

I love this.

30Boxes does something similar where I can tell it to collect my Upcoming data, and it automatically imports events as I tag them in Upcoming.

They are also recognizing local contributors and shining light on them with prominent links. I can see who the key bloggers are in my area and perhaps even get a sense of which ones matter, not just who posts the most. I’m guessing they will apply the “people who like this contributor also like this contributor” type of logic to personalize the experience for visitors at some point.

Now what gets me really excited is to think about the ad model that could happen in this environment of machine-driven semantic relationships.

If they can identify relevant blog posts from local contributors, then I’m sure they could identify local coupons from good sources of coupon feeds.

Let’s say I’m the national Ace Hardware marketing guy, and I publish a feed of coupons. I might be able to empower all my local Ace franchises and affiliates to publish their own coupons for their own areas and get highly relevant distribution on Outside.in. Or I could also run a national coupon feed with zip code tags cooked into each item.
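Serving that feed is just another zip-code match: each coupon carries the zips it applies to, and a neighborhood page pulls the ones that fit. A minimal sketch, with an entirely made-up feed:

```python
# Hypothetical national coupon feed: each item is tagged with the
# zip codes of the franchises it applies to; None means national.
coupons = [
    {"offer": "20% off paint", "zips": {"94107", "94110"}},
    {"offer": "Free key cutting", "zips": {"10010"}},
    {"offer": "$5 off any purchase", "zips": None},
]

def coupons_for(zip_code):
    """Coupons eligible to run on a given neighborhood page."""
    return [c["offer"] for c in coupons
            if c["zips"] is None or zip_code in c["zips"]]

print(coupons_for("94107"))  # ['20% off paint', '$5 off any purchase']
```

The same filter works whether the franchise publishes its own local feed or the national marketer bakes zip tags into each item.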

To Umair’s point, that kind of marketing will only pay off in major metros where the markets are stronger.

To help address the inventory problem, Outside.in could then offer to sell ad inventory on their contributors’ web sites. As an Outside.in contributor, I would happily run coupons from Center Hardware, my local Ace affiliate, on my blog posts that talk about my remodeling project if someone gave them to me in some automated way.

If they do something like this then they will be able to serve both the major metros and the smaller hot spots that you can never predict will grow. Plus, the incentives for the individuals in the smaller communities start feeding the wider ecosystem that lives on the Outside.in platform.

Outside.in would be pushing leverage out to the edge both in terms of participation as they already do and in terms of revenue generation, a fantastic combination of forces that few media companies have figured out, yet.

I realize there are lots of ‘what ifs’ in this assessment. The company has a lot of work to do before they break through, and none of it is easy. The good news for them is that they have something pretty solid that works today despite a crowded market.

Regardless, with Fred Wilson, Esther Dyson, John Seely Brown and Steven Berlin Johnson, among others, behind it, they are no doubt going to be one to watch.

Freebase.com is hot

I don’t get a chance to review products often enough these days. But when I heard about Freebase I knew I needed to dive into that one as soon as I was able.


Fortunately, I was invited only yesterday to take a peek. And I’m officially joining the hype wagon on this one.

Someone once described it as Wikipedia for structured data. I think that’s a good way to think about it.

That image leaves out one of the most powerful aspects of the tool, though. The pivot points created when a piece of data can be interlinked automatically and dynamically with other pieces of data form a network of information that is more powerful than an edited page.

The Freebase screencast uses the movie database example to show this. You can pivot from actor to film and, if you wanted, carry on to topic, to location, to government, to politician, to gossip, and on and on. And everything is editable.

Now, they didn’t stop at making the ultimate community-driven relational database. They exposed all the data in conveniently shareable formats like JSON. This means that I could build a web site that leverages that data and makes it available to my site visitors. I only need to link back to Freebase.com.
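Freebase’s query language worked by example: you send JSON shaped like the answer you want, with blanks (`None` for a single value, `[]` for a list) for the service to fill in. The sketch below fakes the service with a tiny local dataset so it runs offline; the property names are from memory and purely illustrative:

```python
import json

# An MQL-style query-by-example: JSON shaped like the answer you want,
# with None / [] as blanks to be filled in. Property names here are
# illustrative, not guaranteed to match Freebase's actual schema.
query = {
    "type": "/film/film",
    "name": "Blade Runner",
    "directed_by": None,   # blank: fill in a single value
    "starring": [],        # blank: fill in a list
}

# A toy stand-in for the query service, answering from local data.
DATA = {"Blade Runner": {"directed_by": "Ridley Scott",
                         "starring": ["Harrison Ford", "Rutger Hauer"]}}

def mock_query(q):
    """Copy the query and fill its blanks from the local dataset."""
    result = dict(q)
    record = DATA.get(q["name"], {})
    for key, blank in q.items():
        if blank is None or blank == []:
            result[key] = record.get(key, blank)
    return result

print(json.dumps(mock_query(query), indent=2))
```

The appeal of the shape-of-the-answer style is that the query and the result are the same structure, which makes it trivial to consume from any language that speaks JSON.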

But that’s not all. In combination with the conveniently accessible data, they allow people to submit data to Freebase programmatically through their APIs. They will need to create some licensing controls for this to really work for data owners (NBA stats data and NYSE stock data, for example). But that’s getting easier to solve, and you can see that they are moving in that direction already.

Here’s a brief clip of the screencast which shows some other interesting concepts in action, too:

Suddenly, you can imagine that Freebase becomes a data clearinghouse, a place where people post information, perhaps even indirectly through 3rd parties, and make money or attract customers as others redistribute their data from the Freebase distribution point. They have a self-contained but infinitely scalable data ecosystem.

I can imagine people wanting to manage their personal profile in this model and creating friends lists much like the typical social network except that it’s reusable everywhere on the Internet. I can imagine consumer goods producers weaving coupons and deals data with local retailer data and reaching buyers in highly relevant ways we haven’t seen yet.

Freebase feels very disruptive to me. I’m pretty sure that this is one to watch. And I’m not alone…

Michael Arrington: “Freebase looks to be what Google Base is not: open and useful.”

Jon Udell: “Freebase is aptly named, I am drawn like a moth to its flame.”

Tim O’Reilly: “Unlike the W3C approach to the semantic web, which starts with controlled ontologies, Metaweb adopts a folksonomy approach, in which people can add new categories (much like tags), in a messy sprawl of potentially overlapping assertions.”

John Markoff: “On the Web, there are few rules governing how information should be organized. But in the Metaweb database, to be named Freebase, information will be structured to make it possible for software programs to discern relationships and even meaning.”

In some ways, it seems like the whole Web 2.0 era was merely an incubation period for breakthroughs like Freebase. Judging by the amount of data already submitted in the alpha phase, I suspect this is going to explode when it officially launches.