Tagsonomy - Or How I Learned to Stop Worrying and Love The Semantic Web

I was asked to present some of the ideas behind our recent tagging efforts at this year's IDG online meeting in Boston. I didn't realize that I was actually presenting the Semantic Web until I finished my slides and finally woke up to what the Semantic Web really means. Below is a summary of the presentation, and you can view a reduced HTML version of the PowerPoint here. IDG staff can have the full version here.

Some of this is taken directly from Clay Shirky's "Ontology is Overrated" article.


We're all familiar with the classic hierarchy model for organizing our web sites. The tree has nodes and dependent subnodes. Many of us have tried to evolve the tree structure with a matrix model where subnodes can have relationships to other subnodes, and things can be associated with multiple nodes. This gets increasingly complex and doesn't necessarily add enough value to make it worth all the trouble.

The problem with these methods is that they require a committee mentality to stay functional, and committees often miss the objective of the organizing principle...to help our site visitors. Decisions take a long time to turn into action, and the structure becomes a disincentive for change.

A great example of hierarchical structure that fails is the Dewey Decimal System. This system is managed by people who spend all day every day thinking about how to organize books. One of the top nodes of the Dewey tree is Religion, and if you look into the subnodes, you see a problematic pattern. Countless classifications for Christianity dot the tree, including "Jesus Christ and His Family", "Christianity in Africa" and "Christian Art". There is one classification for Judaism and one classification for Islam. Buddhism would only fit under "Other Religions".

By definition, Taxonomy is the division of things into ordered groups. Semantics, on the other hand, is the study or science of meaning in language. I'm talking about moving to a model based on semantics here.

World Wide Web inventor Sir Tim Berners-Lee now leads an effort to define information using semantics.

"The Semantic Web is a project that intends to create a universal medium for information exchange by giving meaning, in a manner understandable by machines, to the content of documents on the Web."

Google Maps is a great example of this Semantic Web in action today. The company opened Google Maps so that developers can create tools that leverage both the information within it and its powerful UI. For example, this developer created a commuter watch web site for the UK by patching together a bunch of different data sources and layering them on top of Google Maps. You can drill into a particular area of London, for example, and identify where there might be train delays or motorway problems. You can then look at a live webcam of specific street intersections to assess how severe the traffic problem is. Then, you can locate the speed cameras along your planned journey to make sure you avoid paying a rush hour tax as you race through the city.

All of this works because of information that shares the same meaning, a meaning understood by computers. The connective tissue in this case is longitude and latitude. You can see in the BBC Travel XML feed that they have clearly identified every data point with the exact longitude and latitude where that condition applies. By connecting longitude and latitude data with various sources, such as BBC travel data and Google Maps, a developer can build very powerful applications and user interfaces.
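To make the "connective tissue" idea concrete, here is a minimal sketch of joining two unrelated data sources purely on longitude and latitude. The incident and webcam records, their field names, and the 1 km radius are all made up for illustration; they are not taken from the BBC feed or the Google Maps API.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def join_by_location(incidents, webcams, max_km=1.0):
    """Pair each incident with every webcam within max_km of it."""
    pairs = []
    for incident in incidents:
        for cam in webcams:
            distance = haversine_km(incident["lat"], incident["lon"],
                                    cam["lat"], cam["lon"])
            if distance <= max_km:
                pairs.append((incident["title"], cam["name"]))
    return pairs


# Hypothetical records standing in for a travel feed and a webcam list.
incidents = [{"title": "Delays on the A40", "lat": 51.5200, "lon": -0.2000}]
webcams = [
    {"name": "cam-west", "lat": 51.5210, "lon": -0.2010},
    {"name": "cam-east", "lat": 51.6500, "lon": 0.1000},
]

print(join_by_location(incidents, webcams))
```

Neither source needs to know about the other. The shared coordinate system does all the connecting.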

So what does this mean for publishers? What is the connective tissue for articles?

This is where tagging becomes interesting. Jon Udell created a screencast recently demonstrating the powerful capability of the Google Maps open platform. Jon's "Annotating the Web" post became very popular, and the social bookmarking community began tagging that URL using del.icio.us. You can see what words were used on del.icio.us and rank them on a chart. "Google" and "maps" were the tags most commonly used, but then the list diversifies. People used "screencast", "xml", "hacks", "howto" and "travel" among others. None of these terms appear in our taxonomy. Had we tagged the post using our structure, we would have had to use "Application Development" or "Internet", which mean nothing to the people who care about Jon's screencast. Different people identify the same thing in different ways...it's the blind men and the elephant story. No matter how clever our structure gets, we'll never make everyone see the elephant the same way.
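Ranking the tags people chose for a URL is a simple counting exercise. The sketch below assumes each bookmark is just a list of tag strings; the sample bookmarks are invented for illustration and do not reproduce the real del.icio.us data for Jon's post.

```python
from collections import Counter


def rank_tags(bookmarks):
    """Count how many bookmarks used each tag, most common first."""
    counts = Counter()
    for tags in bookmarks:
        # Lowercase and de-duplicate so one bookmark counts a tag only once.
        counts.update({tag.lower() for tag in tags})
    return counts.most_common()


# Hypothetical bookmarks of a single URL, one list of tags per person.
bookmarks = [
    ["google", "maps", "screencast"],
    ["google", "maps", "xml"],
    ["Google", "maps", "hacks"],
    ["google", "howto"],
    ["travel", "screencast"],
]

for tag, count in rank_tags(bookmarks):
    print(f"{tag}: {count}")
```

The head of the list is predictable. The tail is where you find the words no committee would have put in the taxonomy.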

Now, these ideas start to play out in a more tangible way when you integrate tagging into your everyday publishing efforts. InfoWorld changed the way it identifies related links on article pages by pulling in related links based on tags rather than the taxonomy. You can see in this AJAX article that the related links are very specific to the topic. If we had built the related links based on our taxonomy, the links would not only be unrelated to the article, they would be unrelated to each other. Similar to the longitude and latitude example, things begin to connect to each other with tags.
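One plausible way to pick related links from tags is to rank other articles by how many tags they share with the current one. This is only a sketch under that assumption, not a description of how InfoWorld's system actually works; the article IDs and tags are hypothetical.

```python
def related_links(article_id, article_tags, limit=3):
    """Return up to `limit` other articles, ordered by number of shared tags."""
    target = article_tags[article_id]
    scored = []
    for other_id, tags in article_tags.items():
        if other_id == article_id:
            continue
        shared = len(target & tags)
        if shared:  # skip articles with no tags in common
            scored.append((shared, other_id))
    # Most shared tags first; break ties alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other_id for _, other_id in scored[:limit]]


# Hypothetical articles and their tags.
article_tags = {
    "ajax-intro": {"ajax", "javascript", "xml"},
    "ajax-toolkits": {"ajax", "javascript", "frameworks"},
    "xml-feeds": {"xml", "rss"},
    "storage-review": {"storage", "hardware"},
}

print(related_links("ajax-intro", article_tags))
```

An article filed under the same broad taxonomy node but sharing no tags, like the storage review here, never shows up in the list.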

Not surprisingly, the conversion rate improved dramatically when we switched our related links. For example, 12% of the visitors to this AJAX article clicked to another page compared to similar stories in the past which have converted at merely 4%. The impact of tagging will have a real monetary value to us over time if we can generate more page views this way.

Interesting ideas start forming when you dig through del.icio.us tags and uncover meaning that was perhaps unintended. In fact, tags can help you find content that might be more relevant than Google search results, even though the tool is not designed for search. I wanted to know more about this "AJAX" phenomenon and tried matching tags to find a nice list of links for researching the concept. I discovered a site called AJAX Matters, which had several interesting articles, code samples, links to books, and links to sites using AJAX.

The AJAX Matters site is one of those sites pretty far down the Long Tail, and it would seem there's no way for a media company like InfoWorld to cover that space with resources that are already pretty stretched. But this is where the Semantic Web comes in. By using tags as connective tissue, we could create a microportal for every topic in our market in an automated way, a system that generates pages on-the-fly based on the tags or tag combinations identified by the site visitor. We could resurface a lot of the valuable content that falls off the home page this way, too. And why not tag ads? Wouldn't it make sense to tag white papers, webcasts, and text links, too, in order to improve relevancy? We could even tag our internal promotions. If the conversion rate improves with more relevant content, then it's likely conversion rates will improve with more relevant advertising.
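The microportal idea can be sketched as an inverted index from tags to items, where a page for a tag combination is the intersection of the matching sets. Everything here is an assumption for illustration: the item IDs, the "article:", "whitepaper:" and "ad:" prefixes, and the function names are hypothetical, not an existing system.

```python
from collections import defaultdict


def build_tag_index(items):
    """Map each tag to the set of item IDs that carry it."""
    index = defaultdict(set)
    for item_id, tags in items.items():
        for tag in tags:
            index[tag].add(item_id)
    return index


def microportal(index, *tags):
    """List the items that carry every requested tag, sorted by ID."""
    matches = [index.get(tag, set()) for tag in tags]
    if not matches:
        return []
    return sorted(set.intersection(*matches))


# Hypothetical content: articles, white papers, and ads share one tag space.
items = {
    "article:ajax-intro": {"ajax", "javascript"},
    "article:old-ajax-howto": {"ajax", "howto", "javascript"},
    "whitepaper:ajax-security": {"ajax", "security"},
    "ad:js-ide": {"javascript", "tools"},
}

index = build_tag_index(items)
print(microportal(index, "ajax"))
print(microportal(index, "ajax", "javascript"))
```

Because ads and white papers sit in the same index as articles, the same lookup that builds the page can also choose the promotions that appear on it.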

The Long Tail becomes more and more threatening the further up the slope you sit. And you start to panic about the next little guy who is going to figure out how to take market share away from you. Just watch CNet reacting to Engadget. This model, however, might enable us to expand beyond our little slice in the middle of the Long Tail and produce useful pages serving people with highly specific needs. It doesn't make the Long Tail go away, but it helps publishers accomplish some of the things that our legacy systems sometimes prevent us from doing well. It might make us look and feel like small publishers again, which can only be a welcome change for the people who value what we have to offer them.

I'm not suggesting that structure is unnecessary. In fact, it's very necessary for some of the broader categorization that makes our site useful. However, we spend way too much time thinking about a master structure and building complicated tools to accomplish complicated tasks that ultimately underserve the people we want to help.

I'm suggesting instead that in addition to a limited structure, we can build a system for connecting things. And if the world around us adopts the same connective tissue, we will be able to open up new possibilities for helping people find what they need. What seemed impossible when the Semantic Web was proposed is actually happening out there right now...with or without you.


