Category Archives: aggregation

Orchestrating streams of data from across the Internet

The liveblog was a revelation for us at the Guardian. The sports desk had been doing them for years, experimenting with different styles, methods and tone. And then about three years ago the news desk started using them liberally to great effect.

I think it was Matt Wells who suggested that perhaps the liveblog was *the* network-native format for news. I think that’s nearly right…though it’s less the ‘format’ of a liveblog than the activity powering the page that demonstrates where news editing in a networked world is going.

It’s about orchestrating the streams of data flowing across the Internet into a compelling use in one form or another. One way to render that data is the liveblog. Another is a map with placemarks. Another is an RSS feed. A stream of tweets. Storify. Etc.

I’m not talking about Big Data for news. There is certainly a very hairy challenge in big data investigations and intelligent data visualizations to give meaning to complex statistics and databases. But this is different.

I’m talking about telling stories by playing DJ to the beat of human observation pumping across the network.

We’re working on one such experiment with a location-tagging tool we call FeedWax. It creates location-aware streams of data for you by looking across various media sources including Twitter, Instagram, YouTube, Google News, Daylife, etc.

The idea with FeedWax is to unify various types of data through shared contexts, beginning with location. These sources may only have a keyword to join them up or perhaps nothing at all, but when you add location they may begin sharing important meaning and relevance. The context of space and time is natural connective tissue, particularly when the words people use to describe something may vary.
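A minimal sketch of that joining-by-location idea, assuming each item already carries a latitude/longitude pair (the field names and sample items here are hypothetical, not FeedWax’s actual schema):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def near(items, lat, lon, radius_km=5.0):
    """Keep items from any source that fall within radius_km of a point."""
    return [i for i in items if haversine_km(lat, lon, i["lat"], i["lon"]) <= radius_km]

# Items from different services, unified only by location
stream = [
    {"source": "twitter",   "text": "Crowd gathering",       "lat": 51.5074, "lon": -0.1278},
    {"source": "instagram", "text": "Photo from the scene",  "lat": 51.5081, "lon": -0.1280},
    {"source": "youtube",   "text": "Clip uploaded",         "lat": 48.8566, "lon": 2.3522},
]

local = near(stream, 51.5074, -0.1278, radius_km=5.0)
```

The keywords on those three items may never line up, but once they share a point on the map they can be treated as one story.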

We’ve been conducting experiments in orchestrated stream-based and map-based storytelling on n0tice for a while now. When you start crafting the inputs with tools like FeedWax you have what feels like a more frictionless mechanism for steering the flood of data that comes across Twitter, Instagram, Flickr, etc. into something interesting.

For example, when the space shuttle Endeavour flew its last flight and subsequently labored through the streets of LA there was no shortage of coverage from on-the-ground citizen reporters. I’d bet not one of them considered themselves a citizen reporter. They were just trying to get a photo of this awesome sight and share it, perhaps getting some acknowledgement in the process.

You can see the stream of images and tweets here: http://n0tice.com/search?q=endeavor+OR+endeavour. And you can see them all plotted on a map here: http://goo.gl/maps/osh8T.

Interestingly, the location of the photos gives you a very clear picture of the flight path. This is crowdmapping without requiring that anyone do anything they wouldn’t already do. It’s orchestrating streams that already exist.

This behavior isn’t exclusive to on-the-ground reporting. I’ve got a list of similar types of activities in a blog post here which includes task-based reporting like the search for computer scientist Jim Gray, the use of Ushahidi during the Haiti earthquake, the Guardian’s MPs Expenses project, etc. It’s also interesting to see how people like Jon Udell approach this problem with other data streams out there such as event and venue calendars.

Sometimes people refer to the art of code and code-as-art. What I see in my mind when I hear people say that is a giant global canvas in the form of a connected network, rivers of different colored paints in the form of data streams, and a range of paint brushes and paint strokes in the form of software and hardware.

The savvy editors in today’s world are learning from and working with these artists, using their tools and techniques to tease out the right mix of streams to tell stories that people care about. There’s no lack of material or tools to work with. Becoming network-native sometimes just means looking at the world through a different lens.

Dispatchorama: a distributed approach to covering a distributed news event

We’ve had a sort of Hack Week at the Guardian, or “Discovery Week”. So, I took the opportunity to mess around with the n0tice API to test out some ideas about distributed reporting.

This is what it became (best if opened in a mobile web browser):

http://dispatchorama.com/



It’s a little web app that looks at your location and then helps you to quickly get to the scene of whatever nearby news events are happening right now.

The content is primarily coming from n0tice at the moment, but I’ve added some tweets with location data. I’ve looked at some GeoRSS feeds, but I haven’t tackled that yet. It should also include only things from the last 24 hours. Adding more feeds and tuning the timing will help it feel more ‘live’.
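The 24-hour window is simple enough to sketch; the item shape here is hypothetical, not the app’s actual data model:

```python
from datetime import datetime, timedelta, timezone

def recent(items, hours=24, now=None):
    """Drop anything older than the window so the stream feels 'live'."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    return [i for i in items if i["published"] >= cutoff]

# Hypothetical items with publish timestamps
now = datetime(2012, 10, 1, 12, 0, tzinfo=timezone.utc)
items = [
    {"title": "fresh report", "published": now - timedelta(hours=3)},
    {"title": "stale report", "published": now - timedelta(hours=30)},
]
live = recent(items, hours=24, now=now)
```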

The concept here is another way of thinking about the impact of the binding effect of the digital and physical worlds. Being able to understand the signals coming out of networked media is increasingly important. By using the context that travels with bits of information to inform your physical reality you can be quicker to respond, more insightful about what’s going on and proactive in your participation, as a result.

I’m applying that idea to distributed news events here, things that might be happening in many places at once or a news event that is moving around.

In many ways, this little experiment is a response to the amazing effort of the Guardian’s Paul Lewis and several other brave reporters covering last year’s UK riots.

There were two surprises in doing this:

  1. The location-based tweets from Twitter are really all over the place and not helpful. You really have to narrow your source list to known Twitter accounts to get anything good, but that kind of defeats the purpose.
  2. I haven’t done a ton of research yet, but there seems to be a real lack of useful GeoRSS feeds out there. What happened? Did the failure of RSS readers kill the GeoRSS movement? What a shame. That needs to change.
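For what it’s worth, GeoRSS Simple just puts a `<georss:point>lat lon</georss:point>` element inside each item, so pulling placemarks out of a feed takes only a few lines (the sample feed below is made up):

```python
import xml.etree.ElementTree as ET

FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <item>
      <title>Road closure on 3rd St</title>
      <georss:point>37.7599 -122.3881</georss:point>
    </item>
    <item>
      <title>No location on this one</title>
    </item>
  </channel>
</rss>"""

NS = {"georss": "http://www.georss.org/georss"}

def geo_items(xml_text):
    """Yield (title, lat, lon) for items carrying a GeoRSS Simple point."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        point = item.find("georss:point", NS)
        if point is not None and point.text:
            lat, lon = map(float, point.text.split())
            yield item.findtext("title"), lat, lon

placemarks = list(geo_items(FEED))
```

Items without a point simply get skipped, which is most of the battle when mixing geo and non-geo sources.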

The app uses the n0tice API, jQuery Mobile, Google’s location APIs, and a few snippets picked off Stack Overflow. It’s on GitHub here:
https://github.com/mattmcalister/dispatchorama/

Local news is going the wrong way

Google’s new Local News offering misses the point entirely.

As Chris Tolles points out, Topix.net and others have been doing exactly this for years. Aggregating information at the hyperlocal level isn’t just about geotagging information sources. Chris explains why they added forums:

“…there wasn’t enough coverage by the mainstream or the blogosphere…the real opportunity was to become a place for people to publish commentary and stories.”

He shouldn’t worry about Google, though. He should worry more about startups like Outside.in who upped the ante by adding a slightly more social and definitely more organic experience to the idea of aggregating local information.

Yet information aggregation still only dances around the real issue.

People want to know what and who are around them right now.

The first service that really nails how we identify and surface the things that matter to us when and where we want to know about them is going to break ground in a way we’ve never seen before on the Internet.

We’re getting closer and closer to being able to connect the 4 W’s: Who, What, Where and When. But those things aren’t yet connecting to expose value to people.

I think a lot of people are still too focused on how to aggregate and present data to people. They expect people to do the work of knowing what they’re looking for, diving into a web page to find it and then consuming what they’ve worked to find.

There’s a better way. When services start mixing and syndicating useful data from the 4 W vectors then we’ll start seeing information come to people instead.

And there’s no doubt that big money will flow with it.

Dave Winer intuitively noted, “Advertising will get more and more targeted until it disappears, because perfectly targeted advertising is just information. And that’s good!”

I like that vision, but there’s more to it.

When someone connects the way information surfaces for people and the transactions that become possible as a result, a big new world is going to emerge.

Why Outside.in may have the local solution

The recent blog frenzy over hyperlocal media inspired me to have a look at Outside.in again.


It’s not just the high profile backers and the intense competitive set that make Outside.in worth a second look. There’s something very compelling in the way they are connecting data that seems like it matters.

My initial thought when it launched was that this idea had been done before too many times already. Topix.net appeared to be a dominant player in the local news space, not to mention similar but different kinds of local efforts at startups like Yelp and amongst all the big dotcoms.

And even from their strong position, Topix’s location-based news media aggregation model was kind of, I don’t know, uninteresting. I’m not impressed with local media coverage these days, in general, so why would an aggregator of mediocre coverage be any more interesting than what I discover through my RSS reader?

But I think Outside.in starts to give some insight into how local media could be done right…how it could be more interesting and, more importantly, useful.

The light triggered for me when I read Jon Udell’s post on “the data finds the data”. He explains how data can be a vector through which otherwise unrelated people meet each other, a theme that continues to resonate for me.

Media brands have traditionally been good at connecting the masses to each other and to marketers. But the expectation of how directly people feel connected to other individuals by the media they share has changed.

Whereas the brand once provided a vector for connections, data has become the vehicle for people to meet people now. Zip code, for example, enables people to find people. So does marital status, date and time, school, music taste, work history. There are tons of data points that enable direct human-to-human discovery and interaction in ways that media brands could only accomplish in abstract ways in the past.

URLs can enable connections, too. Jon goes on to explain:

“On June 17 I bookmarked this item from Mike Caulfield… On June 19 I noticed that Jim Groom had responded to Mike’s post. Ten days later I noticed that Mike had become Jim’s new favorite blogger.

I don’t know whether Jim subscribes to my bookmark feed or not, but if he does, that would be the likely vector for this nice bit of manufactured serendipity. I’d been wanting to introduce Mike at KSC to Jim (and his innovative team) at UMW. It would be delightful to have accomplished that introduction by simply publishing a bookmark.”

Now, Outside.in allows me to post URLs much like one would do in Newsvine or Digg or any number of other collaborative citizen media services. But Outside.in leverages the zip code data point as the topical vector rather than a set of predetermined one-size-fits-all categories. It then allows miscellaneous tagging to be the subservient navigational pivot.

Suddenly, I feel like I can have a real impact on the site if I submit something. If there’s anything near a critical mass of people in the 94107 zip code on Outside.in then it’s likely my neighbors will be influenced by my posts.

Fred Wilson of Union Square Ventures explains:

“They’ve built a platform that placebloggers can submit their content to. Their platform “tags” that content with a geocode — an address, zip code, or city — and that renders a new page for every location that has tagged content. If you visit outside.in/10010, you’ll find out what’s going on in the neighborhood around Union Square Ventures. If you visit outside.in/back_bay, you’ll see what’s going on in Boston’s Back Bay neighborhood.”

Again, the local online media model isn’t new. In fact, it’s old. CitySearch in the US and UpMyStreet in the UK proved years ago that a market does in fact exist in local media somewhere, somehow, but the market always feels fragile and susceptible to ghost town syndrome.

Umair Haque explains why local is so hard:

“Why doesn’t Craigslist choose small towns? Because there isn’t enough liquidity in the market. Let me put that another way. In cities, there are enough buyers and sellers to make markets work – whether of used stuff, new stuff, events, etc, etc.

In smaller towns, there just isn’t enough supply or demand.”

If they commit to building essentially micro media brands based exclusively on location I suspect Outside.in will run itself into the ground spending money to establish critical mass in every neighborhood around the world.

Now that they have a nice micro media approach that seems to work they may need to start thinking about macro media. In order to reach the deep dark corners of the physical grid, they should connect people in larger contexts, too. Here’s an example of what I mean…

I’m remodeling the Potrero Hill shack we call a house right now. It’s all I talk about outside of work, actually. And I need to understand things like how to design a kitchen, ways to work through building permits, and who can supply materials and services locally for this job.

There must be kitchen design experts around the world I can learn from. Equally, I’m sure there is a guy around the corner from me who can give me some tips on local services. Will Architectural Digest or Home & Garden connect me to these different people? No. Will The San Francisco Chronicle connect us? No.

Craigslist won’t even connect us, because that site is so much about the transaction.

I need help both from people who can connect on my interest vector in addition to the more local geographic vector. Without fluid connections on both vectors, I’m no better off than I was with my handy RSS reader and my favorite search engine.

Looking at how they’ve decided to structure their data, it seems Outside.in could pull this off and connect my global affinities with my local activities pretty easily.

This post is way too long already (sorry), but it’s worth pointing out some of the other interesting things they’re doing if you care to read on.

Outside.in is also building automatic semantic links with the contributors’ own blogs. By including my zip code in a blog post, Outside.in automatically drinks up that post and adds it into the pool. They even re-tag my post with the correct geodata and offer GeoRSS feeds back out to the world.

Here are the instructions:

“Any piece of content that is tagged with a zip code will be assigned to the corresponding area within outside.in’s system. You can include the zip code as either a tag or a category, depending on your blogging platform.”

I love this.
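A rough sketch of how that zip-code assignment might work on the aggregator’s side — the post structure and function names here are hypothetical, not Outside.in’s actual code:

```python
import re

ZIP = re.compile(r"^\d{5}$")  # a five-digit US zip code

def assign_area(post):
    """Return the first zip-code-looking tag or category on a post, if any."""
    for label in post.get("tags", []) + post.get("categories", []):
        if ZIP.match(label):
            return label
    return None

# A hypothetical ingested blog post
post = {"title": "Stoop sale on Saturday", "tags": ["brooklyn", "11215"], "categories": []}
area = assign_area(post)
```

The elegance is that the contributor never fills out a form; the location rides along in metadata they were going to add anyway.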

30Boxes does something similar where I can tell it to collect my Upcoming data, and it automatically imports events as I tag them in Upcoming.

They are also recognizing local contributors and shining light on them with prominent links. I can see who the key bloggers are in my area and perhaps even get a sense of which ones matter, not just who posts the most. I’m guessing they will apply the “people who like this contributor also like this contributor” type of logic to personalize the experience for visitors at some point.

Now what gets me really excited is to think about the ad model that could happen in this environment of machine-driven semantic relationships.

If they can identify relevant blog posts from local contributors, then I’m sure they could identify local coupons from good sources of coupon feeds.

Let’s say I’m the national Ace Hardware marketing guy, and I publish a feed of coupons. I might be able to empower all my local Ace franchises and affiliates to publish their own coupons for their own areas and get highly relevant distribution on Outside.in. Or I could also run a national coupon feed with zip code tags cooked into each item.

To Umair’s point, that kind of marketing will only pay off in major metros where the markets are stronger.

To help address the inventory problem, Outside.in could then offer to sell ad inventory on their contributors’ web sites. As an Outside.in contributor, I would happily run coupons from Center Hardware, my local Ace affiliate, on my blog posts that talk about my remodeling project if someone gave them to me in some automated way.

If they do something like this then they will be able to serve both the major metros and the smaller hot spots that you can never predict will grow. Plus, the incentives for the individuals in the smaller communities start feeding the wider ecosystem that lives on the Outside.in platform.

Outside.in would be pushing leverage out to the edge both in terms of participation as they already do and in terms of revenue generation, a fantastic combination of forces that few media companies have figured out, yet.

I realize there are lots of ‘what ifs’ in this assessment. The company has a lot of work to do before they break through, and none of it is easy. The good news for them is that they have something pretty solid that works today despite a crowded market.

Regardless, knowing Fred Wilson, Esther Dyson, John Seely Brown and Steven Berlin Johnson are behind it, among others, no doubt they are going to be one to watch.

Testing ways to splice my feeds

I started playing around with Pipes a bit more the other day and then found this handy tip via Lifehacker for nicer looking ways to link splice in your blog feed.


You can already splice del.icio.us and Flickr directly into any Feedburner feed, but Pipes allows you to do things like isolating the saved bookmarks from tags and groups of tags. You can also prepend each item in your feed with things like “link”, “blog post”, and “photo”. You could also splice in other feeds that Feedburner doesn’t support, like your Last.fm tracks, for example. I thought I would try offering foreign language versions of all this, too.
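Roughly what the Pipes setup is doing, sketched in Python with hypothetical items: prepend a source label to each title, merge everything, and sort newest first:

```python
from datetime import datetime

def splice(*labeled_feeds):
    """Merge several (label, items) feeds into one stream, newest first,
    prepending the label to each title the way the Pipes setup does."""
    merged = []
    for label, items in labeled_feeds:
        for item in items:
            merged.append({**item, "title": f"{label}: {item['title']}"})
    return sorted(merged, key=lambda i: i["date"], reverse=True)

# Hypothetical source feeds
blog   = [{"title": "New experiment",       "date": datetime(2007, 3, 2)}]
links  = [{"title": "Handy Pipes tip",      "date": datetime(2007, 3, 4)}]
photos = [{"title": "View from the office", "date": datetime(2007, 3, 1)}]

stream = splice(("blog post", blog), ("link", links), ("photo", photos))
```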

I apologize if my feed here gets squirrely on you as I work this out. Coincidentally, I saw this post yesterday that pointed out the number 1 reason people unsubscribe from a particular feed is information overload. I’m definitely becoming an overload offender here. Sorry.

If you want to be sure you’re only subscribed to my blog posts, then here is the blog-only feed.

UPDATE: As I suspected, it was a snap to create foreign language versions of my feeds. I’ve added several translations using the BabelFish operator. I can’t vouch for the accuracy or quality of the translations, but there are now Spanish, French, German, and Japanese language versions of my feed. More on the way.

A start page on my own domain

With a quick copy and paste job using Kent Brewster’s Pipes Badger and a few widgets from services I use, I now have what is a mostly sufficient start page on my own domain that displays my various forms of online expression. Really interesting stuff here.

A community site without a community

Taking a little time at home last week gave me a chance to play around with one of my experiments that was nearly at its end. FlipBait is a simple Pligg/MediaWiki site that pokes fun at the dotcom golddiggers out there.


It’s mostly a sandbox for me both technically and journalistically. But it’s not really helping to inform or build community the way I hoped.

First, after a month I still have no participants. There have been several passersby, but a group publishing site needs to have a core team looking after its well being.

Second, it’s just too much work in its current form for me to keep posting to it.

I sort of expected this to happen, but I’m a big fan of experimentation. So, I thought I might analyze the issues for a few blog posts and close it down…

…but then Pligg 9 was released.

The new version of this Digg-like CMS added a key feature that may alter the dynamics of the site completely: Feed Importing.

I give it a few RSS feeds. It then imports the headlines from those feeds automatically.

Now, I have a bunch of feeds all pouring headlines into FlipBait throughout the day. I’m aggregating the usual suspects like TechCrunch and GigaOM and VentureBeat, but I also found a few sources from various searches that effectively round out the breadth of the coverage.
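A sketch of that import step, with de-duplication by link since the same headline often arrives from several sources (the feed data is made up, and this isn’t Pligg’s actual PHP):

```python
def import_headlines(feeds, seen=None):
    """Pull headlines from several feeds, skipping any URL already imported."""
    seen = set() if seen is None else seen
    queue = []
    for feed in feeds:
        for item in feed:
            if item["link"] not in seen:
                seen.add(item["link"])
                queue.append(item)
    return queue

# Hypothetical parsed feeds: the same story appears in both
techcrunch = [{"title": "Startup X raises $5M", "link": "http://example.com/a"}]
gigaom     = [{"title": "Startup X raises $5M", "link": "http://example.com/a"},
              {"title": "New widget platform",  "link": "http://example.com/b"}]

queue = import_headlines([techcrunch, gigaom])
```

Keeping the `seen` set around between runs is what makes a scheduled import idempotent.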

I can find new dotcom golddiggers without fail every day.

This is very cool. Though you can see back in the Pligg forum archives that there was some debate about whether this feature would destroy the whole dynamic of voting-based publishing. That may be true, but it’s just too useful not to have.

Now, this might be the most interesting part…

I’m also importing stories from del.icio.us using a new tag: “flipbait”. That means that if you tag an article with “flipbait”, Pligg will automatically import that article and make it available to the FlipBait community. That’s how I’m entering my own favorite posts for the site as opposed to using the ‘submit’ function directly at flipbait.com.

You don’t ever have to visit the domain, actually, because you can pull articles to read from the RSS feed and submit articles to the site just by tagging as you already do.

Hmmm…what does that mean? Interesting question. Can a meaningful community form around a word that represents an idea?

How to offer simple RSS badges for your users

The key breakthrough that made it possible for YouTube to ride on MySpace’s heavy traffic coattails into its current state as a mass media service is the concept of widgets, often called badges in related contexts. Although offering widgets or badges may seem like a far off idea for most web site owners to internalize yet, there are a few tools that can make this a snap to offer your users if you’re ready for it.

(I’ll assume here that you already know what widgets and badges are. If you don’t, I’ve been tagging articles addressing the topic of widgets that may be helpful.)

In the case of YouTube, they allowed users to post the YouTube video player to any web page with a simple copy and paste operation. Since most web site owners are dealing mostly with text, the equivalent would likely be a feed of RSS content that people could display on a web page. It would clearly be best to allow your users to display a feed of the things they are contributing to your web site, but if you don’t have user-contributed data to give back to your users it’s still worth trying to offer this functionality using your own content to see what happens.

Here’s a really cool tool I recently found that made it possible for me to offer badges to users on the FlipBait web site. It’s an open source service called Feed2JS, and it appears to be developed by Alan Levine. It requires another open source service called MagpieRSS to operate, but MagpieRSS takes maybe 10 minutes at most to download and install.

After you download and install these scripts you can point to a feed you want to display nicely and get the code back that you can include on any web site to show that feed.
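The trick behind a Feed2JS-style badge is just a server endpoint that `document.write()`s the feed as HTML. A hypothetical sketch of that server side (Feed2JS itself is PHP; the function and field names here are mine):

```python
import html

def feed_to_js(items, max_items=5):
    """Render feed items as an HTML list wrapped in a document.write() call,
    the way a Feed2JS-style endpoint hands a badge back to the browser."""
    parts = ['<ul class="rss-box">']
    for item in items[:max_items]:
        title = html.escape(item["title"])
        link = html.escape(item["link"], quote=True)
        parts.append(f'<li><a href="{link}">{title}</a></li>')
    parts.append("</ul>")
    markup = "".join(parts)
    return f"document.write({markup!r});"

# Hypothetical feed items; the user embeds <script src="...feed2js..."></script>
snippet = feed_to_js([{"title": "Badges for everyone", "link": "http://example.com/post"}])
```

The user only ever copies and pastes a one-line `<script>` tag; the endpoint does the fetching, caching, and escaping.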

In other words, you now have a badge platform to offer your users.

I tried this out on the FlipBait web site, and it worked out of the gate. In fact, you can now see on my blog sidebar here the posts I’ve submitted to FlipBait. Each user on the site has access to his badge via his profile page. Now everyone can take their contributions with them wherever their “Internet startup news” identity gets expressed.

It couldn’t have been much easier to set up, either. I’m hoping, actually, that the Pligg team incorporates something like this into the source code.

There are also some nice formatting capabilities in Feed2JS that would make people happy, I’m sure. But that adds some complexity I’ll address at a later date. The important thing is to push out a feature like this, watch for uptake, and then evolve it.

I’d be interested to know if other people have tried any other similar solutions or used tools from some of the recent startups in this space and what their experiences have been. Please comment or blog about it if you’ve found another way to accomplish this without having to write the code yourself.

A human-powered relevance engine for Internet startup news

Here’s a fun experiment in crowdsourcing. I’ve been getting overwhelmed by all the startup news coming out of the many sources tracking the interesting ideas and new companies hunting for Internet gold. Many of these companies are really smart. Many are just, well, gold diggers.


And with so many ways to track new and interesting companies, I’ve lost the ability to identify the difference between companies that are actually attacking a problem that matters and companies that are combining buzzwords in hopes of getting funding or getting acquired or both.

There must be a way to harness the collective insight of people who are close to these companies or the ideas they embody to shed light on what’s what. Maybe there’s a way to do that using Pligg.

While shaking my head in a moment of disappointment and a little bit of jealousy at all the new dotcom millionaires/billionaires, the word “flipbait” crossed my mind. I looked to see if the domain was available, and sure enough it was. So, I grabbed the domain, installed Pligg and there it is.

It should be obvious, but the idea is to let people post news of new Internet startups and let the community decide if something is important or not. If I’m not the only one thinking about this, then I can imagine it becoming a really useful resource for gaining insight into the barrage of headlines filling up my feed reader each day.

And if it doesn’t work, I’ll share whatever insight I can glean into why the concept fails. There will hopefully at least be some lessons in this experiment for publishers looking to leverage crowdsourcing in their media mix.

Switching my default browser home page, again

I’ve probably changed my browser’s default home page about 10 times in the last year. Something about working here at Yahoo! has made me very picky about start pages.

I was most recently using Netvibes, which had a couple of really cool modules: a notes box that you could write in just by clicking in it and a sudoku puzzle that I would play on the train ride home. Unfortunately, Netvibes became way too slow for me. I found myself typing in a new URL before Netvibes came up every time I launched a browser window or clicked ‘home’.

I don’t think Netvibes is alone in learning the hard lesson of scaling personalization features. It’s clear that NewsAlloy is struggling under the weight of their usage, and Rojo recently rescued their ailing infrastructure, at least we hope, by adopting a new parent in MovableType.

Even more dramatic is the performance on Wizag. Wizag is one of the most promising start pages I’ve seen yet with its learning and categorization concepts. The design is awful and the speed is unusable, but those problems are easier to solve than developing really new and interesting algorithms. I’m hoping they figure these things out, because I would love to use it more.

Not too long ago I tried switching to Google’s Personalized page. I loved the integration with my phone. You can select modules from your personalized start page that will appear on the phone version. It’s really smart. And it made me try using Google Reader more. But Google Reader is just not the way I want to work with my feed sources, and I got too annoyed.

Why not use My Yahoo! as your browser home page, you ask? I use My Yahoo!, actually. At least weekly. But it shares a problem I have with all personalized start pages…I want my browser to open with something that I don’t know. I want it to lead me, sometimes just a little bit.

And I just learned when I switched to the new Yahoo! home page that I want big pictures, too.

The new Yahoo! home page is brilliant. It has everything I actually want just prior to starting a journey somewhere or even when I’m not sure where to start. I can see the most recent email messages without having to open the full email app. I can check out traffic in my neighborhood, send a quick IM, search and get to my feeds (on My Yahoo!) all from the same place with minimal effort.

But what I love most is that the Yahoo! home page shows me stuff that I don’t know. The top stories have huge impact. They’re inviting, and they make me want to click. And the pulse box always catches my attention with the Top 10 this and Top 10 that.

One of the proven rules in magazine cover selling at the newsstand is that people love top 10 lists. It’s true online, too.

We also learned at InfoWorld how powerful imagery can be when we studied people’s eye movements on a more image-driven home page. The results of that study are here.

No doubt, I’ll switch home pages again soon. I haven’t stuck with one page for more than a few months, but I also don’t remember being as pleased as I am with this page. The dust has settled from the launch earlier in the summer, and I have to agree with what most people in the industry said: The new Yahoo! home page rocks.