Open source grid computing takes off

This has been fun to watch. The Hadoop team at Yahoo! is moving quickly to push the technology to reach its potential. They’ve now adopted it on one of the most important applications in the entire business, Yahoo! Search.

From the Hadoop Blog:

The Webmap build starts with every Web page crawled by Yahoo! and produces a database of all known Web pages and sites on the internet and a vast array of data about every page and site. This derived data feeds the Machine Learned Ranking algorithms at the heart of Yahoo! Search.

Some Webmap size data:

  • Number of links between pages in the index: roughly 1 trillion links
  • Size of output: over 300 TB, compressed!
  • Number of cores used to run a single Map-Reduce job: over 10,000
  • Raw disk used in the production cluster: over 5 Petabytes
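For anyone else trying to picture what a Map-Reduce job over link data looks like, here is a toy sketch in Python. It is not Yahoo!'s Webmap code or Hadoop's API. The crawl data and function names are made up for illustration, and everything runs in memory on one machine. It only shows the shape of the idea: a map step emits a pair for every link, the framework groups the pairs by key, and a reduce step sums them per page.

```python
from collections import defaultdict

# Hypothetical toy crawl: each page maps to the pages it links to.
CRAWL = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}

def map_links(page, outlinks):
    """Map step: emit one (target, 1) pair per outgoing link."""
    for target in outlinks:
        yield target, 1

def shuffle(pairs):
    """Group emitted values by key, as a framework would between the steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(key, values):
    """Reduce step: sum the counts for one target page."""
    return key, sum(values)

def count_inbound_links(crawl):
    """Run map, shuffle and reduce in sequence over the whole crawl."""
    pairs = (
        pair
        for page, outlinks in crawl.items()
        for pair in map_links(page, outlinks)
    )
    grouped = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in grouped.items())

if __name__ == "__main__":
    print(count_inbound_links(CRAWL))
```

In a real cluster the map and reduce calls would be spread across many machines, which is where the thousands of cores come in. The logic per page stays about this small.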

I’m still trying to figure out what all this means, to be honest, but Jeremy Zawodny helps to break it down. In this interview, he gets some answers from Arnab Bhattacharjee (manager of the Yahoo! Webmap Team) and Sameer Paranjpye (manager of our Hadoop development):

The Hadoop project is opening up a really interesting discussion around computing scale. A few years ago I never would have imagined that the open source world would be contributing software solutions like this to the market. I don’t know why I had that perception, really. Perhaps all the positioning by enterprise software companies to discredit open source software started to sink in.

As Jeremy said, “It’s not just an experiment or research project. There’s real money on the line.”

For more background on what’s going on here, check out this article by Mark Chu-Carroll, “Databases are hammers; MapReduce is a screwdriver”.

This story is going to get bigger, I’m certain.

How to offer simple RSS badges for your users

The key breakthrough that made it possible for YouTube to ride on MySpace’s heavy traffic coattails into its current state as a mass media service is the concept of widgets, often called badges in related contexts. Although offering widgets or badges may still seem like a far-off idea to most web site owners, there are a few tools that make this a snap to offer your users if you’re ready for it.

(I’ll assume here that you already know what widgets and badges are. If you don’t, I’ve been tagging articles addressing the topic of widgets that may be helpful.)

In the case of YouTube, they allowed users to post the YouTube video player to any web page with a simple copy and paste operation. Since most web site owners are dealing mostly with text, the equivalent would likely be a feed of RSS content that people could display on a web page. It would clearly be best to allow your users to display a feed of the things they are contributing to your web site, but if you don’t have user-contributed data to give back to your users, it’s still worth trying to offer this functionality using your own content to see what happens.

Here’s a really cool tool I recently found that made it possible for me to offer badges to users on the FlipBait web site. It’s an open source service called Feed2JS, and it appears to be developed by Alan Levine. It requires another open source service called MagpieRSS to operate, but MagpieRSS takes maybe 10 minutes at most to download and install.

After you download and install these scripts you can point to a feed you want to display nicely and get the code back that you can include on any web site to show that feed.
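As a rough sketch of what that returned code looks like, the Python below builds the kind of script tag a badge boils down to. Treat the details as assumptions: the install location is a made-up example URL, the feed URL is hypothetical, and the `src` and `num` query parameters are how I recall Feed2JS naming the feed address and item count, so check them against your own install before relying on them.

```python
from html import escape
from urllib.parse import urlencode

def badge_snippet(script_url, feed_url, num_items=5):
    """Return a copy-and-paste script tag that displays a feed.

    script_url: where the Feed2JS script is installed (hypothetical here).
    feed_url:   the RSS feed to show, e.g. one user's submissions.
    num_items:  how many feed items to display.
    """
    # Assumed parameter names: "src" for the feed, "num" for the item count.
    query = urlencode({"src": feed_url, "num": num_items})
    # Escape the URL so the ampersand is valid inside an HTML attribute.
    src = escape(f"{script_url}?{query}", quote=True)
    return f'<script type="text/javascript" src="{src}"></script>'

if __name__ == "__main__":
    print(
        badge_snippet(
            "http://example.com/feed2js/feed2js.php",  # hypothetical install
            "http://example.com/rss?user=matt",        # hypothetical user feed
            num_items=3,
        )
    )
```

Generating one of these per user, with that user's feed URL filled in, is all it takes to put a personal badge on a profile page.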

In other words, you now have a badge platform to offer your users.

I tried this out on the FlipBait web site, and it worked out of the gate. In fact, you can now see on my blog sidebar here the posts I’ve submitted to FlipBait. Each user on the site has access to his badge via his profile page. Now everyone can take their contributions with them wherever their “Internet startup news” identity gets expressed.

It couldn’t have been much easier to set up either. I’m hoping, actually, that the Pligg team incorporates something like this into the source code.

There are also some nice formatting capabilities in Feed2JS that would make people happy, I’m sure. But that adds some complexity I’ll address at a later date. The important thing is to push out a feature like this, watch for uptake, and then evolve it.

I’d be interested to know if other people have tried any other similar solutions or used tools from some of the recent startups in this space and what their experiences have been. Please comment or blog about it if you’ve found another way to accomplish this without having to write the code yourself.

A human-powered relevance engine for Internet startup news

Here’s a fun experiment in crowdsourcing. I’ve been getting overwhelmed by all the startup news coming out of the many sources tracking the interesting ideas and new companies hunting for Internet gold. Many of these companies are really smart. Many are just, well, gold diggers.


And with so many ways to track new and interesting companies, I’ve lost the ability to identify the difference between companies that are actually attacking a problem that matters and companies that are combining buzzwords in hopes of getting funding or getting acquired or both.

There must be a way to harness the collective insight of people who are close to these companies or the ideas they embody to shed light on what’s what. Maybe there’s a way to do that using Pligg.

While shaking my head in a moment of disappointment and a little bit of jealousy at all the new dotcom millionaires/billionaires, the word “flipbait” crossed my mind. I looked to see if the domain was available, and sure enough it was. So, I grabbed the domain, installed Pligg and there it is.

It should be obvious, but the idea is to let people post news of new Internet startups and let the community decide if something is important or not. If I’m not the only one thinking about this, then I can imagine it becoming a really useful resource for gaining insight into the barrage of headlines filling up my feed reader each day.

And if it doesn’t work, I’ll share whatever insight I can glean into why the concept fails. There will hopefully at least be some lessons in this experiment for publishers looking to leverage crowdsourcing in their media mix.

Scaffolding web sites with Ruby on Rails

I started messing around with Ruby on Rails for the first time on Sunday. This was after spending all day Saturday tearing down kitchen cupboards, tiled sinks and entire walls for a friend who is remodeling his house, so I got my fill of building last weekend whether real or virtual.


Photo: bruce grant

Trying to figure out how Ruby on Rails worked, I felt like I was remodeling my brain. It was as if I walked into Ikea with just a basic idea of what I wanted my new kitchen to look like and then walked out with design schematics and new appliances an hour later. I suddenly had confidence that I could create a really nice web site with a lot of functionality that was basically inaccessible to me before because of my limited programming background.

The “Ah hah!” moment came for me when I added two words to one of the scripts: “scaffold mydatabase”. When I refreshed my web site, I was adding, editing and deleting data in my database via a web interface. It all automatically just worked. Then literally 15 minutes later I had 2 databases talking to each other.

It’s mindblowing how much power this environment gives to people who aren’t true coders.

I have a feeling I’ll get stuck and frustrated with what I’m trying to build. But I’m very hopeful Ruby on Rails will get me closer than I could with open source PHP tools. If nothing else, I’ll get a sense for this new trend.

Programming seems to have about a 3 year fashion cycle that also intersects with influxes of new ideas for web applications and a full cycle of students coming out of university. Now we’re at the early stages of a creative explosion on the Internet enabled by things like Rails, open APIs, storage solutions like S3, and JSON. And you can also wrap an idea in any number of different business models in even less time than it takes to build the product itself.

Maybe instead of LAMP (Linux, Apache, MySQL, PHP), we now have RASH (Rails, APIs, S3, Hosted).

There must be similar reactions to breakthroughs in the construction industry when things like cross-linked polyethylene (PEX) hit the market. Of course, construction suffers from bad naming as much as any other trade. Not everything can be as cool as a sawzall or funny pipe.

Open source software as a customer capture tool

I just started messing around with a product called SugarCRM which is an open source sales contact management tool much like Salesforce.com.

They’ve done a really clever thing which is to build a revenue model around the added services rather than try to charge for the core software. You can download the same app that everyone else uses and install it yourself for free. But if you’re not up to the installation challenge, you can let them host it for you and get started in about 5 minutes for a $40/month usage fee. They charge more for additional services that larger groups may require.

I simultaneously started playing with the Salesforce.com free 30 day trial so that I could compare products. But I quickly realized that my learning curve for operationalizing any CRM system as part of my business was much more than 30 days. I also realized that I wanted the ability to do some major customizations which most likely need to happen at the code level. And I figured a Salesforce.com rep was going to call me and start selling to me which seemed like a fair price to pay but one I could actually do without. In fact, I got a call within 24 hours.

I wouldn’t have considered any of these issues as requirements except for the fact that they are available to me. I downloaded the SugarCRM software, installed it, configured it and uploaded a bunch of data in one afternoon. I now have a view into my customer pipeline that is going to simplify strategic decisions and pull together the variety of conversations happening across the business.

Now, I’m sure that Salesforce.com is more robust and that they have a lot of services and data integration methods I can’t get with SugarCRM. It must work better for larger organizations. I’m sure that Salesforce.com is more reliable, has fewer bugs, has more 3rd party developer tools, etc. At InfoWorld I learned that the cost of open source software becomes time and customization work which is sometimes more expensive than paying service fees (Aug. 2002, April 2006).

The San Francisco Chronicle noted this week that Salesforce.com is in a strong position with its model:

“A growing number of small businesses already realize that, despite recent problems, on-demand software makes more sense than setting up your own computer network.”

A sales manager at Salesforce.com informed me that he has never personally had to sell against SugarCRM in any of his calls. The market for it is probably pretty small.

However, I just can’t help but wonder if SugarCRM is in a position to do to Salesforce.com what Salesforce.com once did to Siebel, undercutting on price and extending efficiencies further out to the edge. The edge used to be self-serve style software as a service. SugarCRM went further and took the edge all the way out to the open source community.

A CRM app isn’t core to running my business. I can do what I need with spreadsheets. But if this tool makes my life easier or allows me to spread intelligence further or faster or if I’m able to make decisions I couldn’t otherwise visualize in my head, and I suspect it might, then I will definitely invest more heavily in it. At that point I may be calling on SugarCRM for additional services.

This software model also has the nice effect of helping me drive myself through the customer marketing funnel at my own pace. At the end of it, I won’t mind paying them for their services, and, in fact, I might be asking to pay for them. What sales person wouldn’t rather receive calls than make them?

Of course, the moment I need services that are worth paying for, I may need to switch to Salesforce.com. SugarCRM is banking on the possibility that I’ll reach user lock in before I get to that point. If I have to make that decision, then it means things are going well. And for that, SugarCRM will already have “loyalty” checked off in their column of my product comparison chart.