Using human editorial decisions to make a better algorithm

Machine learning tools can make people smarter. What makes the magic happen is the data we feed them, the source of information the mathematics turns into insights.

It’s not just social platforms and retailers and cars that can benefit from machine learning. Anyone working in media can get smarter if they have the right tools at hand, too.

In the case of news orgs, it’s the choices editors make, implicitly and explicitly, that provide the training data for the kind of machine that will help publishers make better decisions.

Kaleida has been tracking home page articles by leading publishers. CTO Graham Tackley has been developing systems for clustering similar stories together, capturing social media activity, monitoring how each publisher treats their stories, and rolling these and other inputs into a realtime picture of what matters right now in the media.

We have about 100k articles from the last few months and social signals and trending data for each one. Before even applying any kind of machine learning Graham has been discovering some surprising facts. Only about 5% of the articles promoted on publishers’ home pages earn over 2,000 engagements on Facebook. Articles about the US election perform equally well regardless of whether the headline is more about Trump or Clinton.

We take all of this information and run it through tools like IBM’s Watson APIs, Google’s entity extractor, the Aylien sentiment analysis API and Amazon’s Machine Learning web service, among others. We’ve been feeding it all into Elasticsearch, which makes this much easier to do.

What have we learned?

Our initial research is designed to see what impact publishers’ editorial choices have on how well an article performs on social. So, we trained the algorithm to see those patterns first.

For example, it may seem obvious that promoting a story on your home page or on your branded social media page is a good idea, but machines can tell you just how much it matters. We can see that different words in the headline have a different effect for different publishers. Want to know what the ceiling looks like for a story assuming it doesn’t go viral? Want to know which topics out there have the most potential? Machines can answer all these questions, too.

Algorithms like this one can make predictions with surprising accuracy.

The machine predicted CNN’s piece “Mosul: Most intense day of fighting since offensive began” would earn 4,800 engagements on Facebook, and, in fact, it earned 4,500.

It was off in a few cases, too, of course. The machine’s prediction for Fox News’ story “Millennials are clueless about socialism (call it the ‘Bernie Sanders effect’)” looked accurate at first: it said the story would earn 633 engagements. But just as the story appeared to die on Facebook at 611 engagements, it took off again over the weekend, and it now has double that. Most of the failures were lowball figures on stories that became very successful.
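To put those two examples on the same scale, it helps to express each miss as a share of the actual figure. The short Python sketch below does that using only the numbers quoted above. The function name and the choice of relative error as the measure are our own illustration, not necessarily how Kaleida scores its predictions.

```python
def relative_error(predicted: float, actual: float) -> float:
    """Return the absolute prediction error as a fraction of the actual value."""
    return abs(predicted - actual) / actual


# (predicted, actual) Facebook engagements, as quoted in the article.
# The Fox News pair uses the 611 figure from before the weekend rebound.
examples = {
    "CNN, Mosul": (4800, 4500),
    "Fox News, millennials (before rebound)": (633, 611),
}

for label, (predicted, actual) in examples.items():
    print(f"{label}: {relative_error(predicted, actual):.1%} off")
```

On these figures the CNN prediction was about 6.7% off and the early Fox News prediction about 3.6% off. Once the Fox News story roughly doubled, the same prediction became a lowball, which is the pattern seen in most of the failures.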

After this test we now have some ideas on things we can feed the machine to predict the potential for virality. But there are more interesting use cases than simply predicting the number of likes a story will get.

Algorithms can help identify better words to use in a headline, or where to promote a story and for how long. They can provide guidance on more nuanced decisions, too, like which writer is best placed to cover a topic, or which tone, applied to which subjects, will resonate with a particular publisher’s readers. They can probably determine whether a particular story helps convert readers into paying subscribers, too.

The trick is identifying the question people want answered, then using the data people already generate when they answer it the normal way. The machines just accelerate and amplify the little decisions we make, often intuitively.

At worst algorithms can validate what publishers already know with hard evidence. At best they might help us all fix the media business.

Originally published on October 23, 2016.