Using human editorial decisions to make a better algorithm

Machine learning tools can make people smarter. The thing that makes the magic happen is the data we feed them, the source of information the mathematics turns into insights.

It’s not just social platforms and retailers and cars that can benefit from machine learning. Anyone working in media can get smarter if they have the right tools at hand, too.

In the case of news orgs, it’s the choices editors make, implicitly and explicitly, that provide the training data for the kind of machine that will help publishers make better decisions.

Kaleida has been tracking home page articles by leading publishers. CTO Graham Tackley has been developing systems for clustering similar stories together, capturing social media activity, monitoring how each publisher treats their stories, and rolling these and other inputs into a realtime picture of what matters right now in the media.

We have about 100k articles from the last few months and social signals and trending data for each one. Before even applying any kind of machine learning Graham has been discovering some surprising facts. Only about 5% of the articles promoted on publishers’ home pages earn over 2,000 engagements on Facebook. Articles about the US election perform equally well regardless of whether the headline is more about Trump or Clinton.


We take all this kind of information and run it through tools like IBM’s Watson APIs, Google’s entity extractor, the Aylien sentiment analysis API and Amazon’s Machine Learning web service, among others. We’ve been feeding it all into Elasticsearch which makes this much easier to do.

What have we learned?

Our initial research is designed to see what impact publishers’ editorial choices have on how well an article performs on social. So, we trained the algorithm to see those patterns first.

For example, it may seem obvious that promoting a story on your home page or on your branded social media page is a good idea, but machines can tell you just how much it matters. We can see that different words in the headline have a different effect for different publishers. Want to know what the ceiling looks like for a story assuming it doesn’t go viral? Want to know which topics out there have the most potential? Machines can answer all these questions, too.

Algorithms like this one can make predictions with surprising accuracy.


The machine predicted CNN’s piece “Mosul: Most intense day of fighting since offensive began” would earn 4,800 engagements on Facebook, and, in fact, it earned 4,500.

It was off in a few cases, too, of course. The machine accurately predicted Fox News’ story on “Millennials are clueless about socialism (call it the ‘Bernie Sanders effect’)” initially. It said it would earn 633 engagements. Just as the story appeared to die on Facebook at 611 engagements it took off again over the weekend and it now has double that. Most of the failures were lowball figures on stories that became very successful.
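To put those hits and misses on a common scale, here is a minimal sketch that expresses each prediction as a signed error relative to the actual count. It uses only the figures quoted above; the function name is our own for illustration, and the "after doubling" value of 1,222 is an approximation of "double that", not a measured number.

```python
def relative_error(predicted: float, actual: float) -> float:
    """Return the signed prediction error as a fraction of the actual value."""
    return (predicted - actual) / actual


# (predicted, actual) Facebook engagements, taken from the examples above.
examples = {
    "CNN, Mosul": (4800, 4500),
    "Fox News, socialism (at 611)": (633, 611),
    # Assumption: "double that" is read as roughly 2 x 611.
    "Fox News, socialism (after doubling)": (633, 2 * 611),
}

errors = {name: relative_error(p, a) for name, (p, a) in examples.items()}

for name, err in errors.items():
    # Positive means the machine overshot, negative means it lowballed.
    print(f"{name}: {err:+.1%}")
```

The first two predictions land within a few percent of the actual count, while the late weekend surge turns the Fox News prediction into a lowball of roughly half.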

After this test we now have some ideas on things we can feed the machine to predict the potential for virality. But there are more interesting use cases than simply predicting the number of likes a story will get.

Algorithms can help identify better words to use in a headline or where to promote a story and for how long. They can provide guidance on more nuanced decisions, too, like who is the best writer to cover a topic or maybe what tone with which subjects will resonate with a particular publisher’s readers. They can probably decide whether a particular story helps convert readers into paying subscribers, too.

The trick is identifying the question people want answered and then using the data people already generate in the normal course of their work to get there. The machines just accelerate and amplify the little decisions we make, often intuitively.

At worst algorithms can validate what publishers already know with hard evidence. At best they might help us all fix the media business.


Originally published at www.kaleida.com on October 23, 2016.

Why publishers are turning to art for answers this week


The world has become so complex that even Stephen Hawking is unsure what’s going on. We can’t explain what we don’t understand, and when words elude us we turn to art for answers. Several publishers this week are tapping into the collective concern people are feeling.

Buzzfeed News is covering Inktober (artists from all over the world make one ink drawing a day for the entire month of October) and Ohio artist Shawn Coss, whose gorgeous but haunting images of mental health disorders are getting a tremendous response.

On a lighter note The Guardian is talking about newly discovered artwork by Finnish writer and artist Tove Jansson who is best known for the Moomins.

The Internet has gone a bit bonkers over an odd looking terracotta sculpture of baby Jesus.

But that’s nothing compared to the #TrumpBookReport meme. Antonio French kicked it off in a tweet where he compared Trump’s foreign policies to badly written teenage book reports:

Trump’s foreign policy answers sound like a book report from a teenager who hasn’t read the book. “Oh, the grapes! They had so much wrath!”

— Antonio French (@AntonioFrench) October 20, 2016

Now, whether or not it counts as art I’m not sure, but we love this bike lock that emits a horrible smell that will make a thief vomit if they cut the lock.

SkunkLock was crowdfunded on Indiegogo

What an imaginative way to use technology to course correct without violence. I hope we find similar antidotes to the threats posed by AI that Hawking sees ahead of us.


Originally published at www.kaleida.com on October 21, 2016.

Who won the coverage of the US presidential debate?


After tracking three debates Kaleida can show patterns in the way leading publishers are covering them. It goes like this:

Step one: Write a “heads up” piece on the day, maybe the day before. Tell people what to expect and entice them to come back for your live coverage or follow on analysis. There are probably relevant news events or research studies, including polls that say one candidate has to ‘step up their game’ or something like that.

WINNER: NBC News, “‘Wall’ of Taco Trucks Line Up at Trump’s Hotel in Protest”

Step two: Do something live. Whether it’s video or a liveblog or whatever, make sure that you are in the game and competing for position on Google News, ready to break the story of the debate as soon as it happens, whatever it is.

WINNER: CNN Live

Step three: Get the story. There are a handful of types of stories that can be written in the first few hours following the debate:

Step four: Hear what ‘the people’ have to say. You can do vox pops and interviews in places mentioned in the debate or with people demographically targeted in the candidates’ statements. There’s always a tweet that goes viral, so pick that up, too.

WINNER: These are still coming in

Step five: Amplify the key stories. Produce more analysis and thought pieces that either capture the mood following the debate or dive into the issues raised and what the candidates’ positions actually mean.

WINNER: The likely candidates are Trump’s refusal to commit to accepting the election result, saying he would deport ‘Bad Hombres’, or Clinton’s and Trump’s views about the Second Amendment. Though not directly related to the debate it does appear Trump’s children are increasingly drawing fire, too.


Originally published at www.kaleida.com on October 20, 2016.

Here’s what happens if you change the URL of a story that’s going viral on Facebook


On Sunday afternoon FoxNews.com reported that a Republican Party headquarters in North Carolina was firebombed the night before.

Kaleida showed that their story was moving really fast on Facebook earning over 40 engagements per minute. The article had 18,000 engagements and climbing and then, suddenly, it fell off a cliff.

Data from Kaleida.com

Engagements hit zero at 5:45 am GMT (12:45 am Eastern Time) and instead of climbing at 40 engagements per minute and reaching for 20k or more in total, the article started from zero again and earned about 2 or 3 per minute for the next several hours.

Now it has about 5,000 engagements in total.
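The per-minute rate and the sudden drop can both be read off a series of cumulative counts. The sketch below is one way to do that, not how Kaleida computes it. The function name is hypothetical and the sample values are made up to echo the figures above (about 40 per minute before the drop, about 3 per minute after); they are not Kaleida's data.

```python
from datetime import datetime


def rates_per_minute(samples):
    """Return (timestamp, engagements gained per minute) for consecutive samples.

    If the cumulative count goes backwards, the interval is reported as None,
    which we treat as a sign that the counter was reset.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        minutes = (t1 - t0).total_seconds() / 60
        gain = c1 - c0
        rates.append((t1, None if gain < 0 else gain / minutes))
    return rates


# Hypothetical samples (Eastern Time), shaped to match the story above.
samples = [
    (datetime(2016, 10, 17, 0, 15), 17_400),
    (datetime(2016, 10, 17, 0, 30), 18_000),
    (datetime(2016, 10, 17, 0, 45), 0),   # the count falls off a cliff
    (datetime(2016, 10, 17, 1, 0), 45),
]

for when, rate in rates_per_minute(samples):
    label = "counter reset" if rate is None else f"{rate:.1f} per minute"
    print(when.strftime("%H:%M"), label)
```

Flagging any interval where a cumulative count decreases is a cheap way to spot this kind of reset while a story is still moving.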

What happened?

At some point during this article’s life Fox News changed the URL. When it launched Sunday afternoon (8:36pm GMT/3:35 Eastern) the URL was:

http://www.foxnews.com/politics/2016/10/16/north-carolina-gop-headquarters-firebombed.html

The URL probably changed at 12:09 am Eastern Time and became the following:

http://www.foxnews.com/politics/2016/10/17/north-carolina-gop-headquarters-firebombed.html

Fox News changed the day in the URL, presumably for enhanced positioning in Google News.

Interestingly, Facebook knows the original URL, as you can see from their debugger tools. But it appears that when the ‘canonical’ URL was changed they must have zeroed the engagement count. It’s also interesting to note that Facebook still recognized the old URL for about 30 to 40 minutes before changing the way they were dealing with it.

Why did the momentum crash so suddenly? Just because the URL changed shouldn’t mean people would share it any less, right?

We have to assume that it was moving quickly because the story was hot and people were sharing it on Facebook a lot. But when it suddenly registered zero engagements the Facebook algorithm must have reprioritized other stories in front of it.

The new URL meant that Facebook thought it was a new article with no engagement instead of the highly active article that was flying across their network only moments before.

All those shares coming from within the world of Facebook, particularly URLs being viewed via the mobile browser or as instant articles, disappeared behind the news feed algorithm.

The lesson here is to be careful about changing URLs, particularly the <link rel="canonical"> tag in your HTML. It seems Facebook will consider it a new page and rescan for engagement counts on the new page. All your engagements will be lost.
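One way to guard against this is to check the canonical URL a page declares before and after a republish. The sketch below uses Python's standard html.parser to pull out the canonical link. The class and function names are our own, and the page markup is a made-up stand-in for illustration, not the actual Fox News HTML; only the two URLs come from the story above.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated values, so split before testing.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr.get("href")


def canonical_url(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


OLD = "http://www.foxnews.com/politics/2016/10/16/north-carolina-gop-headquarters-firebombed.html"
NEW = "http://www.foxnews.com/politics/2016/10/17/north-carolina-gop-headquarters-firebombed.html"

# Hypothetical markup standing in for the republished page.
page = f'<html><head><link rel="canonical" href="{NEW}"></head><body></body></html>'

found = canonical_url(page)
if found != OLD:
    print("Canonical URL changed:")
    print("  was:", OLD)
    print("  now:", found)
```

Running a check like this as part of the publishing workflow would at least make the change a deliberate choice rather than a side effect.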

This is a case where what works in search may actually do damage in social.


Originally published at www.kaleida.com on October 17, 2016.

Analysis: Publishers have overcooked the Kim Kardashian robbery story

Coverage analysis by Kaleida

Late Sunday night reports emerged that Kim Kardashian was robbed in Paris. Publishers were quick to cover it, and they have been publishing related stories for 3 days now.

Is the effort paying off? Not so much.

Kaleida shows that among the 12 sources we are tracking at the moment about 70 articles have been published covering the Kim Kardashian robbery. In total these articles have earned about 150,000 engagements on Facebook or 2k per article (mean average).

The BBC, for example, has published 7 related articles earning a combined total of 30,000 engagements. Their most successful story has less than 10,000 engagements. Here’s how some of the publishers are performing, so far, ranked by efficiency:

Publisher      Stories  Engagements  Top story  Avg per story
Buzzfeed News  4        24k          10k        6k
NYT            2        11k          7k         5.5k
BBC            7        30k          10k        4.3k
CNN            6        22k          11k        3.6k
NBC News       6        19k          14k        3.2k
The Guardian   6        8.5k         8k         1.4k
The Telegraph  10       7k           2k         700
Fox News       8        1.4k         1k         175

Celebrity news can open opportunities to raise issues that are core to your brand as a publisher, though there seem to be few examples of that. Most if not all of these articles seem to be placed to drive traffic, rank high in Google News and find younger readers through social. It would seem to be a story practically designed for social news channels.

Unfortunately, the low engagement numbers relative to the output don’t seem to justify the resource.

Just to illustrate the point let’s use an average cost per article of $500 which could include both cost of production and total cost of delivery, and then let’s use that figure to see what social efficiency looks like.

Publisher      Total cost  Cost per engagement
Buzzfeed News  $2,000      $0.08
NYT            $1,000      $0.09
BBC            $3,500      $0.12
CNN            $3,000      $0.13
NBC News       $3,000      $0.16
The Guardian   $3,000      $0.36
The Telegraph  $5,000      $0.71
Fox News       $4,000      $2.86
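The arithmetic behind the table is simple enough to sketch. The snippet below multiplies the story counts by the illustrative $500 per article and divides by the engagement totals listed earlier. The function and variable names are our own. Because it works from the rounded totals, a figure may differ from the table by about a cent.

```python
COST_PER_ARTICLE = 500  # illustrative figure from the text, in US dollars

# (stories, total Facebook engagements) as listed earlier in this analysis.
coverage = {
    "Buzzfeed News": (4, 24_000),
    "NYT": (2, 11_000),
    "BBC": (7, 30_000),
    "CNN": (6, 22_000),
    "NBC News": (6, 19_000),
    "The Guardian": (6, 8_500),
    "The Telegraph": (10, 7_000),
    "Fox News": (8, 1_400),
}


def efficiency(stories, engagements, cost_per_article=COST_PER_ARTICLE):
    """Return (total cost, average engagements per story, cost per engagement)."""
    total_cost = stories * cost_per_article
    return total_cost, engagements / stories, total_cost / engagements


rows = {name: efficiency(*figures) for name, figures in coverage.items()}

# Cheapest engagement first, which is the ordering used in the table.
ranked = sorted(rows, key=lambda name: rows[name][2])

for name in ranked:
    total_cost, average, per_engagement = rows[name]
    print(f"{name:<14} ${total_cost:>5,}  avg {average:>6,.0f}  ${per_engagement:.2f}")
```

Swapping in a different cost per article for each publisher changes the cost column but not the method, so the comparison is easy to rerun with real figures.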


It’s likely Buzzfeed has a lower cost per article than The New York Times, so we can safely say they win in this case.

Fox News, on the other hand, seems to have performed particularly poorly compared to other leading publishers on this story. They published more stories than necessary, and they are getting very little lift on social for the effort.

There are some caveats to mention, not least of which is the fact that we’re only including stories in this analysis that publishers have promoted on their web site home pages. Some of these publishers may have had a lot more success on Facebook than indicated here. They may have had success on other platforms including Twitter and Snapchat. They may have had video views that made it all worthwhile. And perhaps they are drawing stronger visitor figures to their websites than this analysis implies.

But comparing similar coverage from similar publishers we can certainly derive a few lessons.

First, Kim Kardashian may not be much of a draw for national news media sources on the web.

Second, publishers can easily overspend on a hot story and fail to get much value out of it.

The second point is the important one.

Data-informed editorial decisions become increasingly important as the story in question falls further from core brand values for a publisher. Otherwise, valuable editorial resources are getting wasted.

Journalism as an industry can’t afford to waste resources on coverage that doesn’t matter.


Originally published at www.kaleida.com on October 5, 2016.