Aggregation is Plagiarism

On March 15, 2011, I started the post you now read with a headline left unanswered: “Is Aggregation Really Just Plagiarism?” Clearly, my answer—too long coming—is “Yes”. Unequivocally, news aggregation is plain, pure plagiarism.

Google enables, no, encourages content thieves, despite recent search-engine penalizing strategies. Too often, the big G raps sites because of links to black-listed blogs. The problem is bigger: mainstream blogs write synopsis stories that include absolutely no original reporting but take pageviews away from the news sites doing the real work.

The worst-offending recaps are those that give away for free content that organizations like the New York Times must charge for. Aggregators seek high click volume, which can generate more advertising and improve page rates. To repeat: Google’s business model enables them.

As a measure of popular perception, Nasim Pedrad, portraying Arianna Huffington on “Saturday Night Live”, remarks: “The New York Times has great coverage for this…and you can read all of it on the Huffington Post—because we copied it and pasted it”.

Last year I criticized Huffington Post for a splashy, fill-the-above-fold headline that demanded readers’ attention but reeked of stale news. The story offered no original reporting. It was mere aggregation pretending to be something more. At least this evening’s fill-the-fold story, “Report: Russia Threatened Nations Before the Vote”, is a lengthy Reuters story.

Like Twins
Some people won’t agree that mere synopses qualify as plagiarism, because the one is not an exact copy of the other. Good journalism answers the questions who, what, where, and when. Great journalism adds the why. If the aggregated post lifts the substance of the first four Ws, how is that not plagiarism in spirit?

For those who would argue no plagiarism, I have a fresh example—a tantalizing story romping across the web over the past 48 hours.

On March 27, the BBC headline “Giant rat: Swedes agog at ‘Ratzilla’ in Stockholm” got me to click. The story and photos are quite remarkable. Today, Gawker tempted with a catchier headline: “Ratzilla, the 16-Inch ‘Rat From Hell’, Finally Captured in Sweden”. Taylor Berman’s lead and presentation are funny and original, or so I thought before clicking the link-through to The Local, which delivers “Sweden’s news in English”.

The closeness of the two stories startles. That’s my reading. I would like your judgement. Below I transpose the first two-thirds of the stories (blue for Gawker and red for The Local, in alternating paragraphs with Gawker first). From my reading, the American blog goes way over the line. This aggregated post is pure plagiarism.

Ratzilla, the big ass rat that terrorized a Swedish family for weeks, is finally dead.

A family in Solna, north of Stockholm, had no idea what was in store when their pet cat was too scared to go into the kitchen. “We thought it could be a little mouse, but after a while we figured it couldn’t be because it was making too much noise”, Signe Bengtsson told The Local.

Erik Korsas and his family first realized they had a problem when their pet cat refused to enter their kitchen. “We thought it could be a little mouse, but after a while we figured it couldn’t be because it was making too much noise”, Korsas’ wife, Signe Bengtsson, told The Local.

Her worst fears were confirmed while emptying the trash a few days later when she saw a rat guzzling leftovers under the sink. “It was right there in our rubbish bin, a mighty monster. I was petrified. I couldn’t believe such a big rat could exist”, she said. “I couldn’t help but do the old classic and jump on the kitchen table and scream”.

Several days later she spotted a giant rat eating from her garbage can. “It was right there in our rubbish bin, a mighty monster. I was petrified. I couldn’t believe such a big rat could exist”, she said. “I couldn’t help but do the old classic and jump on the kitchen table and scream”.

Her husband Erik Korsås who was away at the time was dubious that such an enormous rat could really be living in his kitchen. “When my wife called I said ‘Yeah, sure, take it easy, I’ll be home on Sunday. But by then it had jumped into the waste bin and had a Swedish smörgåsbord with all the leftovers”, he said.

She called her husband, who was away on a business trip. “When my wife called I said ‘Yeah, sure, take it easy, I’ll be home on Sunday. But by then it had jumped into the waste bin and had a Swedish smörgåsbord with all the leftovers”, he said.

As the rat made more appearances over the next day, the family took to stomping around as they passed the kitchen to ensure they wouldn’t meet the rat from hell again. “By the time I got home, the rat was so domesticated that it just sat under the kitchen table”, Korsås explained, adding that it had chewed through the water pipes connected to the dishwater and started a small flood.

For days, the family lived in horror, stomping loudly when they entered the kitchen to scare the hell rodent away. “By the time I got home, the rat was so domesticated that it just sat under the kitchen table”, Korsas said.

One Like the Other
Using a plagiarism-investigation service new to me, Copyscape, I compared the text of both stories, which you see in this PDF. Copyscape finds a 57 percent match of words and phrases in the Gawker post to The Local. The Swedish story posted a day earlier.

Obviously, the big problems are the quotes, which are lifted wholesale without any additional citation, as if, regardless of the writer’s intentions, he got them from the source himself. I sometimes quote press releases this way, but never someone else’s news story.

But the algorithm fails to capture other similarities that any reader should see. Story flow and structure are nearly identical, while Gawker’s paraphrasing is too much like the other.
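To see why a phrase-matching tool catches the lifted quotes but misses the paraphrasing, here is a minimal Python sketch. It is not Copyscape's actual method, which is not described here; it assumes a simple approach that compares overlapping three-word sequences. The names `shingles` and `share_matched` are hypothetical, made up for illustration, and the sample text comes from the two excerpts above.

```python
import re


def shingles(text, n=3):
    """Return the set of n-word sequences (shingles) in text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def share_matched(candidate, original, n=3):
    """Fraction of the candidate's shingles that also appear in the original."""
    cand = shingles(candidate, n)
    if not cand:
        return 0.0
    return len(cand & shingles(original, n)) / len(cand)


# The quote that appears word for word in both stories.
QUOTE = ("It was right there in our rubbish bin, a mighty monster. "
         "I was petrified.")

# The sentence each site wrote to introduce that quote.
GAWKER_INTRO = ("Several days later she spotted a giant rat eating "
                "from her garbage can.")
LOCAL_INTRO = ("Her worst fears were confirmed while emptying the trash "
               "a few days later when she saw a rat guzzling leftovers "
               "under the sink.")

# Paraphrase alone: same event, same order, different wording.
paraphrase_score = share_matched(GAWKER_INTRO, LOCAL_INTRO)

# Paraphrase plus quote: only the quoted part can match.
combined_score = share_matched(GAWKER_INTRO + " " + QUOTE,
                               LOCAL_INTRO + " " + QUOTE)

print(f"paraphrase only: {paraphrase_score:.2f}")
print(f"with the quote:  {combined_score:.2f}")
```

Under this assumption, the reworded sentence shares no three-word run with its source even though it reports the same event in the same place in the story, so the whole score comes from the quote. Structure and flow never enter the calculation.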

This kind of aggregated copycatting is more common than this one example suggests, and if I get into a further huff about this topic, more comparisons will follow. I chose this one because the Gawker story so caught my attention initially and then the similarities to The Local so stunned.

By comparison, the BBC story bears little resemblance to the Swedish news site’s story and offers additional information that demonstrates somebody reported rather than regurgitated.

I leave you with a question that might surprise, given my tone here: Is aggregation-plagiarism always wrong? I’m not convinced. I believe there also is a legitimate news case for aggregation, even when taking so much from the original as the Gawker story does.

I will explain why in a follow-up post, exploring how values about information have changed, the importance of serving audience, and how attitudes about originality and copying are antiquated in the contextual journalism era.

Photo Credit: Lucia Whittaker

7 Comments

  1. Thanks for the perspective. I’m really over the Huffington Post too, especially since original material seems to be scarce there these days. I’ve noticed the HP has been creating many articles from Reddit comments as well. And don’t get me started with Buzzfeed and the like, using images without permission. Where’s the professional integrity?

    1. Crazy thing about Buzzfeed is the attempt to go legit news site–not that I see it. Huffington Post goes without saying.

    2. Digital editors hang out in the Reddit threads. You’re absolutely right. If anyone has a hard time pitching a story, post it on Reddit and generate buzz yourself.

      As for HuffPo. That was a direct response to print media not going digital in a robust way.

  2. I tried copyscape. Compared “My dog is a cat. It eats, sleeps and plays.” vs “My cat is a dog. It plays, eats and sleeps.” It failed to detect any similarity between these two.

  3. I am tempted to say that aggregation without citation is plagiarism and what places like Gawker or HuffPo do is simply tasteless. In contrast, places like the loop, DF, kottke etc. add value to aggregation because they clearly cite and sometimes add a line or two supporting or denying the premise of a story.

    The main problem is that the publishing business has become too strongly dependent on ad revenue and what would simply be a faux pas is hurting the bottom line of those doing the work.

    A part of the problem is the outlandish digital subscription prices that the publications demand and then still serve me with ads. This differentiates them negatively from an aggregator and allows aggregators to proliferate. If the NYT, say, offered multi-tiered access, I think this would go a long way towards solving the problem.

    A tier structure I like makes content free to read and puts the limits on comments and ads:
    1) Free, can only read comments posted by others, high ad rate.
    2) $4.99 a month, can read and vote on comments posted by others, medium ad rate.
    3) $14.99 a month, can read and vote on others’ comments, post own comments, low ad rate.
    4) $49.99 a month, can read and vote on others’ comments, post own comments, ad free.

    The digital world starkly makes clear the difference between information and knowledge. The former is just data, the latter is perspective that is formed by conversations. Give the data away, charge for the perspective.

  4. You make excellent recommendations.

    The publishing business wants to be dependent on ad revenue. But there is too much content and not enough advertising to fill the ad space. That exacerbates these bad practices.

    I agree. If you pay, you shouldn’t have to view advertising.
