Duplicate Content Myths Debunked

For many website owners, the duplicate content issue is a bit of a murky one. While most know it’s frowned upon by Google, few have a grasp of the details and even fewer are sure if there is even a penalty attached to it at all.

Here to clear the murky waters for good are the CleverClicks Myth Busters!
But before we let them loose on it, let’s just get clear on the basics…

What is duplicate content?

Duplicate content is blocks of content within or across domains that either completely match other content or are appreciably similar, to use Google’s own words. So essentially, it’s content which appears elsewhere on the web.
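
To get a feel for what "appreciably similar" could mean in practice, here is a rough Python sketch. It is purely an illustration and not a description of how Google measures similarity: it breaks two texts into overlapping three-word chunks and checks how many chunks they share. The function names and the chunk size of three are our own assumptions.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word chunks ("shingles") in a text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short texts become a single chunk (or nothing at all).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity: shared chunks divided by all distinct chunks."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)
```

A score of 1.0 means the two texts share every chunk and 0.0 means they share none. Where "appreciably similar" starts on that scale is a judgment call, so treat any threshold you pick as a guess.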

Internal vs. External

Duplicate content comes in two forms:

Internal duplicate content: Duplicate content within one website or domain.

External duplicate content: Duplicate content that exists between two or more different sites across the web.

Why should you care about it?

Well, if Google picks up that you have duplicate content (which it feels is malicious in intent) then your website and offending pages are in for a hard time.

Firstly, you won’t rank for the offending page/s where duplicate content is found.
Secondly, the weight of the page/s will be negligible.
Thirdly, it will count against your site as a reliable source of quality, unique content.

These are, obviously, not ideal outcomes, but before you panic (thinking about those Wikipedia definitions you copy-pasted into your last blog post), there is slightly more to this duplicate content ‘penalty’ than meets the eye.

Our mythbusters will take it from here…

Myth: Duplicate means having scraped content or the same text on multiple pages.

Truth: But wait, there’s more…

Pages accessible via multiple URLs will also register as duplicate content.

When the bots crawl your pages they’ll visit each individual URL and expect to find individual content. If this isn’t the case then it’s considered duplicate content.

This can be either internal (you have 2 URLs leading to the same page) or external (content is showing up in more than one location across the web).

If it’s legitimate and you’re just doing something like sharing an article (with permission) on your site, you can mark these pages using the rel="canonical" tag, the URL parameter handling tool, or 301 redirects. However, if the duplication is internal, the best way to deal with it is to make sure that each piece of content has only one URL associated with it.

This not only dupe-proofs you, but it also makes it less confusing for users to navigate.
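
As a minimal sketch of the "one URL per piece of content" idea, the Python below collapses a few common URL variants into a single preferred address and builds the matching rel="canonical" tag. The rules it applies are our own assumptions about a typical site (https is preferred, "www." is dropped, trailing slashes are ignored, and the listed tracking parameters never change what the page shows), and the function names are hypothetical. Your own site may need different rules.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters never change the page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def canonical_url(url: str) -> str:
    """Collapse URL variants that show the same page into one address."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat "/shoes/" and "/shoes" as the same page; keep "/" for the root.
    path = parts.path.rstrip("/") or "/"
    # Drop tracking parameters and sort the rest so their order is irrelevant.
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    )
    return urlunsplit(("https", host, path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the link element to place in the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'
```

With these rules, http://www.example.com/shoes/?utm_source=mail and https://example.com/shoes both resolve to the same address, so the bots are told about one page instead of two.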

Myth: There is no such thing as a duplicate content penalty

Truth: Well, um…

This one is actually pretty close to the truth.

‘What?’ I hear you cry. ‘Then why am I even reading this article??’

Because your rankings can be seriously affected by duplicate content. But, yes, the term ‘penalty’ isn’t ‘technically’ correct.

In reality it’s more of a filter, but it’s become known as a penalty because when you’re on the receiving end of it your rankings will all but disappear.

Ouchies.

So while it isn’t considered a full-blown penalty you’ll certainly feel as though you have been penalised.

Myth: Having disclaimers of information across multiple pages counts as duplicate content

Truth: Google have thought about this

Matt Cutts has said that having a Terms and Conditions template or a Disclaimer message across all pages of your site won’t get you penalised.

“If it’s required, I wouldn’t stress about that… Unless the content that you have is spammy or keyword-stuffed, then an algorithm or a person might take action.”

Myth: One should block crawlers’ access to duplicate pages

Truth: Don’t do it!

If multiple URLs point to the same content, but it’s not malicious in nature (republishing a blog post with permission, for example) it can still be flagged as external duplicate content, even though it’s perfectly permissible.

A misleading piece of advice often given to webmasters in these cases is to ‘block crawler access’. However, Google warns against doing this.

“If search engines can’t crawl pages with duplicate content, they can’t automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages,” says Google.

A better solution is to allow search engines to crawl these URLs, but mark them as duplicates using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects.

This tells Google that you acknowledge the duplication and allows them to overlook it (provided they don’t find it malicious).
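
For the 301 redirect option, the logic is simple enough to sketch in a few lines of Python. This is an illustration only: the paths in the map are made up, and on a real site you would normally set redirects up in your web server or CMS rather than in your own code.

```python
# Hypothetical map of duplicate paths to the single preferred path.
PREFERRED_PATHS = {
    "/index.html": "/",
    "/blog/post-1/print": "/blog/post-1",
}


def respond(path: str) -> tuple:
    """Return (status code, headers) for a requested path."""
    target = PREFERRED_PATHS.get(path)
    if target is not None:
        # 301 means "moved permanently": visitors and crawlers are sent on.
        return 301, {"Location": target}
    return 200, {}
```

Here a request for /index.html is answered with a 301 pointing at /, while any path that is not in the map is served as normal.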

Myth: All syndicated content is duplicate content

Truth: Not necessarily

There are two types of syndication sites:

Type 1: Legitimate news sites and websites that share content which has already been published elsewhere. These sites have permission to re-publish this content and/or give credit to the original author.

Type 2: Sites that produce no original content and just scrape, steal and borrow text or images from other websites without giving credit.

Obviously, Type 2 is a disaster waiting to happen, but Google respects that Type 1 websites are offering a valuable service which can benefit both readers and the original authors of the content.

So, yes, while these sites are technically full of external duplicate content, Google can judge the intent and give the good guys a break. It would be very hard for sites like Buzzfeed to function if they didn’t.

Myth: Translated copy isn’t duplicate content

Truth: Sometimes it is

You’d think that changing something into a different language would mean it isn’t duplicate content, but it can be picked up as such, especially if your content has been directly translated (newsflash: language doesn’t work that way).

If you have a version of your site in a different language you’ll need to change the sentence structure, alter the content a bit and give it its own regional URL, whether that is a country domain, a subdomain or a subdirectory, for example:

example.es
es.example.com
example.com/es

Myth: It’s easy to get ‘penalised’ for duplicate content

Truth: It only happens in extreme cases

It takes quite a lot to make the duplicate content alarm sound at Google. Most webmasters haven’t come across many cases where a site’s rankings dropped because of duplicate content alone.

As Google themselves say, “mostly, [duplicate content] is not deceptive in origin,” which means they’re able to identify it when they see it.

When Google looks for duplicate content it takes the following into consideration:

Volume: How many duplicates of the same text exist. In most cases, it needs to be hundreds of pages before Google takes notice.
Timing: If hundreds of duplicate pages all appear at the same time you’re bound to raise a few eyebrows. If it happens gradually you’re less likely to attract any attention.
Context: If the duplicate copy is on a brand new domain or is from a high profile page such as the home page, then it looks fishy. If it’s a press release or a blog post from an established site which is being shared across the web, there’s less likely to be a fuss.

Generally, the only sites which incur a duplicate content penalty are ones which:

Have nothing but scraped or plagiarised content
Provide no accreditation or sources
Steal images, auto-translate pages, or use dodgy automated tools to alter plagiarised content
Purposefully create pages with nearly identical content (done in order to rank for locations/keywords)
Are bad quality and spammy in nature

That being said, you shouldn’t become too lax about it. Even though Google has come a long way in telling malicious duplicate content from benign, if you don’t keep your site structure dupe-free and ensure your key landing pages are highly unique, you will struggle to rank well.

You may not drop off the face of the internet, but you won’t reach your full potential either. Something to keep in mind.

Hopefully the mists have lifted and left you looking at a clearer picture of what Duplicate Content is and when you’re at risk of being punished for it.

Do you feel slightly calmer about it? Good. That’s what our mythbusters are here for. Let us know if you have any other questions in the comments section below!

About Steph Von der Heyde

Our resident wordsmith’s love of digital lured her over from advertising to the online space, where she fell in love with content marketing. Since coming to the online world Steph has made her mark on all outgoing CleverClicks copy and is passionate about using words to build brands. Her obsession with writing is rivalled only by her love of trail running, yoga and green juice. When she’s not submerged in content strategy you’ll find Steph in Downward Dog.
