Some less-experienced search engine optimization experts believe that there is no penalty for having duplicate content. However, many exceptions to this rule have been discovered. Duplicate content penalties exist for certain scenarios that are discussed in this article.
The conventional wisdom regarding duplicate content is as follows:
- Duplicate content can occur within a site or across different sites.
- A page can be considered duplicate without being identical.
- Search engines want to show only one version of a given piece of content in their results. The reason is simple: if a person visits result A, finds that it is not what he or she was looking for, returns to the search engine, and clicks the next result only to find the same content as A, the second result is of no help, and that is a bad user experience.
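To make the second point concrete, here is a minimal sketch of one common way to score how similar two pages are: comparing overlapping three-word sequences ("shingles") with a Jaccard ratio. This is an illustration only. The article does not say how any search engine actually measures similarity, and the function names and sample texts below are hypothetical.

```python
from __future__ import annotations

import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical online and print versions of the same page.
online = "Our blue widget ships free and comes with a two year warranty."
printable = (
    "Print version: our blue widget ships free and comes with a two year warranty."
)

score = jaccard_similarity(online, printable)
print(f"similarity: {score:.2f}")  # similarity: 0.83
```

The two texts are not identical, yet most of their shingles match, which is the sense in which a page can be "duplicate without being identical".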
Thus, search engines filter duplicate content, which has several consequences for your site:
- Search engine crawlers come to a site with a ‘budget’ of how many pages they can crawl per session. If the spider crawls a page that duplicates one it has already crawled, part of that ‘budget’ is wasted. Thus, fewer of your pages with good content will actually be crawled.
- Linking to pages that have duplicate content is a waste of ‘link juice’. You are passing PageRank on to pages that will not rank in the search engines.
- When a search engine keeps only one of several pages that contain duplicate content, it has no clear way to determine which of them to include in the index. For example, if your site contains three pages with duplicate content, the search engine does not know which page to show in the search results.
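To see which of your own pages compete with each other in this way, you can group URLs by a fingerprint of their text. The sketch below is one simple approach under stated assumptions: the `pages` dictionary and its URLs are made up, and hashing only catches copies that are identical after lowercasing and whitespace cleanup, not reworded ones.

```python
from __future__ import annotations

import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def duplicate_groups(pages: dict[str, str]) -> list[list[str]]:
    """Return lists of URLs whose body text is identical after normalization."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        by_hash[fingerprint(text)].append(url)
    return [sorted(urls) for urls in by_hash.values() if len(urls) > 1]


# Hypothetical site: three URLs serve the same article text.
pages = {
    "/article": "Duplicate content can hurt rankings.",
    "/article?print=1": "Duplicate content can   hurt rankings.",
    "/archive/article": "duplicate content can hurt rankings.\n",
    "/about": "We are a small SEO consultancy.",
}

groups = duplicate_groups(pages)
print(groups)
# [['/archive/article', '/article', '/article?print=1']]
```

Each group is a set of pages of which a search engine would likely keep only one, so each group is a place to decide which version you want indexed.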
Problems that can arise due to duplicate content:
The last point above is the biggest problem with duplicate content. Suppose you have two versions of a page, one for online use and one for print use. There is a good chance the search engine will pick the print version instead of the online version. If it does, the print page (which can have lower PageRank) is the one added to the index. The result is that your ‘print use’ page ranks much lower than competitor pages and your ‘online use’ page does not show at all.
The best way to handle situations like this is with proper use of noindex on the duplicate pages themselves and nofollow on the links that point to them. This way, the search engine knows not to use the print page, but to use the online version instead.
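As a rough sketch of how a site template might apply noindex and nofollow to print pages, the helpers below build a robots meta tag for the page `<head>` and a nofollow link to the print version. The `?print=1` URL convention and all function names are assumptions made for this example, not something the article prescribes.

```python
from html import escape
from urllib.parse import parse_qs, urlparse


def is_print_version(url: str) -> bool:
    """Assumed convention: print pages carry a ?print=1 query parameter."""
    query = parse_qs(urlparse(url).query)
    return query.get("print") == ["1"]


def robots_meta_tag(url: str) -> str:
    """Return the robots meta tag to place in the page's <head>."""
    if is_print_version(url):
        # Keep the print page out of the index.
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'


def print_link(print_url: str) -> str:
    """Return a link to the print page that asks crawlers not to follow it."""
    return f'<a href="{escape(print_url, quote=True)}" rel="nofollow">Print this page</a>'


print(robots_meta_tag("https://example.com/article?print=1"))
print(robots_meta_tag("https://example.com/article"))
print(print_link("https://example.com/article?print=1"))
```

The meta tag handles the print page itself, while the `rel="nofollow"` link avoids passing ‘link juice’ to a page you do not want ranked.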
While it might seem as though you were penalized for duplicate content, in reality a lower-ranking duplicate page simply showed up instead of the page you intended to show.
Another scenario in which problems can occur with duplicate content is when content is syndicated to third parties. Examples include blog posts and articles distributed through RSS feeds. What frequently happens is that search engines drop the page where your content was first published and use the republisher's version instead.
Thus, when someone searches for that content, the site it was syndicated to can come up instead of your website.
To avoid situations like these when you are unable to have the syndicated copies noindexed, make sure your articles or blog posts contain backlinks to your website. When a search engine sees that the republished content links back to your page, it has a signal that your version is the original.
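One way to make sure every syndicated copy carries such a backlink is to append it before the content leaves your site. The sketch below assumes the article body is an HTML string; the function name, URL, and site name are hypothetical.

```python
from html import escape


def with_attribution(body_html: str, original_url: str, site_name: str) -> str:
    """Append a link back to the original article to a syndicated copy."""
    credit = (
        "<p>This article was originally published at "
        f'<a href="{escape(original_url, quote=True)}">{escape(site_name)}</a>.</p>'
    )
    return body_html.rstrip() + "\n" + credit


# Hypothetical feed item built from an article on your own site.
feed_item = with_attribution(
    "<p>Duplicate content can hurt rankings.</p>",
    "https://example.com/duplicate-content",
    "Example Site",
)
print(feed_item)
```

Because the link is part of the body, it travels with the content wherever the feed is republished, as long as the republisher does not strip it.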
In this situation, it is not necessarily a “penalty”, but you are not getting proper use of your content and your website is not ranking for your content.
Scenarios where penalties can occur:
The above situations are examples of what may look like a penalty but in fact are not penalties. A penalty would be defined as your site getting lower rankings and/or losing PageRank.
A penalty like this can only occur if a very large share of your content is duplicate, meaning the same or very similar content can be found on other websites. The consequence can be a dramatic decrease in traffic and even the possibility of not being included in search engine results at all.
In conclusion, the above scenarios can be very painful from a business perspective and should be avoided. All three of the situations above have a negative effect on your website, your traffic, and ultimately your conversions.
The best lesson learned is to avoid putting any non-original content on your website and to be careful who you distribute the content to.
Eric Enge, SEOmoz.com