Google not following robots.txt and indexing duplicate content for isulong-seoph.com
By Benj Arriola - Posted on Tue Jul 4, 2006

Checking the pages of isulong-seoph.com indexed by Google gave the following result:

The results list every "Email a Friend" link from the end of each article on isulong-seoph.com. These links were added to make the articles more viral by encouraging readers to send each story on to more people, and Google has crawled every one of them. Below is a screenshot of that link on isulong-seoph.com:

Each link opens a popup window. All of these popups look the same and differ only slightly in title and URL. Google can see them, and not only is their content similar, their DOM structure is very similar as well, which has also been a basis for flagging duplicate content. The Google SERP above clearly shows that all of these email links are inside the /wp-content/ folder. Yet the robots.txt file of isulong-seoph.com contains this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
These rules clearly tell crawlers not to crawl anything inside the listed folders, and /wp-content/ is one of them.
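The following minimal sketch makes the scope of these rules concrete. It uses Python's standard urllib.robotparser module to check each URL path against the Disallow prefixes above, the way a robots.txt-compliant crawler would. Both URLs are hypothetical examples, and the real WP-Email popup path may differ.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# Hypothetical URLs for illustration; the real WP-Email popup path may differ.
POPUP_URL = "http://isulong-seoph.com/wp-content/plugins/wp-email/email-popup.php?p=42"
ARTICLE_URL = "http://isulong-seoph.com/2006/07/04/sample-article/"

for url in (POPUP_URL, ARTICLE_URL):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "disallowed"
    print(f"{verdict}: {url}")
```

The script reports the popup URL as disallowed and the article URL as allowed. The `User-agent: *` group applies to Googlebot as well, because no more specific group is defined.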
But for some reason, Google still crawled that folder and indexed all of the links to the popup windows. Isulong-seoph.com currently uses the WP-Email plugin by Lester Chan, so we decided to modify it. We added the rel="nofollow" attribute to all of its links and included the following meta tag in the email popup window:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
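The two changes work at different levels. The rel="nofollow" attribute is read on the page that contains the link, while the robots meta tag is read on the popup page itself. The following minimal Python sketch uses only the standard html.parser module. It assumes a crawler that honors both signals and shows what such a crawler would pick up from each page. The sample HTML and the popup path are hypothetical and are not the plugin's actual output.

```python
from html.parser import HTMLParser

class RobotsDirectiveScanner(HTMLParser):
    """Collect robots meta directives and followable links from one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta_directives: set[str] = set()
        self.followable_links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values keep their case.
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            # CONTENT="NOINDEX, NOFOLLOW" becomes {"noindex", "nofollow"}
            for token in attr.get("content", "").split(","):
                self.meta_directives.add(token.strip().lower())
        elif tag == "a" and "href" in attr:
            # rel can hold several space-separated values, e.g. rel="external nofollow"
            if "nofollow" not in attr.get("rel", "").lower().split():
                self.followable_links.append(attr["href"])

def scan(html: str) -> RobotsDirectiveScanner:
    scanner = RobotsDirectiveScanner()
    scanner.feed(html)
    scanner.close()
    return scanner

# Hypothetical popup page carrying the meta tag shown above.
popup_html = """<html><head><title>Email This Post</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head><body>...</body></html>"""

# Hypothetical article footer: the email link is nofollowed, a normal link is not.
article_html = """<p>Article text</p>
<a href="/wp-content/plugins/wp-email/email-popup.php?p=42" rel="nofollow">Email a Friend</a>
<a href="/about/">About</a>"""

print(sorted(scan(popup_html).meta_directives))
print(scan(article_html).followable_links)
```

With these inputs, the popup reports both `noindex` and `nofollow`. The article page leaves only `/about/` as a followable link, so the email popup is excluded on both ends.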
The modified plugin is already running on isulong-seoph.com. Hopefully Google corrects itself on the next reindexing, since duplicate content can lower rankings. You can download the modified WP-Email plugin from the downloads page.
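Google has not published how it detects duplicate content. As a rough illustration of why popups that differ only in title and URL look alike, the sketch below compares the tag structure of two hypothetical popup pages using the standard difflib.SequenceMatcher. This is a toy measure swapped in for illustration and is not Google's algorithm. The page template is also an assumption and does not reproduce the plugin's real markup.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

class TagSequence(HTMLParser):
    """Record start tags in document order, a crude fingerprint of a page's DOM."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def tag_sequence(html: str) -> list[str]:
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return parser.tags

def structural_similarity(html_a: str, html_b: str) -> float:
    """Return a 0.0 to 1.0 ratio of how alike the two tag sequences are."""
    return SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b)).ratio()

# Hypothetical popup template: only the title and form URL change per article.
POPUP_TEMPLATE = """<html><head><title>Email 'TITLE'</title></head>
<body><form action="URL" method="post">
<input type="text" name="yourname"><input type="text" name="friendemail">
<input type="submit" value="Email"></form></body></html>"""

page_a = POPUP_TEMPLATE.replace("TITLE", "Post One").replace("URL", "/?p=1")
page_b = POPUP_TEMPLATE.replace("TITLE", "Post Two").replace("URL", "/?p=2")
print(structural_similarity(page_a, page_b))  # identical tag sequences -> 1.0
```

Every popup built from the same template gets a perfect structural score, even though its visible title differs. A structure-based duplicate check could therefore treat them as copies of one page.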
We are also going to send this article to Lester Chan so that he can consider including these changes in future versions of his cool plugin.