SEO/SEM News

Probably not new to some of you

Google not following robots.txt and indexing duplicate content for isulong-seoph.com

By Benj Arriola - Posted on Tue Jul 4, 2006

Checking indexed pages of isulong-seoph.com on Google gave the following result:

Isulong SEOph index pages for isulong-seoph.net

What is displayed are all of the Email a Friend links at the end of every article on the isulong-seoph.com site. These links were added to encourage viral sharing, so that readers send the story on to more people. Google has crawled every one of these links. Below is a screenshot of that link from isulong-seoph.com:

Email-a-Friend link of isulong-seoph.com

The link opens a popup window. The popups all look the same and differ only slightly in title and URL. Google can see them, and not only do they have similar content, their DOM is very similar as well, which has also been a basis for flagging duplicate content. You can clearly see in the Google SERP above that all of these email links are within the /wp-content/ folder. Yet the robots.txt file of isulong-seoph.com contains this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/

This clearly tells crawlers not to crawl anything inside the listed folders, which include /wp-content/.
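To double-check what these rules permit, here is a minimal sketch that feeds the same rules to Python's standard `urllib.robotparser` and asks whether a compliant crawler may fetch a given URL. The two sample URLs and the `is_allowed` helper are hypothetical and only illustrate the folder structure. The check shows what the rules allow, not what Google actually did.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted above, split into lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
""".splitlines()


def is_allowed(url, user_agent="Googlebot"):
    """Return True if the rules above let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT)  # parse from memory, no network request
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs, for illustration only.
popup_ok = is_allowed("http://isulong-seoph.com/wp-content/plugins/email/popup.php")
article_ok = is_allowed("http://isulong-seoph.com/some-article/")
print(popup_ok, article_ok)
```

Anything under /wp-content/ is reported as off limits, while ordinary article URLs remain fetchable.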

But for some reason, Google still crawled that folder and indexed all of the links to the popup windows. Isulong-seoph.com currently uses the WP-Email plugin by Lester Chan, and we decided to modify it: we added the rel="nofollow" attribute to all of its links and included the following meta tag in the email popup window:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
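For anyone who wants to verify that a page really carries this tag, the sketch below uses Python's standard `html.parser` to collect the directives of any robots meta tag in a piece of HTML. The `SAMPLE` markup and the names `RobotsMetaParser` and `robots_directives` are assumptions made for illustration, not part of the WP-Email plugin.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical popup markup containing the tag shown above.
SAMPLE = (
    "<html><head>"
    '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'
    "</head><body>Email a Friend</body></html>"
)
print(sorted(robots_directives(SAMPLE)))
```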

These changes are already live on isulong-seoph.com, and hopefully Google corrects itself on the next reindexing, since duplicate content can decrease ranking. You can download the modified WP-Email plugin on the downloads page.
We are also going to send this article to Lester Chan so he could possibly include the change in future versions of his cool plugin.
