Block or NoIndex RSS and XML feeds

Blocking or noindexing RSS feeds can be done in much the same way as for standard HTML documents. An HTML document carries the directive in a meta tag inside the document <head>, e.g.:

<meta name="robots" content="noindex,follow">

RSS (a form of XML) can also contain a “meta” tag, usually located before the first <item>. The directive can be seen below:

<xhtml:meta content="noindex" name="robots" xmlns:xhtml="http://www.w3.org/1999/xhtml" />

Interestingly, the “noindex” directive is currently followed by Google and Yahoo! only.
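If you want to check whether a feed already carries the directive, a few lines of Python will do it. This is only a minimal sketch: the function name feed_has_noindex and the sample feed are made up for illustration, and it assumes an RSS 2.0 feed where the tag sits directly inside <channel>.

```python
import xml.etree.ElementTree as ET

# Namespace used by the xhtml:meta tag shown above.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def feed_has_noindex(feed_xml: str) -> bool:
    """Return True if <channel> holds an xhtml:meta robots tag with 'noindex'."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    if channel is None:
        return False  # not an RSS 2.0 style feed
    for meta in channel.findall(f"{{{XHTML_NS}}}meta"):
        if meta.get("name", "").lower() != "robots":
            continue
        # content may hold several comma-separated directives
        directives = [d.strip().lower() for d in meta.get("content", "").split(",")]
        if "noindex" in directives:
            return True
    return False


# Hypothetical feed used only for this sketch.
SAMPLE_FEED = """<rss version="2.0">
<channel>
  <title>Example feed</title>
  <link>http://yoursite.com/</link>
  <description>Sample feed for illustration</description>
  <xhtml:meta content="noindex" name="robots" xmlns:xhtml="http://www.w3.org/1999/xhtml" />
  <item><title>First post</title></item>
</channel>
</rss>"""

print(feed_has_noindex(SAMPLE_FEED))  # True
```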

Using robots.txt is often an easier option. It has the added benefit that the RSS feed will also be blocked for those search engines that do not yet obey the RSS noindex meta tag.

Publishing all the feeds under one sub-directory makes blocking straightforward.

For example, if all feeds are published within the sub-directory /feeds/, e.g.:

http://yoursite.com/feeds/rss.xml

Then the following Robots.txt statement:

User-agent: *
Disallow: /feeds/

Will block the RSS feeds nicely…
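Before publishing a rule like this, it is worth checking that it behaves as expected. The sketch below uses Python's standard urllib.robotparser module to test the /feeds/ rule against a couple of URLs. The helper is_allowed and the crawler name ExampleBot are assumptions made for the example, and the second URL is just a made-up page outside /feeds/.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from above, as they would appear in the file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feeds/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the parsed rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


print(is_allowed("http://yoursite.com/feeds/rss.xml"))   # False
print(is_allowed("http://yoursite.com/blog/post.html"))  # True
```

Note that this only tells you what a well-behaved crawler should do with the rules; it says nothing about how any particular search engine actually treats them.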

Alternatively, for sites that run multiple content management systems (e.g. blogs, forums, photos and articles) or simply don’t have a single directory for all their RSS feeds, robots.txt allows pattern matching of URLs using simple expressions. For example:

User-agent: *
Disallow: /*.xml$

Will block every URL ending in .xml from being crawled and indexed. Be careful not to block unintended content, e.g. your XML sitemap!

For more information, refer to Google’s pattern matching webmaster help. Note: robots.txt pattern matching is an extension to the robots.txt standard and is currently followed by Google, Yahoo! and Microsoft Live only.
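To get a feel for how the wildcard rules match, here is a rough Python sketch that turns a pattern into a regular expression, treating * as "any run of characters" and a trailing $ as "end of URL". It is a simplification under my own assumptions, not any search engine's actual matcher: it ignores Allow rules and rule precedence, and the function names are invented for the example.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern ('*' wildcard, optional trailing '$')."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal parts and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns) -> bool:
    """Return True if any Disallow pattern matches the URL path."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)


rules = ["/*.xml$"]
for path in ["/feeds/rss.xml", "/sitemap.xml", "/blog/post.html", "/feed.xml?page=2"]:
    print(path, is_blocked(path, rules))
# /feeds/rss.xml True
# /sitemap.xml True
# /blog/post.html False
# /feed.xml?page=2 False
```

The second line of output is the sitemap warning in action: a blanket .xml rule catches /sitemap.xml along with the feeds.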

Now, just to put the cat amongst the pigeons, I shall ask the question: is there any real benefit in going to the effort of blocking or noindexing RSS or XML feeds at large?

Do the Search Engines consider such feeds as duplicate content?

Do they indeed use feeds as a means of content discovery? After all, a search engine sitemap is actually a large XML file.

The answer is that it may not be worth it. Google’s Adam Lasnik suggests the issue isn’t even on Google’s radar and that XML based results are unlikely to ever appear within general search engine results pages. Eric Enge from WebProNews also concludes there is little risk in letting your feeds be indexed.

I’d be interested to hear your thoughts…

Notice the meta noindex statement in situ below, just before the <item> tag.

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>searchideas.co.uk</title>
	<atom:link href="http://www.searchideas.co.uk/blog/feed" rel="self" type="application/rss+xml" />
	<link>http://www.searchideas.co.uk/blog</link>

	<description>Search Ideas is a blog all about SEO, PPC and Online happenings...</description>
	<pubDate>Tue, 02 Dec 2008 22:00:30 +0000</pubDate>

	<generator>http://wordpress.org/?v=2.5</generator>
	<language>en</language>

	<xhtml:meta content="noindex" name="robots" xmlns:xhtml="http://www.w3.org/1999/xhtml" />
	<item>
	...
	</item>

</channel>
</rss>
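If your publishing platform doesn't add the tag for you, it can be injected with a small script. The following is a sketch only, using Python's ElementTree: add_noindex and the sample feed are hypothetical, it assumes an RSS 2.0 feed with a <channel> element, and it places the tag just before the first <item> as in the feed above. ElementTree declares the xhtml namespace on the root element rather than on the tag itself, which is equivalent XML.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
META_TAG = f"{{{XHTML_NS}}}meta"

# Keep the familiar "xhtml:" prefix when the feed is written back out.
ET.register_namespace("xhtml", XHTML_NS)


def add_noindex(feed_xml: str) -> str:
    """Return the feed with an xhtml:meta noindex tag before the first <item>."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("no <channel> element found")
    # Leave the feed untouched if a robots meta tag is already present.
    for meta in channel.findall(META_TAG):
        if meta.get("name") == "robots":
            return feed_xml
    children = list(channel)
    # Index of the first <item>, or the end of the channel if there is none.
    position = next(
        (i for i, child in enumerate(children) if child.tag == "item"),
        len(children),
    )
    meta = ET.Element(META_TAG, {"content": "noindex", "name": "robots"})
    channel.insert(position, meta)
    return ET.tostring(root, encoding="unicode")


# Hypothetical feed used only for this sketch.
PLAIN_FEED = """<rss version="2.0">
<channel>
  <title>Example feed</title>
  <link>http://yoursite.com/</link>
  <description>Sample feed for illustration</description>
  <item><title>First post</title></item>
</channel>
</rss>"""

print(add_noindex(PLAIN_FEED))
```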
