Monday, February 18, 2008

The Robots Meta Tag

Robots Overview

A robot is a program that automatically travels around the Web and retrieves documents for the search engines.  It also follows the hyper-links on the pages that it finds and thus finds more pages to retrieve.
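The link-following step can be sketched roughly as follows, using Python's standard html.parser module to collect the hyperlinks from a page that a robot has already retrieved. The names LinkCollector and extract_links and the example.com addresses are made up for illustration; a real robot also has to fetch the pages, avoid revisiting them, and so on.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> element on a page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names before calling us.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the address of the page itself.
            self.links.append(urljoin(self.base_url, href))


def extract_links(base_url, html):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


page = (
    '<html><body><A HREF="/about.html">About</A> '
    '<a href="http://example.org/">Other</a></body></html>'
)
links = extract_links("http://example.com/index.html", page)
print(links)  # ['http://example.com/about.html', 'http://example.org/']
```

Each address in the resulting list is a candidate for the robot's next request.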

These Web Robots are also referred to as Web Spiders or Web Crawlers, or simply as Spiders, Crawlers, Robots, or Bots. Despite the name, a Robot does not actually travel to a website. Rather, its "visits" are simply requests for web pages from a website, made so that the Robot can get copies of the documents.

Search engines, like Google, have spider programs that search the web constantly.   As they search, they retrieve significant information from each page that they "crawl" and then copy this information into a huge database.  Then when a web surfer goes to Google and types in a query, the search engine can quickly retrieve this information from the database.
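To give a feel for why a query can be answered quickly from the database, here is a toy sketch of a word index. It is only an illustration under simple assumptions (whole-word matching, every query word must appear); the function names build_index and search and the sample pages are hypothetical, and real search engines are far more sophisticated.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted."""
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


# Hypothetical crawled pages: URL -> text retrieved by the spider.
pages = {
    "http://example.com/": "robots meta tag overview",
    "http://example.com/seo.html": "search engine optimization and the robots tag",
}
index = build_index(pages)
print(search(index, "meta tag"))  # ['http://example.com/']
```

The work of reading the pages is done once, at crawl time; answering a query is then just a lookup.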

The more thoroughly a Web Spider crawls your site, the more information it will pick up about your site, and the more pages from your Website will be indexed by the search engines. Both of these things increase the chances that your pages will appear in search results.

 

The Robots Tag Can Help You Get Your Website Pages Indexed

The Robots Meta Tag can be used to tell Search Engine Spiders whether or not to read and index a particular web page, and whether you want them to follow the links on that web page. This is a form of SEO, or Search Engine Optimization, since you are telling the search engines NOT to spend time reading certain pages and following certain links.

When a search engine robot visits your web site, it does not usually read all of the pages in your entire site, but follows your internal page links to read just some of the pages.  Then when it comes back another time it will likely follow a different internal path, and read and index more and different pages.  Thus it may take several  visits by a search engine spider before all or most of the pages of your website are indexed. 

So if you have some pages that you don't want the spiders spending time on, the Robots Meta Tag lets you steer them away from those unimportant pages, so that they more quickly read and index the pages you do want indexed.

 

Placement of The Robots Meta Tags

You can instruct the visiting robot to INDEX the contents of the page in its database or to NOT INDEX the page, and to scan the page for LINKS to be followed or to NOT SCAN the page for links. That's pretty much it.

  • Like any <META> tag it should be placed in the HEAD section of an HTML page.
  • There are two attributes:  The NAME attribute and the CONTENT attribute.
  • The "NAME" attribute must be "ROBOTS".

Here is how the robots META tag would look inside the HEAD section of your web page:

<html>
<head>
<title> The Title of your Webpage Here.</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>
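To see the tag from the robot's side, the following is a minimal sketch of how a program might read it, using Python's standard html.parser module. The names RobotsMetaParser and read_robots_meta are hypothetical, and actual search engine robots may parse the tag differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Finds a robots META tag and records its CONTENT values (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = None  # None means no robots META tag was found

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; attribute values do not.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        if name == "robots":
            content = attributes.get("content") or ""
            self.directives = {
                part.strip().upper() for part in content.split(",") if part.strip()
            }


def read_robots_meta(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


sample = """<html>
<head>
<title>The Title of your Webpage Here.</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>
</html>"""
directives = read_robots_meta(sample)
print(sorted(directives))  # ['NOFOLLOW', 'NOINDEX']
```

Because the comparison ignores case, "ROBOTS" and "robots" are treated the same way in this sketch.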

 

A few things to know when you use the robots META tag:

  1. A Robot could be programmed to ignore your <META> tag.  This is especially true of Malware Robots that scan the web for security weaknesses, and of the email address harvester programs that spammers use to collect email addresses.
  2. The NOFOLLOW directive only applies to the links on the page that carries the directive. Thus, a robot could still find the links listed on your page if those same links also appear on another page in your website that doesn't use the NOFOLLOW directive.  Also, if those links are listed on another website, they could still be picked up.
  3. Don't confuse the NOFOLLOW <META> tag directive with rel="nofollow", which is an attribute of the hyperlink element "A" and tells a robot not to follow that single hyperlink.
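The difference between the page-wide NOFOLLOW directive and the per-link rel="nofollow" attribute can be illustrated with a short sketch. It assumes a well-behaved robot that honors both; the names FollowableLinks and links_to_follow are hypothetical.

```python
from html.parser import HTMLParser


class FollowableLinks(HTMLParser):
    """Tracks the page-wide NOFOLLOW directive and each link's rel attribute."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False
        self.candidates = []  # list of (href, link_is_nofollow) pairs

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = (attributes.get("content") or "").upper()
            if "NOFOLLOW" in {value.strip() for value in content.split(",")}:
                self.page_nofollow = True  # applies to every link on this page
        elif tag == "a" and attributes.get("href"):
            rel_tokens = (attributes.get("rel") or "").lower().split()
            self.candidates.append((attributes["href"], "nofollow" in rel_tokens))


def links_to_follow(html):
    parser = FollowableLinks()
    parser.feed(html)
    if parser.page_nofollow:
        return []  # the META directive rules out all links on the page
    return [href for href, skip in parser.candidates if not skip]


per_link = '<a href="/a.html">A</a> <a href="/b.html" rel="nofollow">B</a>'
whole_page = '<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">' + per_link
print(links_to_follow(per_link))    # ['/a.html']
print(links_to_follow(whole_page))  # []
```

Note that neither mechanism hides /b.html from a robot that finds the same link on some other page.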

The Robots META tag is also described in the W3C HTML 4.01 specification, in Appendix B.4.1.

 

The Robots "CONTENT" Attribute

Valid Values for the Robots "CONTENT" Attribute are:
ALL, NONE, INDEX, NOINDEX, FOLLOW, NOFOLLOW.
ALL is shorthand for "INDEX, FOLLOW" and NONE is shorthand for "NOINDEX, NOFOLLOW".

If there is no robots <META> tag, then the default tag content is "INDEX,FOLLOW".  There is no need to spell out the default value, so that leaves only three options you need to be concerned with if you want to tell the search engine spiders how to treat your web page:

<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
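How a CONTENT value translates into the two decisions (index or not, follow or not) can be sketched as below. The function name resolve_robots_content is hypothetical. Two assumptions are built in: NONE is treated as the conventional shorthand for "NOINDEX, NOFOLLOW", and when values conflict the more restrictive one wins, which is a choice made for this sketch rather than a rule every robot follows.

```python
def resolve_robots_content(content=None):
    """Return (index, follow) booleans for a robots CONTENT value.

    content=None stands for a page with no robots META tag at all.
    """
    if content is None:
        return True, True  # the default: INDEX, FOLLOW

    values = {value.strip().upper() for value in content.split(",")}

    # Assumption: a restrictive value overrides a permissive one.
    index = not ("NOINDEX" in values or "NONE" in values)
    follow = not ("NOFOLLOW" in values or "NONE" in values)
    return index, follow


print(resolve_robots_content())                     # (True, True)
print(resolve_robots_content("INDEX, NOFOLLOW"))    # (True, False)
print(resolve_robots_content("NOINDEX, NOFOLLOW"))  # (False, False)
```

Values such as INDEX, FOLLOW and ALL need no special handling here, because they simply restate the default.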


Copyright © 2007-2008   Donald Dean Websites - All Rights Reserved