Effectively Using Robots Meta Tags

July 19, 2022

The "robots" meta tag, when used properly, will tell the search engine spiders whether or not to index and follow a particular page. For the purposes of this article, we will be using the "()" symbols to represent the "" in html coding.

Some examples of robots tag usage are as follows:

(meta name="robots" content="index,follow")

(meta name="robots" content="noindex,follow")

(meta name="robots" content="index,nofollow")

(meta name="robots" content="noindex,nofollow")

Let us first examine what these terms mean before we explain the usage for each one:

"index"- This directive tells the search engine robots (or spiders) that it is okay to index the page. Another words, you are allowing the search engine to include your page within their search directory.

"noindex" - Using this tag, you are informing the robots that this page should not be indexed. Simply put, this page will not appear in their search directory.

"follow"- When you use this tag, you are telling the search engines that you want their robot to follow any links that are found on that page.

"nofollow"- The opposite of the above definition, this directive will tell the robots not to follow any links on your page.

Putting It All Together:

With the robots tags explained, let's examine the usage for each one.

1. (meta name="robots" content="index, follow")

Use this tag when you want the search engine spiders to index the page and follow its links to other pages. Most search engines treat this as their default behavior, so you may not need the tag at all if you want the page indexed and its links followed. However, an article at Search Engine World (searchengineworld.com/metatag/robots.htm) suggests that Inktomi does not use this as its default setting. Instead, it defaults to "index, nofollow".

Better safe than sorry!

There has been much debate over whether this tag is necessary. If there is even a slight possibility that some search engines do not use it as their default setting, it makes sense to include the tag whenever you want your page included in their search directory AND your links followed. Do the research and decide for yourself.

2. (meta name="robots" content="noindex,follow")

This tag can be used to tell the search engines that you do not want the page included in their directory, but you DO want them to follow the links that lead to other pages. A good example of its usage would be your disclaimer or privacy policy pages. You may not want these pages to show up in the search engines if they are only important to your actual visitors. However, if the links on these pages point to other pages that you want the search engines to find, then you would still want the spiders to "follow" those links.

3. (meta name="robots" content="index,nofollow")

This tag will allow your page to be indexed in the search engines, but any links on that page will not be followed.

4. (meta name="robots" content="noindex,nofollow")

When using this tag, the search engine spiders will not include this page in their directory and will not follow any links on the page either.
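The four combinations above can also be generated rather than typed by hand. The following is a small sketch, assuming a hypothetical helper named robots_meta_tag; note that the code emits real angle brackets instead of the "()" stand-ins used in this article.

```python
def robots_meta_tag(index=True, follow=True):
    """Build a robots meta tag for the chosen index/follow combination."""
    content = ",".join([
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ])
    return '<meta name="robots" content="%s">' % content


# Example: a privacy policy page that should stay out of the index,
# while its links are still followed.
print(robots_meta_tag(index=False, follow=True))
```

A helper like this could be called from a page template so that each page chooses its own combination.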

Where does the "robots" tag belong?

The "robots" meta tag should be used within the (head) and (/head) tags of your page. These tags are located at the top of the html coding. It will look something like this:

(html)

(head)

(title)Title of your page goes here(/title)

(meta name="keywords" content="word1,word2,word3,word4")

(meta name="description" content="A brief description of the content of this page.")

(meta name="robots" content="index,follow")

(/head)

(body)

Your webpage information here.

(/body)

(/html)
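If you want to check that the tag really ended up inside the head section, a short script can do it. This is a rough sketch using Python's standard html.parser module; the class name RobotsMetaFinder, the function find_robots_meta, and the sample page are all made up for illustration, and the sample uses real angle brackets.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect robots-style meta tags that appear inside the head section."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.robots = {}  # meta name -> content value

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            if name in ("robots", "googlebot"):
                self.robots[name] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_robots_meta(html):
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.robots


PAGE = """<html>
<head>
<title>Title of your page goes here</title>
<meta name="robots" content="noindex,follow">
</head>
<body>Your webpage information here.</body>
</html>"""

print(find_robots_meta(PAGE))
```

A tag placed in the body by mistake is ignored by this sketch, which makes a misplaced tag easy to spot.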

More Robots Tags

Google automatically archives a page as it crawls it. This is called a "cached" version of the page. Visitors can retrieve the archived version of the page by clicking on the "cached" link within Google's search results. If you do not want your content to be archived, you can use the following tag:

(meta name="robots" content="noarchive")

*This will only prevent your page from being "cached". If you do not want your page to be indexed at all, you will still need to include the "noindex" tag.

An alternative to the above tag is one that addresses Google only. If you are happy for other search engine robots to archive your site but would like to prevent Google from doing so, you can use the following tag:

(meta name="googlebot" content="noarchive")
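The difference between the general "robots" tag and the Google-only tag can be modeled in a few lines. This is a simplified sketch, not a description of how any search engine actually resolves its tags: it assumes a crawler obeys the general "robots" tag plus any tag carrying its own name, and the crawler name "otherbot" is invented for the example.

```python
def directives_for(crawler, meta_tags):
    """Collect the directives that apply to one crawler.

    meta_tags maps a meta name (for example "robots" or "googlebot")
    to its content value. Assumption: a crawler honors the general
    "robots" tag as well as the tag addressed to it by name.
    """
    applicable = set()
    for name in ("robots", crawler.lower()):
        content = meta_tags.get(name, "")
        applicable.update(p.strip().lower() for p in content.split(",") if p.strip())
    return applicable


def may_archive(crawler, meta_tags):
    return "noarchive" not in directives_for(crawler, meta_tags)


tags = {"googlebot": "noarchive"}
print("googlebot may archive:", may_archive("googlebot", tags))
print("otherbot may archive:", may_archive("otherbot", tags))
```

With only the "googlebot" tag present, the Google crawler is told not to archive while every other crawler is unaffected.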

The Misuse of Robots Tags

Something that has been popping up on websites everywhere is the Google indexing tag. This is a silly little tag that is not necessary. Some people think this tag helps Google to spider your site, but this simply isn't true.

The tag looks like this: (meta name="googlebot" content="index,follow").

Some website owners believe that by specifying "googlebot" their site gains the advantage of being spidered faster and listed by Google. According to Google's web crawler information, you may use the noindex, nofollow, or noarchive tags when you DO NOT want Google to index, follow, or cache that page. Google's default setting is to index the page and follow its links, so this so-called googlebot index/follow tag that some site owner made up one day is completely unnecessary.
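The same reasoning can be turned into a quick check for tags that do nothing. The sketch below assumes a crawler whose default is to index and follow, as described above for Google; the function name is_redundant is hypothetical.

```python
# Assumption: the crawler already indexes and follows by default.
DEFAULTS = {"index", "follow"}


def is_redundant(content):
    """True when a tag's content only restates the crawler's defaults."""
    directives = {p.strip().lower() for p in content.split(",") if p.strip()}
    return bool(directives) and directives <= DEFAULTS


print(is_redundant("index,follow"))
print(is_redundant("noarchive"))
```

Any tag for which this returns True could be removed without changing how such a crawler treats the page.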
