Methods Used to Prevent Google Indexing
Have you ever needed to prevent Google from indexing a particular URL on your web site and displaying it in their search engine results pages (SERPs)? If you manage web sites long enough, a day will likely come when you need to know how to do this.
The three methods most commonly used to prevent the indexing of a URL by Google are as follows:
1. Using the rel="nofollow" attribute on all anchor elements that link to the page, preventing the crawler from following those links.
2. Using a disallow directive in the site's robots.txt file to prevent the page from being crawled and indexed.
3. Using a meta robots tag with a content="noindex" attribute to prevent the page from being indexed.
While the differences among the three approaches may appear subtle at first glance, their effectiveness can vary drastically depending on which method you choose.
Using rel="nofollow" to prevent Google indexing
Many inexperienced webmasters attempt to prevent Google from indexing a particular URL by using the rel="nofollow" attribute on HTML anchor elements. They add the attribute to every anchor element on their site used to link to that URL.
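For example, every internal link to the page would be written like this (the URL shown is hypothetical):

  <!-- rel="nofollow" asks crawlers not to follow this link -->
  <a href="https://www.example.com/private-page/" rel="nofollow">Private Page</a>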
Including a rel="nofollow" attribute on a link tells Google's crawler not to follow it, which, in turn, prevents the crawler from discovering, crawling, and indexing the target page through that link. While this method might work as a short-term solution, it is not a viable long-term solution.
The flaw in this approach is that it assumes all inbound links to the URL will include a rel="nofollow" attribute. The webmaster, however, has no way to prevent other web sites from linking to the URL with a followed link. So the chances that the URL will eventually be crawled and indexed despite this method are quite high.
Using robots.txt to prevent Google indexing
Another common method used to prevent the indexing of a URL by Google is the robots.txt file. A disallow directive can be added to the robots.txt file for the URL in question. Google's crawler will honor the directive, which prevents the page from being crawled and its contents from being indexed. In some cases, however, the URL can still appear in the SERPs.
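For example, to block a single page, the robots.txt file might contain the following (the path shown is hypothetical):

  # Applies to all crawlers, including Googlebot
  User-agent: *
  # Ask crawlers not to fetch this page
  Disallow: /private-page/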
Sometimes Google will display a URL in their SERPs even though they have never indexed the contents of that page. If enough web sites link to the URL, Google can often infer the topic of the page from the anchor text of those inbound links, and as a result they may show the URL in the SERPs for related searches. So while a disallow directive in the robots.txt file will prevent Google from crawling and indexing a URL, it does not guarantee that the URL will never appear in the SERPs.
Using the meta robots tag to prevent Google indexing
If you need to prevent Google from indexing a URL while also preventing that URL from being displayed in the SERPs, the most effective approach is to use a meta robots tag with a content="noindex" attribute within the head element of the web page. Of course, for Google to actually see this meta robots tag, they must first be able to discover and crawl the page, so do not block the URL with robots.txt. When Google crawls the page and discovers the meta robots noindex tag, they will flag the URL so that it is never shown in the SERPs. This is the most effective way to prevent Google from indexing a URL and displaying it in their search results.
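A minimal sketch of such a page's head element (the title is a placeholder):

  <head>
    <!-- noindex asks search engines not to include this page in their index -->
    <meta name="robots" content="noindex">
    <title>Private Page</title>
  </head>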