"Robots.txt" is a regular text file that through its name has special meaning to the majority of "honorable" robots on the web. By defining a few rules in this text file, you can instruct robots to not crawl and index certain files, directories within your site, or at all. For example, you may not want Google to crawl the /images directory of your site, as it's both meaningless to you and a waste of your site's bandwidth.
Robots.txt lets you tell Google just that.
The robots.txt mechanism is also known as the Robots Exclusion Standard. It asks web crawlers and robots to stay out of certain parts of a website, and search engine crawlers consult it before fetching pages. It was proposed in 1994 by Martijn Koster while he was working at Nexor.
Creating your "robots.txt" file
Create a regular text file called "robots.txt", and make sure it is named exactly that, in lowercase. This file must be uploaded to the root directory of your site, not a subdirectory (i.e., it must be reachable at http://www.mysite.com/robots.txt, NOT under http://www.mysite.com/stuff/), because the root is the only place robots look for it.
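To make the "root only" rule concrete, here is a minimal sketch using Python's standard urllib.parse module that derives the one location a crawler will check from any page URL; the function name robots_txt_url is my own illustrative choice, not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    """Return the robots.txt URL for the site serving page_url.

    Crawlers only ever request /robots.txt at the root of the host,
    so whatever path the page URL has is discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# A robots.txt placed in a subdirectory is never consulted:
print(robots_txt_url("http://www.mysite.com/stuff/page.html"))
# http://www.mysite.com/robots.txt
```

Whatever page a crawler starts from, it resolves the same single robots.txt at the host root, which is why uploading the file anywhere else silently disables it.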
The robots.txt file tells search engine robots which pages on your website should be crawled and, consequently, indexed. Most websites contain files and folders that are irrelevant to search engines (such as images or admin files), so creating a robots.txt file can actually improve the indexation of your website.
A robots.txt is a simple text file that can be created with any plain-text editor, such as Notepad. If you are using WordPress, a sample robots.txt file would be:
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
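You can check what these rules actually block without uploading anything, using Python's standard urllib.robotparser module; this is a quick offline sketch, and the example URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The WordPress sample rules from above, fed to the parser as text
# instead of being fetched from a live site.
rules = """\
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "Disallow: /wp-" is a prefix match, so wp-admin, wp-includes,
# wp-content and the rest are all excluded by that single line:
print(parser.can_fetch("*", "http://www.mysite.com/wp-admin/"))      # False
print(parser.can_fetch("*", "http://www.mysite.com/feed/"))          # False
print(parser.can_fetch("*", "http://www.mysite.com/my-first-post/")) # True
```

Any path not matched by a Disallow line, such as an ordinary post, remains crawlable.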
“User-agent: *” means that all search bots (from Google, Yahoo, MSN and so on) should use these instructions when crawling your website. Unless your website is complex, you will not need to set different instructions for different spiders.
“Disallow: /wp-” ensures that search engines will not crawl the WordPress core files. This line excludes all files and folders whose names start with “wp-” from indexing, avoiding duplicate content and admin files.
If you are not using WordPress, just substitute the Disallow lines with the files or folders on your website that should not be crawled, for instance:
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
Disallow: /any other folder to be excluded/
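One detail worth knowing about such folder rules is that matching is by path prefix, so the trailing slash matters. A small sketch with Python's standard urllib.robotparser, using only the two concrete folders from the example (the placeholder line is omitted) and made-up URLs:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Everything under the listed folders is blocked:
print(parser.can_fetch("*", "http://www.mysite.com/images/logo.png"))  # False
print(parser.can_fetch("*", "http://www.mysite.com/cgi-bin/form.cgi")) # False
# But "/images.html" does not start with "/images/",
# so a file that merely shares the name stays crawlable:
print(parser.can_fetch("*", "http://www.mysite.com/images.html"))      # True
```

If you want to block both the folder and similarly named files, you would need an additional Disallow line without the trailing slash.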
After you have created the robots.txt file, just upload it to your root directory and you are done!