Thursday, September 27, 2012

Error 404 or File not found?

Error 404 File not found

Many people think that error 404 is an SEO error, but that is not correct. It is an HTTP error. Whenever you request a webpage or file that doesn't exist, or whose location has changed, the web server responds with the status code "Error 404: File not found". The 404 error message is an HTTP standard status code.
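To see the status code in action, here is a small Python sketch (the throwaway local server and the file name are just for illustration) that requests a page that doesn't exist and captures the 404 the server sends back:

```python
import http.server
import threading
import urllib.error
import urllib.request

# Spin up a throwaway local web server (port 0 = pick any free port).
server = http.server.HTTPServer(("127.0.0.1", 0),
                                http.server.SimpleHTTPRequestHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Request a file that does not exist; the server answers with status 404.
try:
    urllib.request.urlopen(f"http://127.0.0.1:{port}/no-such-page.html")
    status = 200  # would only happen if the file somehow existed
except urllib.error.HTTPError as err:
    status = err.code

print(status)  # 404
server.shutdown()
```

Any HTTP client (a browser, curl, a crawler) would see the same code; the browser simply renders it as an error page.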


History of Status Codes

HTTP status codes were defined in the early 1990s as part of the original HTTP specification. Tim Berners-Lee, who invented the Web and the first web browser in 1990, defined the first set of status codes; the World Wide Web Consortium (W3C) and the IETF have maintained the HTTP standard since.

Good 404 error page

It has an error message written in plain language, so that even non-technical users can understand it. It explains why the URL could not be found, lists common mistakes, and offers spell-check suggestions for the failed URL. It also includes a link to email the webmaster, or a form to report broken links.

Saturday, September 22, 2012

What is robots.txt?


"Robots.txt" is a regular text file whose name has special meaning to the majority of "honorable" robots on the web. By defining a few rules in this text file, you can instruct robots not to crawl and index certain files or directories within your site, or even the site as a whole. For example, you may not want Google to crawl the /images directory of your site, as doing so is both meaningless to you and a waste of your site's bandwidth.



Robots.txt lets you tell Google just that.
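A robots.txt that asks Google's crawler to stay out of the /images directory mentioned above would look like this:

```
User-agent: Googlebot
Disallow: /images/
```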

The robots.txt convention is also known as the Robots Exclusion Standard. It asks web crawlers and other web robots not to access certain parts of a website; well-behaved crawlers, including those of the major search engines, honor it.
Robots.txt was created in 1994 by Martijn Koster while working for Nexor.

Creating your "robots.txt" file
Create a regular text file called "robots.txt", and make sure it's named exactly that. This file must be uploaded to the root directory of your site, not a subdirectory (i.e., http://www.mysite.com/robots.txt, NOT http://www.mysite.com/stuff/robots.txt).

Create a robots.txt file

The robots.txt file is used to instruct search engine robots about which pages on your website should be crawled and consequently indexed. Most websites have files and folders that are not relevant for search engines (like images or admin files), so creating a robots.txt file can actually improve your website's indexing.
A robots.txt is a simple text file that can be created with Notepad. If you are using WordPress a sample robots.txt file would be:
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/

“User-agent: *” means that all the search bots (from Google, Yahoo, MSN and so on) should use those instructions to crawl your website. Unless your website is complex you will not need to set different instructions for different spiders.

“Disallow: /wp-” will make sure that the search engines do not crawl the WordPress files. This line excludes all files and folders starting with “wp-” from indexing, avoiding duplicate content and admin files.

If you are not using WordPress just substitute the Disallow lines with files or folders on your website that should not be crawled, for instance:

User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
Disallow: /any other folder to be excluded/

After you have created the robots.txt file, just upload it to your site's root directory and you are done!
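As a sanity check before uploading, you can verify your rules with Python's built-in robots.txt parser (the sample rules below are the WordPress ones from above; the URLs being tested are just examples):

```python
from urllib.robotparser import RobotFileParser

# Feed the parser the same rules we wrote into robots.txt.
rules = """\
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
"""
parser = RobotFileParser()
parser.parse(rules.splitlines())

# Anything starting with /wp- is blocked for every crawler...
print(parser.can_fetch("*", "/wp-admin/install.php"))   # False
# ...while normal content pages stay crawlable.
print(parser.can_fetch("*", "/2012/09/robots-txt-guide/"))  # True
```

This is the same logic a polite crawler applies when it reads your file, so it is a quick way to catch a rule that blocks more (or less) than you intended.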