Thursday, November 1, 2012

Canonical Tag - Still Fighting with Duplication?


Whenever your website has trouble ranking because of duplication, the first question that comes to mind is: how do I remove this duplication? One of the best and most popular ways is to use the canonical tag.



How to use canonical tag

URL normalisation is the process of transforming a URL into a normalised, or canonical, URL. By doing this you help search engines reduce duplicate page indexing. The canonical tag helps webmasters make clear to the search engines which page is the original one. You may have different URLs all pointing to the same page.

For example:
a) www.yourdomainname.com/page-id=123?/size12/~dyn987
The content of this page is also available on the following page:
b) www.yourdomainname.com/page/123/size12/type987

This will cause you duplicate content issues and affect your rankings. It can be easily corrected by using the canonical link element.

Canonical Link Element

URL (a) is unfriendly to search engines and (b) is friendly. You have duplicate content: two different URLs with the same content. Some pages link to (a) and others link to (b). This is where you leak PageRank.
The page specified in the canonical link is the one that will be indexed in the search engines' database.

Why should you use canonical tag?

By adding the canonical tag you inform the search engine which page is the main one when you have duplicate content. The rel=canonical passes the same amount of link juice as a 301 redirect, and takes much less development time to implement.
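The difference is that the canonical tag is written into the page itself, while a 301 redirect is configured on the server. As a minimal sketch of the redirect alternative, assuming an Apache server and a .htaccess file in the site root (/old-page is a hypothetical path standing in for the duplicate URL), the redirect would be a single line:

Redirect 301 /old-page http://www.yourdomainname.com/page/123/size12/type987

With the canonical tag you avoid touching the server configuration at all, which is why it usually takes less development time.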

Where should you add canonical tag?

The canonical link element goes between the head tags:
<head>
<title>Your Page</title>
<link rel="canonical" href="http://www.yourdomainname.com/page/123/size12/type987" />
</head>
You may add the canonical tag to all of your webpages.

Monday, October 8, 2012

What is Sitemap?

A sitemap is a list of the pages of a website, accessible to users and search engines. There are two types of sitemaps: XML and HTML.

XML sitemaps are for search engines: a structured format that tells them about the pages in a site. HTML sitemaps are designed for users and help them find the content they are looking for.

A sitemap can be XML, TXT, or HTML; usually the XML and TXT formats are served to robots. A sitemap is an index page that summarizes the content of the pages within a single site.

Two sitemaps are necessary: one for users and the other for search engine spiders.
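As a minimal sketch of the XML format defined by the Sitemap protocol (sitemaps.org), using the hypothetical domain from the examples above (only the <loc> element is required; the others are optional):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.yourdomainname.com/</loc>
<lastmod>2012-10-01</lastmod>
<changefreq>weekly</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>http://www.yourdomainname.com/about/</loc>
</url>
</urlset>

Save the file as sitemap.xml in your site root so that search engines can find it.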



Importance of Sitemaps

A sitemap is very important for any website, and a link to it should be on your website's homepage. The homepage often has the highest PageRank, so by linking the sitemap from the homepage you ensure that your pages get indexed as quickly as possible.

The Google Sitemap service first started in June 2005. In order to use it, the site owner only needs to download free software called Sitemap Generator. This tool automatically creates a sitemap using the Sitemap protocol.



Thursday, September 27, 2012

Error 404 or File not found?

Error 404 File not found

Most people think that error 404 is an SEO error, but that is not correct: it is an Internet error. Whenever you ask for a webpage or file that doesn't exist, or whose destination has been changed, the server responds with a status code error called "Error 404 File not found". The 404 error message is a standard HTTP status code.
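To make this concrete, here is a simplified sketch of the HTTP exchange when a browser requests a page that does not exist (the path is hypothetical, and most headers are omitted):

GET /missing-page.html HTTP/1.1
Host: www.yourdomainname.com

HTTP/1.1 404 Not Found
Content-Type: text/html

The three-digit code in the first line of the response is the status code; 404 means the server could not find the requested resource.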



History of Status Codes

HTTP status codes date back to the original HTTP specification, written in 1992. Tim Berners-Lee, who invented the Web and the first web browser in 1990, also defined the original status codes as part of that specification.

Good 404 error page

A good 404 page has an error message written in plain language, so that even non-technical users find it easy to understand. It explains why the URL could not be found, lists common mistakes, and provides spell-check functionality for the failed URL. It also has a link to email the webmaster, or a form to submit broken links.
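How that page gets served depends on your web server. As a minimal sketch, assuming an Apache server and a custom error page saved as 404.html in your site root (the file name is hypothetical), one line in your .htaccess file is enough:

ErrorDocument 404 /404.html

Apache will then return your custom page, along with the 404 status code, whenever a requested URL cannot be found.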

Saturday, September 22, 2012

What is robots.txt?


"Robots.txt" is a regular text file that through its name has special meaning to the majority of "honorable" robots on the web. By defining a few rules in this text file, you can instruct robots to not crawl and index certain files, directories within your site, or at all. For example, you may not want Google to crawl the /images directory of your site, as it's both meaningless to you and a waste of your site's bandwidth.



Robots.txt lets you tell Google just that.
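For instance, a minimal robots.txt sketch that keeps Google's crawler out of the /images directory mentioned above would be:

User-agent: Googlebot
Disallow: /images/

This blocks only Googlebot; other robots are unaffected unless you add rules for them.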

Robots.txt is also known as the Robots Exclusion Standard. It prevents web crawlers and other web robots from accessing certain parts of a website. Search engines use such robots to crawl the web.
Robots.txt was invented in 1994 by Martijn Koster.

Creating your "robots.txt" file
Create a regular text file called "robots.txt", and make sure it is named exactly that. This file must be uploaded to the root directory of your site, not a subdirectory (i.e. http://www.mysite.com but NOT http://www.mysite.com/stuff/).

Create a robots.txt file

The robots.txt file is used to instruct search engine robots about which pages on your website should be crawled and consequently indexed. Most websites have files and folders that are not relevant for search engines (like images or admin files), so creating a robots.txt file can actually improve your website's indexation.
A robots.txt is a simple text file that can be created with Notepad. If you are using WordPress, a sample robots.txt file would be:
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/

“User-agent: *” means that all the search bots (from Google, Yahoo, MSN and so on) should use those instructions to crawl your website. Unless your website is complex you will not need to set different instructions for different spiders.

“Disallow: /wp-” will make sure that the search engines do not crawl the WordPress files. This line excludes all files and folders starting with “wp-” from indexation, avoiding duplicate content and admin files.

If you are not using WordPress, just substitute the Disallow lines with the files or folders on your website that should not be crawled, for instance:

User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
Disallow: /any other folder to be excluded/
After you have created the robots.txt file, just upload it to your root directory and you are done!