robots.txt is a plain text file (not an HTML file) that you put on your site to tell search engine robots which pages you would like them not to visit. Suppose your website has many pages and there is one page you don't want search engines to visit. You can list that page in the robots.txt file, and well-behaved search engine robots will not crawl it. Keep in mind that robots.txt is only a request, not access control: badly behaved robots can ignore it, anyone can read the file itself, and a blocked page can still show up in search results if other sites link to it. So do not rely on robots.txt to hide truly sensitive data; protect such pages with a password or keep them off the public site instead.
The location of robots.txt is very important. It must be in the main (root) directory of your site. For example, if your website is mydomain.com, the file must be named robots.txt and placed at http://mydomain.com/robots.txt. Robots look for the file only at that location. If they don't find it there, they assume the site has no robots.txt file and crawl and index everything they find along the way.
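To make the location rule concrete, here is a small Python sketch of how a robot might work out where to look for the file: it keeps only the scheme and host of any page URL and puts /robots.txt after them. The function name robots_url and the sample page URL are made up for illustration; real crawlers have their own code for this.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location for the site that hosts page_url.

    Robots only look in the root directory of the host, so the path,
    query string and fragment of the page URL are thrown away.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # A hypothetical page deep inside the site still maps to the root file.
    print(robots_url("http://mydomain.com/blog/2014/post.html"))
    # prints http://mydomain.com/robots.txt
```

Python's standard library also offers urllib.robotparser.RobotFileParser, whose set_url() and read() methods download and parse the file from such a URL. In CPython's implementation, read() treats a missing file (a 4xx response other than 401/403) as permission to fetch everything, which matches the behaviour described above.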
EXAMPLES:
1) Here's a basic "robots.txt":
User-agent: *
Disallow: /
Here User-agent: * means the rules apply to all robots (the * matches every robot), and Disallow: / blocks the whole site, because every path on the site starts with /. In other words, search engines will not visit any page.
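If you want to check what a file like this does before uploading it, Python's built-in urllib.robotparser module can parse the rules from a string and answer "may this robot fetch this URL?". The sketch below feeds it the rules from example 1; the robot name ExampleBot and the page URLs are hypothetical, and a particular search engine's own parser may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from example 1: every robot is asked to stay out of every page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # "ExampleBot" is a hypothetical robot name; the * rule applies to it too.
    for url in ("http://mydomain.com/", "http://mydomain.com/about.html"):
        print(url, rp.can_fetch("ExampleBot", url))  # False for both URLs
```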
2) Suppose you have a website such as http://www.shmoop.com/ that contains the page http://www.shmoop.com/shakespeare/, and you do not want search engines to visit that page. Your robots.txt would be:
User-agent: *
Disallow: /shakespeare/
The leading / stands for the root of your domain, i.e. http://www.shmoop.com, so /shakespeare/ refers to http://www.shmoop.com/shakespeare/. Search engines will not visit that page or any page under it. The path is case-sensitive, so write it exactly as it appears in the URL: /Shakespeare/ would not match /shakespeare/.
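You can test example 2 the same way with Python's urllib.robotparser. This sketch parses the rule for /shakespeare/ and checks a few URLs; the robot name ExampleBot and the extra URLs are hypothetical. urllib.robotparser does simple prefix matching and does not understand the wildcard extensions some search engines support, so treat it as an approximation rather than a guarantee of how any one search engine behaves.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from example 2: block only the /shakespeare/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /shakespeare/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

checks = [
    "http://www.shmoop.com/",                     # home page: allowed
    "http://www.shmoop.com/shakespeare/",         # the blocked page
    "http://www.shmoop.com/shakespeare/hamlet/",  # anything under it is blocked too
    "http://www.shmoop.com/Shakespeare/",         # different capitalization: not matched
]

for url in checks:
    # "ExampleBot" is a hypothetical robot name used only for this demo.
    print(url, parser.can_fetch("ExampleBot", url))
```

The last URL shows why capitalization matters: the path /Shakespeare/ does not start with /shakespeare/, so the Disallow rule does not apply to it and the robot is allowed in.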