How to Create a Robots.txt File
Web site owners use the robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.
It works like this: a robot wants to visit a Web site URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:
User-agent: *
Disallow: /
The “User-agent: *” means this section applies to all robots. The “Disallow: /” tells the robot that it should not visit any pages on the site.
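If you want to see how a rule-abiding robot interprets those two lines, Python's standard urllib.robotparser module implements the same matching logic. A minimal sketch, reusing the example URL from above (the robot name “SomeBot” is made up for illustration):

from urllib import robotparser

# Feed the parser the two-line record from the example,
# exactly as a robot would after downloading /robots.txt.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# "Disallow: /" matches every path, so nothing may be fetched.
print(rp.can_fetch("SomeBot", "http://www.example.com/welcome.html"))  # False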
Where to Put It?
The short answer: in the top-level directory of your web server, so that it is reachable at http://www.example.com/robots.txt. Robots only look for the file at the root of the site; a robots.txt placed in a subdirectory has no effect.
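As an illustration, here is the lookup a robot performs, sketched with Python's standard urllib.parse module (the page URL is just an example):

from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    # A robot keeps only the scheme and host of the URL it wants
    # to visit, then requests /robots.txt at the top level.
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("http://www.example.com/shop/welcome.html"))
# http://www.example.com/robots.txt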
So, How Do You Create a Robots.txt File?
The robots.txt file is a plain text file containing one or more records. Each record names the robots it applies to (a User-agent line) and lists the paths those robots must not visit (Disallow lines). Here are the most common patterns:
To allow all robots complete access
User-agent: *
Disallow:
To exclude all robots from the entire server
User-agent: *
Disallow: /
To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
To exclude a single robot
User-agent: BadBot
Disallow: /
To allow a single robot
User-agent: Google
Disallow:

User-agent: *
Disallow: /
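Note the blank line: it is what separates the two records. A robot obeys the record whose User-agent line matches its own name and falls back to the “*” record otherwise. You can check this behaviour with urllib.robotparser again (“OtherBot” is a made-up robot name):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: Google",
    "Disallow:",
    "",                     # blank line separates the two records
    "User-agent: *",
    "Disallow: /",
])

# Google matches its own record (empty Disallow allows everything);
# every other robot falls through to the "*" record and is blocked.
print(rp.can_fetch("Google", "http://www.example.com/page.html"))    # True
print(rp.can_fetch("OtherBot", "http://www.example.com/page.html"))  # False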
To exclude all files except one
This is currently a bit awkward, as the original standard has no “Allow” field (many modern crawlers support one, but you cannot rely on it everywhere). The easy way is to put all files to be disallowed into a separate directory, say “stuff”, and leave the one file in the level above this directory:
User-agent: *
Disallow: /~joe/stuff/
Alternatively, you can explicitly list every page to be disallowed:
User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
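Either way, the effect can be verified with urllib.robotparser. A quick sketch of the second variant, where “SomeBot” is a made-up robot name and /~joe/index.html stands in for the page you want to keep crawlable:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /~joe/junk.html",
    "Disallow: /~joe/foo.html",
    "Disallow: /~joe/bar.html",
])

# Only the three listed pages are blocked; everything else is allowed.
print(rp.can_fetch("SomeBot", "http://www.example.com/~joe/junk.html"))   # False
print(rp.can_fetch("SomeBot", "http://www.example.com/~joe/index.html"))  # True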
To learn more about the robots.txt file, see the Web Robots Pages at http://www.robotstxt.org/.