Adding a robots.txt file


Post by dwhs web hosting » Tue Sep 27, 2005 11:22 am

Some Web site owners may want to control how search engine spiders crawl and index their Web sites. The standard way to do this is to place a "robots.txt" file in the root directory of the Web site. A robots.txt file is a regular plain-text file that asks all spiders, or only certain ones, to stay away from some or all of the pages and files on the site. Once it is uploaded, well-behaved spiders read the robots.txt file and follow its directions before accessing the rest of the site. The file is a request rather than an enforcement mechanism, so a spider that ignores it is not actually blocked.

The robots.txt file consists of two defining elements:

User-agent:
Disallow: /

The first element, "User-agent:", specifies which spiders should read and obey the rules that follow it. An asterisk ("*") denotes "all spiders." The second element, "Disallow:", names a file or directory that those spiders should not crawl. "Disallow: /" on its own blocks the entire site, and each additional path you want to block gets its own Disallow line.
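
To see how a spider might interpret these two elements, here is a minimal Python sketch using the standard library's urllib.robotparser module. The bot names (AnyBot, ExampleBot, OtherBot) are made up for illustration, and real spiders may apply their own matching rules.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text):
    """Parse robots.txt text and return a parser we can query."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# "*" applies to every spider, and "/" covers the whole site.
block_all = parse_rules("User-agent: *\nDisallow: /\n")
print(block_all.can_fetch("AnyBot", "/index.html"))      # False

# Naming one spider (hypothetical "ExampleBot") leaves the others unaffected.
block_one = parse_rules("User-agent: ExampleBot\nDisallow: /\n")
print(block_one.can_fetch("ExampleBot", "/index.html"))  # False
print(block_one.can_fetch("OtherBot", "/index.html"))    # True
```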

The robots.txt file must be placed in the Web site's root directory (the top-level folder that your domain serves). Spiders do not look for it anywhere else, so it cannot simply be put in a user's personal subdirectory space.
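
As a rough illustration of why the location matters: a spider typically takes the scheme and host of the page it wants and asks for /robots.txt directly under them, ignoring the rest of the path. The Python sketch below assumes that behavior; the domain and the page path are placeholders, and robots_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL a spider would check for the given page."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; the path is always /robots.txt at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A page deep inside a personal subdirectory still maps to the root file.
print(robots_url("http://your-main-domain.com/users/alice/page.html"))
# http://your-main-domain.com/robots.txt
```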

If you cannot access the root directory, you can instead use a "robots" <meta> tag to direct the spiders' behavior.

Note that a "robots" <meta> tag applies to every spider that honors it; it does not let you single out specific spiders.

Whereas a robots.txt file can be used to control spider behavior throughout an entire Web site, a separate "robots" <meta> tag must be inserted into each applicable Web page.
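
For reference, a typical "robots" <meta> tag goes in the page's <head> and looks like the one in the sample page below. The Python sketch shows roughly how a spider could pull the directives out of it. The sample page and the RobotsMetaParser class are invented for this example, and "noindex, nofollow" is just one common combination of values.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives += [d.strip().lower()
                                for d in content.split(",") if d.strip()]


# Sample page: the <meta> tag asks spiders not to index it or follow its links.
PAGE = """<html><head>
<title>Private page</title>
<meta name="robots" content="noindex, nofollow">
</head><body>...</body></html>"""

meta_parser = RobotsMetaParser()
meta_parser.feed(PAGE)
print(meta_parser.directives)  # ['noindex', 'nofollow']
```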


--------------------------

OK, now on to adding it:

This is actually a very easy task.

First you need to make a text file called robots.txt (with an "s"; spiders will not look for robot.txt).

So open Notepad or any other plain-text editor.

Then paste this code into the open file:


User-agent: *
Disallow: /folder-not-to-be-listed/
Disallow: /folder-not-to-be-listed-2/


Then save the file under the name robots.txt in the same folder as the first page of your website.

You should then be able to access the file at:

http://your-main-domain.com/robots.txt (replace your-main-domain with your domain)

Once you can access the robots.txt file from your URL as shown above, you're all set.
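
If you would rather generate the file than type it by hand, something like this Python sketch would do it. The helper names build_robots_txt and save_robots_txt are hypothetical, and a temporary directory stands in for your site's root folder.

```python
import tempfile
from pathlib import Path


def build_robots_txt(folders, user_agent="*"):
    """Return robots.txt text with one Disallow line per folder."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {folder}" for folder in folders]
    return "\n".join(lines) + "\n"


def save_robots_txt(site_root, folders):
    """Write robots.txt into the given folder and return its path."""
    target = Path(site_root) / "robots.txt"
    target.write_text(build_robots_txt(folders), encoding="utf-8")
    return target


# Assumption: in real use, site_root is the folder holding your first page.
with tempfile.TemporaryDirectory() as site_root:
    saved = save_robots_txt(
        site_root,
        ["/folder-not-to-be-listed/", "/folder-not-to-be-listed-2/"],
    )
    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")

print(saved_name)
print(saved_text)
```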

Thank you,

DWHS Inc.