Robots.txt file

From Joomla! Documentation

Revision as of 05:59, 25 October 2022 by Ceford (talk | contribs) (Marked this version for translation)

About Robots

Web robots, also known as crawlers, web wanderers or spiders, are programs that traverse the web automatically. Among many uses, search engines use them to index web content.

The robots.txt file implements the Robots Exclusion Protocol (REP), which allows a website administrator to define which parts of the site are off limits to specific robot user agents. Administrators can Allow access to their web content and Disallow access to, for example, cgi, private and temporary directories if they do not want pages in those areas indexed.
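A minimal robots.txt illustrating both directives might look like this (the directory names here are illustrative, not part of a standard Joomla installation):

```
User-agent: *
Allow: /media/
Disallow: /cgi-bin/
Disallow: /private/
```

The `User-agent: *` line means the rules apply to all robots; a specific crawler name can be used instead to target one robot only.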

Where to Place the robots.txt File

A standard robots.txt file is included in your Joomla root. The robots.txt file must reside in the root of the domain or subdomain and must be named robots.txt.

Joomla in a Subdirectory

A robots.txt file located in a subdirectory is not valid: robots only check for this file in the root of the domain. If the Joomla site is installed within a folder such as example.com/joomla/, the robots.txt file must be moved to the site root at example.com/robots.txt.

Note: The Joomla folder name must be prefixed to the disallowed path. For example, the Disallow rule for the /administrator/ folder must be changed to read Disallow: /joomla/administrator/
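For instance, assuming Joomla is installed in a folder named /joomla/, the relevant lines in the root-level robots.txt would read:

```
User-agent: *
Disallow: /joomla/administrator/
Disallow: /joomla/tmp/
```

Each Disallow path in the standard file would be prefixed the same way.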

Joomla robots.txt Contents

These are the contents of the standard Joomla robots.txt file:

User-agent: *
Disallow: /administrator/
Disallow: /api/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/

Robot Exclusion

You can exclude directories or block robots from your site by adding a Disallow rule to the robots.txt file. For example, to prevent any robots from visiting the /tmp directory, add this rule:

Disallow: /tmp/
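You can verify the effect of such a rule locally with Python's standard-library robots.txt parser. This is a small sketch; the URLs are hypothetical and the rules are parsed from a string rather than fetched from a live site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set matching the example above.
rules = """\
User-agent: *
Disallow: /tmp/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# A compliant robot applying these rules would skip /tmp/ ...
print(rp.can_fetch("*", "https://example.com/tmp/cache.bin"))  # False
# ... but would still be allowed to fetch other paths.
print(rp.can_fetch("*", "https://example.com/index.php"))      # True
```

Note that robots.txt is advisory: well-behaved crawlers honor it, but it is not an access-control mechanism.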


Syntax Checking

For syntax checking you can use a validator for robots.txt files; several are available online.
