Difference between revisions of "Robots.txt file"

From Joomla! Documentation

(8 intermediate revisions by 4 users not shown)
Line 2: Line 2:
 
{{review}}
 
{{review}}
 
Web Robots (Crawlers, Web Wanderers or Spiders) are programs that traverse the Web automatically. Among many uses, search engines use them to index the web content.
 
Web Robots (Crawlers, Web Wanderers or Spiders) are programs that traverse the Web automatically. Among many uses, search engines use them to index the web content.
Robots.txt implements the REP (Robots Exclusion Protocol), which allows the web site administrator to define what parts of the site are off-limits to specific robot user agent names. Web administrators can Allow access to their web content and Disallow access to cgi, private and temporary directories, for example, if they do not want pages in those areas indexed.  
+
Robots.txt implements the REP ([[wp:Robots exclusion standard|Robots Exclusion Protocol]]), which allows the web site administrator to define what parts of the site are off-limits to specific robot user agent names. Web administrators can Allow access to their web content and Disallow access to cgi, private and temporary directories, for example, if they do not want pages in those areas indexed.  
  
 
==Where to place my robots.txt file?==
 
==Where to place my robots.txt file?==
 
A standard robots.txt its included in your joomla root.
 
A standard robots.txt its included in your joomla root.
The robots.txt file must reside in the root of the domain and must be named "robots.txt".
+
The robots.txt file must reside in the root of the domain or subdomain and must be named <code>robots.txt</code>.
  
===Joomla in a subdomain===
+
===Joomla in a subdirectory===
 
A robots.txt file located in a subdirectory isn't valid, as bots only check for this file in the root of the domain.
 
A robots.txt file located in a subdirectory isn't valid, as bots only check for this file in the root of the domain.
If the Joomla site is installed within a folder such as at e.g. www.example.com/joomla/ the robots.txt file MUST be moved to the site root at e.g. www.example.com/robots.txt .
+
If the Joomla site is installed within a folder such as at e.g. <code>example.com/joomla/</code> the robots.txt file MUST be moved to the site root at e.g. <code>example.com/robots.txt</code>.
Note: The joomla folder name MUST be prefixed to the disallowed path, e.g. the Disallow rule for the /administrator/ folder MUST be changed to read Disallow: /joomla/administrator/
+
Note: The joomla folder name MUST be prefixed to the disallowed path, e.g. the Disallow rule for the <code>/administrator/</code> folder MUST be changed to read <code>Disallow: /joomla/administrator/</code>
  
 
==Joomla robots.txt contents==
 
==Joomla robots.txt contents==
This is the contents of a standard Joomla robots.txt
+
This is the contents of a [https://github.com/joomla/joomla-cms/blob/master/robots.txt standard Joomla robots.txt]:
  
<source lang="php">
+
<source lang="text">
 
User-agent: *
 
User-agent: *
 
Disallow: /administrator/
 
Disallow: /administrator/
Line 35: Line 35:
 
</source>
 
</source>
  
==Robot Exclusion==
+
==Robot exclusion==
You can exclude directories or block robots from your site adding Disallow rule to the robots.txt
+
You can exclude directories or block robots from your site adding Disallow rule to robots.txt.  E.g. to prevent any robots from visiting the <code>/tmp</code> directory, add the following rule:
  
Infos:
+
<source lang="text">
* [http://www.robotstxt.org/orig.html A Standard for Robot Exclusion]
+
Disallow: /tmp/
 +
</source>
 +
 
 +
See also:
 +
* [http://www.robotstxt.org/wc/robots.html A Standard for Robot Exclusion] — specification of the robots.txt standard
 
* [http://support.google.com/webmasters/bin/answer.py?hl=en-GB&answer=156449 Block or remove pages using a robots.txt file]
 
* [http://support.google.com/webmasters/bin/answer.py?hl=en-GB&answer=156449 Block or remove pages using a robots.txt file]
  
Line 45: Line 49:
 
For syntax checking you can use a validator for robots.txt files.
 
For syntax checking you can use a validator for robots.txt files.
 
Try one of these:
 
Try one of these:
* [http://tool.motoricerca.info/robots-checker.phtml Motoricerca Robots.txt Checker]
+
* [http://tool.motoricerca.info/robots-checker.phtml Robots.txt Checker (by Motoricerca)]
* [http://www.frobee.com/robots-txt-check]Robots.txt Frobee Robots.txt Checker]
+
* [http://www.frobee.com/robots-txt-check Robots.txt Checker (by Frobee)]
* [http://www.searchenginepromotionhelp.com/m/robots-text-tester/robots-checker.php Search Engine Promotion Help robots.txt Checker]
+
* [http://www.searchenginepromotionhelp.com/m/robots-text-tester/robots-checker.php robots.txt Checker (by Search Engine Promotion Help)]
  
 +
== See also ==
  
===Joomla! Documentation===
+
For additional information please read:
* [[Creating_a_Custom_404_Error_Page|Creating a Custom 404 Error Page]]
 
* [[Custom_error_pages|Custom Error Pages]]
 
* [[System_error_pages|System Error Pages]]
 
* [[Making_your_site_Search_Engine_Friendly|Making your site Search Engine Friendly]]
 
* [[Entering_search_engine_meta-data|Entering search engine meta-data]]
 
  
===How to: Robots.txt and Joomla===
+
=== Joomla! Documentation ===
 +
* [[Creating a Custom 404 Error Page]]
 +
* [[Custom error pages]]
 +
* [[System error pages]]
 +
* [[Making your site Search Engine Friendly]]
 +
* [[Entering search engine meta-data]]
 +
 
 +
=== How to: Robots.txt and Joomla ===
 
* [http://www.kangainternet.com.au/joomla-seo-blog/joomla-google-the-robots.txt-file.html Joomla & Google Robots Text File]
 
* [http://www.kangainternet.com.au/joomla-seo-blog/joomla-google-the-robots.txt-file.html Joomla & Google Robots Text File]
 
* [http://www.joomlablogger.net/seo/joomla-seo/how-to-set-up-the-robotstxt-file-in-joomla How to set up the robots.txt file in Joomla]
 
* [http://www.joomlablogger.net/seo/joomla-seo/how-to-set-up-the-robotstxt-file-in-joomla How to set up the robots.txt file in Joomla]
  
===General informations===
+
=== General information ===
* [http://www.robotstxt.org/orig.html The Web Robots Pages]
+
* http://www.robotstxt.org/ — main website for robots.txt
 +
* [http://www.robotstxt.org/wc/robots.html A Standard for Robot Exclusion] — specification of the robots.txt standard
 
* [https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag Robots meta tag and X-Robots-Tag HTTP header specifications]
 
* [https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag Robots meta tag and X-Robots-Tag HTTP header specifications]
 
* [http://www.searchtools.com/robots/robots-txt.html Robots.txt and Search Indexing]
 
* [http://www.searchtools.com/robots/robots-txt.html Robots.txt and Search Indexing]
  
===Tools for Webmasters===
+
=== Tools for Webmasters ===
 
* [https://www.google.com/webmasters/tools Google Webmaster Tools]
 
* [https://www.google.com/webmasters/tools Google Webmaster Tools]
 +
 +
[[Category:Search Engine Optimisation]]
 +
[[Category:Content Management]]

Revision as of 11:56, 13 October 2014



Web robots (also called crawlers, web wanderers or spiders) are programs that traverse the Web automatically. Among many other uses, search engines use them to index web content. The robots.txt file implements the REP (Robots Exclusion Protocol), which lets the website administrator define which parts of the site are off-limits to specific robot user agent names. Administrators can Allow access to their web content and Disallow access to areas such as cgi, private and temporary directories if they do not want pages in those areas indexed.

Where to place my robots.txt file?

A standard robots.txt file is included in the root of your Joomla installation. The file must reside in the root of the domain or subdomain and must be named robots.txt.
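As a minimal sketch of the placement rule above, the following Python snippet derives the robots.txt location that applies to any page URL by keeping the scheme and host (including a subdomain) and replacing the path. The function name `robots_txt_url` and the example URLs are illustrative assumptions, not part of Joomla.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep scheme and host (with any subdomain or port); replace the path
    # with /robots.txt and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://example.com/joomla/index.php?option=com_content"))
# http://example.com/robots.txt
print(robots_txt_url("http://blog.example.com/archive/2014/"))
# http://blog.example.com/robots.txt
```

Each subdomain resolves to its own robots.txt, which is why the file has to exist at the root of every domain or subdomain that serves the site.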

Joomla in a subdirectory

A robots.txt file located in a subdirectory is not valid, because bots only check for this file in the root of the domain. If the Joomla site is installed in a folder, e.g. example.com/joomla/, the robots.txt file MUST be moved to the site root, e.g. example.com/robots.txt. Note: The Joomla folder name MUST be prefixed to each disallowed path, e.g. the Disallow rule for the /administrator/ folder MUST be changed to read Disallow: /joomla/administrator/
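The prefixing step can be done by hand, but a small script avoids missing a rule. The sketch below is one possible way to do it in Python; `prefix_rules` is a hypothetical helper, and it assumes that Allow rules, if present, should be prefixed in the same way as Disallow rules.

```python
def prefix_rules(robots_txt: str, folder: str) -> str:
    """Prefix every Disallow/Allow path with the subdirectory name (hypothetical helper)."""
    prefix = "/" + folder.strip("/")
    out = []
    for line in robots_txt.splitlines():
        field, sep, value = line.partition(":")
        path = value.strip()
        is_rule = field.strip().lower() in ("disallow", "allow")
        if sep and is_rule and path.startswith("/"):
            out.append(f"{field.strip()}: {prefix}{path}")
        else:
            # User-agent lines, comments, blank lines and empty rules stay unchanged.
            out.append(line)
    return "\n".join(out)


rules = "User-agent: *\nDisallow: /administrator/\nDisallow: /tmp/"
print(prefix_rules(rules, "joomla"))
# User-agent: *
# Disallow: /joomla/administrator/
# Disallow: /joomla/tmp/
```

The output is what would be saved as example.com/robots.txt for a site installed in example.com/joomla/.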

Joomla robots.txt contents

This is the contents of a standard Joomla robots.txt:

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
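To see what these rules mean for a crawler, they can be fed to the `urllib.robotparser` module from the Python standard library. This is a minimal sketch: the host example.com, the robot name ExampleBot and the helper `allowed` are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The standard Joomla rules listed above.
RULES = """\
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path: str) -> bool:
    """Report whether a compliant robot may fetch the given path."""
    return parser.can_fetch("ExampleBot", "http://example.com" + path)


print(allowed("/index.php"))                # True
print(allowed("/administrator/index.php"))  # False
print(allowed("/images/logo.png"))          # False
```

Note that with this standard file, compliant robots also stay out of /images/ and /media/, which affects whether pictures from the site can be indexed.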

Robot exclusion

You can exclude directories or block robots from your site by adding a Disallow rule to robots.txt. For example, to prevent all robots from visiting the /tmp directory, add the following rule:

Disallow: /tmp/
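Blocking a single robot works the same way, with a User-agent group of its own. The Python sketch below checks such a rule set with the standard library's `urllib.robotparser`; the robot names ExampleBot and OtherBot and the host example.com are made up for this illustration.

```python
from urllib.robotparser import RobotFileParser

# ExampleBot is shut out of the whole site; every other robot
# is only kept out of /tmp/.
RULES = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

site = "http://example.com"
print(parser.can_fetch("ExampleBot", site + "/index.php"))     # False
print(parser.can_fetch("OtherBot", site + "/index.php"))       # True
print(parser.can_fetch("OtherBot", site + "/tmp/cache.txt"))   # False
```

A robot that finds a group matching its own name uses that group; all others fall back to the `*` group.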

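See also:
* A Standard for Robot Exclusion (http://www.robotstxt.org/wc/robots.html), the specification of the robots.txt standard
* Block or remove pages using a robots.txt file (http://support.google.com/webmasters/bin/answer.py?hl=en-GB&answer=156449)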

Syntax checking

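To check the syntax of a robots.txt file, you can use a validator. Try one of these:
* Robots.txt Checker by Motoricerca (http://tool.motoricerca.info/robots-checker.phtml)
* Robots.txt Checker by Frobee (http://www.frobee.com/robots-txt-check)
* robots.txt Checker by Search Engine Promotion Help (http://www.searchenginepromotionhelp.com/m/robots-text-tester/robots-checker.php)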
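For a quick local check before turning to an online validator, a few lines of Python can catch the most common mistakes. This is only a rough sketch, not a full validator: the function `check_robots_txt` is hypothetical, and the set of accepted field names (including the widely used but non-standard Sitemap and Crawl-delay) is an assumption.

```python
from typing import List

# Field names accepted by this sketch (an assumption, not an official list).
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def check_robots_txt(text: str) -> List[str]:
    """Return a list of human-readable problems found in a robots.txt body."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, sep, _value = line.partition(":")
        field = field.strip().lower()
        if not sep:
            problems.append(f"line {number}: missing ':' separator")
        elif field not in KNOWN_FIELDS:
            problems.append(f"line {number}: unknown field '{field}'")
        elif field == "user-agent":
            seen_agent = True
        elif field in ("disallow", "allow") and not seen_agent:
            problems.append(f"line {number}: rule appears before any User-agent line")
    return problems


sample = "User-agent: *\nDisallow /tmp/\nDissallow: /cache/\n"
for problem in check_robots_txt(sample):
    print(problem)
# line 2: missing ':' separator
# line 3: unknown field 'dissallow'
```

An empty result only means that none of these simple checks failed; it does not show how a particular search engine will interpret the file.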

See also

For additional information please read:

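Joomla! Documentation
* Creating a Custom 404 Error Page
* Custom error pages
* System error pages
* Making your site Search Engine Friendly
* Entering search engine meta-data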

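How to: Robots.txt and Joomla
* Joomla & Google Robots Text File (http://www.kangainternet.com.au/joomla-seo-blog/joomla-google-the-robots.txt-file.html)
* How to set up the robots.txt file in Joomla (http://www.joomlablogger.net/seo/joomla-seo/how-to-set-up-the-robotstxt-file-in-joomla)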

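General information
* http://www.robotstxt.org/, the main website for robots.txt
* A Standard for Robot Exclusion (http://www.robotstxt.org/wc/robots.html), the specification of the robots.txt standard
* Robots meta tag and X-Robots-Tag HTTP header specifications (https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag)
* Robots.txt and Search Indexing (http://www.searchtools.com/robots/robots-txt.html)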

Tools for Webmasters
* Google Webmaster Tools (https://www.google.com/webmasters/tools)