Web crawler cannot find the robots.txt file

Robots.txt is an important file for the proper operation of a site. In it, search engine crawlers find out which pages of the web resource should be scanned and which ones should be ignored entirely. The robots.txt file is used when you need to hide some parts of the site, or the entire website, from search engines: for example, a section with users' personal information or a mirror of the site.

What should you do if the site auditor does not see this file? Read about this and other issues related to the robots.txt file in our article.

How Does robots.txt Work?

A robots.txt file is a plain text document in UTF-8 encoding. It works for the HTTP, HTTPS, and FTP protocols. The encoding matters: if the robots.txt file is saved in a different encoding, the search engine may be unable to read some or all of its rules and will not be able to determine which pages may be crawled. Other requirements for the robots.txt file are as follows:

  • all settings in the file are relevant only for the site where the robots.txt is located;
  • the file location is the root directory; the URL should look like this: https://site.com.ua/robots.txt;
  • the file size should not exceed 500 KiB (Google ignores content beyond this limit).
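
The encoding and size requirements above can be checked locally before the file is uploaded. The following is a minimal sketch, assuming the file content is already available as raw bytes; the function name check_robots_body and the constant MAX_ROBOTS_BYTES are illustrative and not part of any tool.

```python
MAX_ROBOTS_BYTES = 500 * 1024  # size limit from the list above (assumed to mean 500 KiB)


def check_robots_body(body: bytes) -> list:
    """Return a list of problems found in a raw robots.txt payload."""
    problems = []
    # Requirement: the file must not exceed the size limit.
    if len(body) > MAX_ROBOTS_BYTES:
        problems.append(
            f"file is {len(body)} bytes, over the {MAX_ROBOTS_BYTES}-byte limit"
        )
    # Requirement: the file must be valid UTF-8.
    try:
        body.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 (first bad byte at offset {exc.start})")
    return problems


# A small, well-formed file yields no problems.
print(check_robots_body(b"User-agent: *\nDisallow: /private/\n"))
# The same kind of rule saved as Latin-1 is flagged as an encoding problem.
print(check_robots_body("Disallow: /caf\u00e9/".encode("latin-1")))
```

The check only covers size and encoding; it says nothing about whether the rules themselves are correct.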

When scanning the robots.txt file, search crawlers learn whether they are permitted to crawl all web pages, only some of them, or none at all.
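
To illustrate how a crawler reads these permissions, here is a rough sketch using Python's standard urllib.robotparser module. The sample rules, the ExampleBot user agent, and the paths are made up for the example. Python's parser applies the first matching rule, which can differ from how a particular search engine resolves conflicts, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block one section, allow the rest.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(SAMPLE_RULES.splitlines())


def is_allowed(path, agent="ExampleBot"):
    """Return True if the sample rules let `agent` crawl `path`."""
    return parser.can_fetch(agent, "https://site.com.ua" + path)


print(is_allowed("/blog/post"))          # True
print(is_allowed("/private/data.html"))  # False
```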

Search Engine Response Codes

A web crawler requests the robots.txt file and reacts to the response code as follows:

  • 5XX – treated as a temporary server error, at which point scanning of the site stops;
  • 4XX – treated as if no restrictions exist, so every page of the site may be scanned;
  • 3XX – the crawler follows the redirect until it gets another answer. After 5 attempts, the result is recorded as a 404 error;
  • 2XX – successful scanning; the rules in the file are read and applied.
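
The list above can be expressed as a small decision function. This is a simplified sketch of the behavior described here, not the logic of any specific search engine; real crawlers have exceptions (Google, for instance, treats 429 like a server error rather than like other 4XX codes). The names crawler_action and resolve_status, and the responses dictionary that stands in for real HTTP requests, are hypothetical.

```python
MAX_REDIRECTS = 5  # redirect attempts before the result is recorded as 404


def crawler_action(status: int) -> str:
    """Map a robots.txt response code to the behavior listed above."""
    if 200 <= status < 300:
        return "read rules"
    if 300 <= status < 400:
        return "follow redirect"
    if 400 <= status < 500:
        return "crawl everything"
    if 500 <= status < 600:
        return "stop crawling"
    raise ValueError(f"unexpected status code: {status}")


def resolve_status(url: str, responses: dict) -> int:
    """Follow redirects in `responses` (url -> (status, target)) up to the limit."""
    for _ in range(MAX_REDIRECTS + 1):
        status, target = responses[url]
        if not 300 <= status < 400:
            return status
        url = target
    return 404  # too many redirects: recorded as "not found"


# Simulated server: one redirect that ends in a successful response.
responses = {"/robots.txt": (301, "/static/robots.txt"),
             "/static/robots.txt": (200, None)}
print(crawler_action(resolve_status("/robots.txt", responses)))  # read rules
```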

If the search engine requests https://site.com.ua/robots.txt and does not find the file, the response will be “robots.txt not Found”.

Reasons for the “robots.txt not Found” Response

The “robots.txt not Found” crawler response may have the following causes:

  • the text file is located at a different URL;
  • the robots.txt file is not found on the website.

More information is available in this video by John Mueller from Google.

Please note! Each subdomain needs its own robots.txt file in its root directory, in addition to the one on the main domain. If you have included subdomains in the site audit, the file must be available on each of them; otherwise, the crawler will report an error stating that robots.txt is not found.

How to Check the Issue?

If a web crawler cannot find the robots.txt file, it might be due to the file not being placed in the correct location or it might not exist at all. To check this issue, first, make sure that the robots.txt file is located at the root of your website domain (e.g., https://www.yoursite.com/robots.txt). You can manually enter this URL in your web browser to see if the file can be accessed. If the file isn’t found, you need to create a robots.txt file and upload it to your root directory. This file tells crawlers which pages or sections of your site should not be processed or scanned. Also, ensure that your server permissions allow search engines to access the robots.txt file. If it’s correctly placed and still not detectable, check your server’s configuration or contact your hosting provider for further assistance.
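
The manual check can also be scripted. The sketch below, which uses only the Python standard library, derives the robots.txt address for any page URL (including pages on subdomains) and defines a helper that requests it. The function names robots_url and fetch_status are illustrative, and the yoursite.com addresses are placeholders.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Build the root-level robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Request the URL and return the final HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4XX and 5XX; the code is still what we want.
        return err.code


print(robots_url("https://www.yoursite.com/blog/post?id=1"))
# https://www.yoursite.com/robots.txt
print(robots_url("https://shop.yoursite.com/cart"))
# https://shop.yoursite.com/robots.txt
```

Calling fetch_status(robots_url(...)) against your own domain returns the status code the server sends; a 404 there corresponds to the "not found" case discussed in this article. Note that urlopen follows redirects on its own, so the code returned is that of the final response.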

The Sitechecker SEO tool provides an insightful feature that identifies issues related to an invalid robots.txt file on your website. By navigating to the “Site Audit” section and focusing on the “Indexability” issues, users can specifically pinpoint the problem labeled “Site has an invalid robots.txt file.”

By clicking on “View issue,” you can access a detailed list of specific problems detected within your robots.txt file.

Why is This Important?

If the “robots.txt not Found” error is not fixed, search crawlers receive no instructions from the file and may work incorrectly. This, in turn, may lead to a drop in site ranking and to inaccurate data on site traffic. Also, if search engines do not see robots.txt, all pages of your site will be crawled, which is undesirable. As a result, you may run into the following problems:

  1. server overload;
  2. purposeless crawling of pages with the same content by search engines;
  3. longer time to process visitor requests.

A working robots.txt file is crucial for the smooth operation of your web resource. Therefore, let’s examine how to fix errors in this text document.

How Should robots.txt be Corrected?

For search crawlers to respond properly to your robots.txt file, it must be free of errors. Check the text document for the following errors:

  1. Directive names and values are mixed up. Each line has the form "Directive: value", and Disallow or Allow lines must follow the User-agent line of their group.
  2. Several page URLs in the same directive. Each Disallow or Allow line takes a single path.
  3. Typos in the robots.txt file name or uppercase letters used in the file name.
  4. User-agent is not specified.
  5. A rule is missing its directive: Disallow or Allow.
  6. Inaccurate URL: use the $ and / symbols to mark where a path ends.

You can check the robots.txt file using search engine validation tools. For example, use the robots.txt report in Google Search Console, which replaced the earlier robots.txt tester tool.
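
Some of the errors listed above can also be caught with a quick local check before the file is uploaded. The following is a rough sketch, not a replacement for a search engine's own validator: it covers only swapped or unknown directives, rules without a User-agent, and several paths on one line. The function name lint_robots and the set of known directives are assumptions made for this example.

```python
# Directives this sketch recognizes (an assumption; crawlers differ in what they support).
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots(text: str) -> list:
    """Return human-readable problems found in robots.txt content."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: expected 'Directive: value'")
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        key = name.lower()
        if key not in KNOWN_DIRECTIVES:
            problems.append(f"line {number}: unknown directive {name!r}")
        elif key == "user-agent":
            seen_user_agent = True
        elif key in ("allow", "disallow"):
            if not seen_user_agent:
                problems.append(f"line {number}: {name} before any User-agent")
            if len(value.split()) > 1:
                problems.append(f"line {number}: several paths in one directive")
            elif value and not value.startswith(("/", "*")):
                problems.append(f"line {number}: path should start with '/'")
    return problems


for problem in lint_robots("Disallow: /a /b\n/private/: Disallow\n"):
    print(problem)
```

An empty list means that none of these particular problems were detected, not that the file is valid in every respect.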
