If you want every page of your website to be crawled and indexed correctly, keep your sitemap free of errors. A clean sitemap should include only URLs that return a 200 (OK) status code. When a search engine encounters a page with a 404 Not Found error, it may simply ignore this page in the future.
What Does “Sitemap Error 4xx” Mean?
4xx URLs in a sitemap mislead search engines by sending them to addresses that don’t exist or are unavailable. 4xx status codes indicate that the problem is on the client’s side of the request: for example, the URL is wrong, the request lacks valid authentication credentials, or the client is not allowed to access the resource. There are more than two dozen 4xx HTTP status codes, but the most common are 401 (Unauthorized), 403 (Forbidden), 404 (Not Found), and 410 (Gone).
What Triggers This Issue?
A 401 error code means that a page requires user authentication. It happens when you need to log in to access the resource or, on some sites, when your IP address has been banned. Other possible causes:
- The browser’s cookies and cache are out of date.
- There was a mistake in the URL, or the link is invalid.
- The website has interpreted your security plugins as a threat and is trying to protect the page.
A 403 HTTP status code also means that a user is not allowed to access the page, but in this case authenticating won’t help. Possible causes:
- The URL has been mistyped.
- There are incorrect settings in the .htaccess file after some updates.
- The website’s owner has restricted access to the page through permissions, either deliberately or by mistake.
- There is no index page.
- You might have made a mistake while uploading website content, and now the directory on your server is empty.
The familiar 404 Not Found error means that the server cannot find the requested page. Possible causes:
- The URL is wrong.
- The page or the page’s directory was deleted or moved, and the link is invalid.
- The website’s owner has moved the page and forgotten to add a 301 redirect.
- For privacy or security reasons, the owner has configured the page to return a 404 status code.
A 410 error code indicates that the page no longer exists on this server. With a 404 error, the requested page may become available again later, whereas a 410 status code means that the page has been removed permanently.
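As a quick reference, the four codes described above can be summarized in a small lookup table. The Python sketch below is illustrative only: `COMMON_4XX` and `describe_status` are hypothetical names, and the short descriptions paraphrase this section rather than quote any specification.

```python
# Hypothetical lookup table for the four 4xx codes discussed in this section.
COMMON_4XX = {
    401: ("Unauthorized", "the page requires authentication"),
    403: ("Forbidden", "access is refused and logging in will not help"),
    404: ("Not Found", "the server cannot find the page; it may return later"),
    410: ("Gone", "the page was removed permanently"),
}


def describe_status(code: int) -> str:
    """Return a one-line, human-readable summary for an HTTP status code."""
    if code in COMMON_4XX:
        name, meaning = COMMON_4XX[code]
        return f"{code} {name}: {meaning}"
    if 400 <= code <= 499:
        return f"{code}: another client error (4xx)"
    return f"{code}: not a 4xx status"
```

A call such as `describe_status(410)` gives a short reminder of what the code means when you are reading a crawl report.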
How To Check the Issue
To get rid of any 4xx error codes, you need to find the source of the issue. One of the following actions will help you locate the problem:
- If the sitemap itself cannot be accessed because of a 404 error, open its report in your admin console. The status column will show you the issue. Then open the list of URLs and find the ones responding with an error.
- Check your server logs. Open a log file and look for any 4xx errors.
- Have a look at the server configuration file: it’s either .htaccess (Apache) or nginx.conf (Nginx), depending on which server you’re using. The file may contain some unwanted redirects.
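Checking the URLs listed in a sitemap can also be scripted. The following is a minimal Python sketch, not a full crawler. It assumes the sitemap uses the standard sitemaps.org namespace, and the function names (`extract_urls`, `fetch_status`, `find_4xx`) are made up for illustration. It sends HEAD requests, which some servers answer differently from GET, so treat the results as a first pass.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, List

# Namespace defined by the sitemaps.org protocol (assumed to be in use).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> List[str]:
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is what we are after.
        return err.code


def find_4xx(
    urls: Iterable[str],
    get_status: Callable[[str], int] = fetch_status,
) -> Dict[str, int]:
    """Map each URL that answers with a 4xx code to that code."""
    failing = {}
    for url in urls:
        status = get_status(url)
        if 400 <= status <= 499:
            failing[url] = status
    return failing
```

Calling `find_4xx(extract_urls(xml_text))` returns a dictionary of failing URLs and their status codes. `urllib` follows redirects automatically, so a URL that redirects to a working page is reported by its final status. The `get_status` parameter lets you swap in a different checker, for example one that uses GET.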
Why Is This Important?
Once a search engine finds a URL on your live site that responds with a 4xx error, it may start ignoring other pages as well. That is why it’s important to fix the problem as soon as possible and always keep the XML sitemap up to date. If a crawler loses trust in your sitemap, you risk losing indexed pages and high rankings.
This article can help you to learn more about the negative impact of 404 errors on your crawl budget and how Googlebot might react to it.
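To get a rough sense of how many crawler requests are being spent on broken URLs, you can count them in your access log. The Python sketch below assumes the log uses the common “combined” format (request, status, size, referrer, user agent) and identifies the crawler by a substring of the user agent, which can be spoofed. The names `LOG_LINE` and `count_crawler_4xx` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Picks the request path, status, and user agent out of a line in the
# assumed "combined" log format.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_crawler_4xx(lines: Iterable[str], agent_token: str = "Googlebot") -> Counter:
    """Count (path, status) pairs where the given crawler received a 4xx."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        status = int(match.group("status"))
        if 400 <= status <= 499 and agent_token in match.group("agent"):
            hits[(match.group("path"), status)] += 1
    return hits
```

Passing an open log file to `count_crawler_4xx` and calling `.most_common(10)` on the result lists the broken paths the crawler requests most often, which is a reasonable place to start fixing.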
How To Fix the Issue
Before attempting to fix anything, make sure you’ve got a backup of your website’s database to avoid any additional issues. After that, open your sitemap’s settings in your CMS dashboard. Check whether everything is fine with your custom plugins and extensions. One option is to deactivate them and check whether that helps.
If you’re using WordPress, you may follow this tutorial showing a quick way to fix a 404 error.
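If you maintain the sitemap file by hand rather than through a CMS plugin, removing the failing entries can be scripted. The Python sketch below is one possible approach under stated assumptions: the sitemap uses the standard sitemaps.org namespace, and `get_status` is a hypothetical function you supply that returns the HTTP status code for a URL. `prune_sitemap` returns new XML instead of overwriting the file, so the original stays available as a backup.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Namespace defined by the sitemaps.org protocol (assumed to be in use).
SITEMAP_URI = "http://www.sitemaps.org/schemas/sitemap/0.9"


def prune_sitemap(sitemap_xml: str, get_status: Callable[[str], int]) -> str:
    """Return sitemap XML without the <url> entries that answer with 4xx."""
    # Serialize with a default namespace instead of an "ns0:" prefix.
    # Note: this registration is global for the ElementTree module.
    ET.register_namespace("", SITEMAP_URI)
    root = ET.fromstring(sitemap_xml)
    # Iterate over a copy because entries are removed inside the loop.
    for url_el in list(root.findall(f"{{{SITEMAP_URI}}}url")):
        loc = url_el.find(f"{{{SITEMAP_URI}}}loc")
        if loc is None or not loc.text:
            continue
        if 400 <= get_status(loc.text.strip()) <= 499:
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")
```

Review the pruned output before publishing it. A URL that returns 404 because of a missing redirect is usually better fixed at the source than dropped from the sitemap.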
Here are other methods to fix the 4xx issue:
- Clear the browser’s cache and cookies. That is how you will get rid of invalid data and update information.
- Check that the URL is correct and the link is valid. Sometimes, even a tiny typo can lead to a 4xx code.
- Clear the DNS (Domain Name System) cache.
- With a 401 error, you can find out which type of authentication the website requires. Open the browser’s developer tools (for example, by right-clicking the page and choosing “Inspect”), and then go to the “Network” tab, where you will find a list of requested resources. The “Status” column shows which one returned an error. Check that response for the WWW-Authenticate header, which names the authentication scheme the server expects.
- The website’s homepage file should be called index.html or index.php. If there is no index page, rename your homepage file accordingly.
- Check the path to a page that has been recently moved. You can do it using an FTP client or your CMS. The page may be in the wrong folder, in which case you should relocate it.
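The WWW-Authenticate check from the 401 step above can also be done outside the browser. This is a minimal Python sketch using only the standard library. The function names `get_challenge` and `auth_scheme` are hypothetical, and `auth_scheme` only reads the first scheme name, although a server may list several challenges in one header.

```python
import urllib.error
import urllib.request
from typing import Optional


def get_challenge(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the WWW-Authenticate header if the URL answers 401, else None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return None  # the request succeeded, so no challenge was issued
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return err.headers.get("WWW-Authenticate")
        return None


def auth_scheme(challenge: str) -> str:
    """Return the first token of a challenge value, e.g. the scheme name."""
    parts = challenge.split(None, 1)
    return parts[0] if parts else ""
```

For a header value such as `Basic realm="Admin area"`, `auth_scheme` returns `Basic`, which tells you the page expects a username and password sent with HTTP Basic authentication.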
If nothing works, you should ask for help from your website host.