All the painstaking work of optimizing a site can be undone by technical mistakes that interfere with:
- The project’s ability to be fully indexed.
- The search robot’s ability to understand the structure of the site.
- Users’ ability to quickly find the information they need.
- The ranking algorithm’s ability to rank the documents correctly.
This review article lists the most common technical mistakes and explains how to fix them.
Basic requirements for sites of all types
The main requirements for sites of all types and sizes are listed below, sorted by importance and by how often the problem occurs online:
1. Configure a 301 redirect from the non-primary site mirror (with or without “www”) to the primary one. You can check the server response code of any document on the network using our on-page SEO checker.
2. Set up 301 redirects from any other non-primary site mirrors to the primary one (for example, from the domain name “www.stchck.pro” to “www.sitechecker.pro”).
3. Create and configure a robots.txt file. We recommend separate rule groups for Google and for other search engines, because they process different directives (for example, only Yandex supports the “Host:” directive, while Google treats it as incorrect).
4. Give each page on the site a unique Title tag and a unique Description meta tag that reflect its contents.
Checking the uniqueness of the Title tag and the Description meta tag can be automated with our website audit tool. Here we consider only the technical aspect, not the question of how to fill in these fields correctly.
5. Configure friendly URLs for the promoted pages; ideally, for all pages of the site. A quick quality check: give a colleague only the URL and ask them to describe what the page is about. You can also try our URL checker to find out whether a URL is friendly enough for Google.
6. Create a 404 error page and verify that it works correctly. The response code for this page must be “404 Not Found”, and the page must be returned for erroneous URLs in every section of the website. It should be designed like the rest of the site, should not redirect away from the non-existent URL, and should help the user quickly find the desired page (basic links, a search form).
7. Check the server response codes for all pages by crawling the project. All pages reachable via links should return “200 OK”. Accordingly, if a page address changes and a 301 redirect is set up, also update the internal links on the site so that they point directly to the final URL.
8. Check the site load time and page size in KB. Recommended values: up to 0.7 sec (700 ms) to download the document’s source code, up to 0.2 sec (200 ms) for the server response time, and up to 120 KB for the source code size.
9. Check that every page has an h1 main heading tag and that its text is unique. The content of the tag should reflect the essence of the text.
10. Make sure that h1-h6 tags are not used as site design elements.
11. Check the server’s uptime in the statistics reports. A normal value is 99.85% or higher.
12. Create a unique and attention-grabbing favicon.ico and upload it to the root of the site.
13. Hide links (via AJAX) to pages that are blocked from indexing in the robots.txt file, so that static weight is distributed correctly within the site. The source code of the documents should contain no fragments like “a href="…"” linking to these pages. The script itself also needs to be blocked from indexing.
14. Move large fragments of JS and CSS into separate linked files of the appropriate type, and delete temporary comments. This speeds up downloading and parsing of the code for both spiders and browsers. “Large” means JS and CSS fragments of 8-10 lines or more, as well as comments longer than 3-5 lines.
15. Check the markup for unclosed paired tags. This is the minimum requirement for code validity (if a table row “tr” is opened, it must be closed with “/tr”, and so on).
16. Ensure that the site’s main pages display properly in all popular browsers. Pay particular attention to the following (in order of browser share): Google Chrome, Android Browser, Mobile Safari, Firefox, Yandex.Browser, Opera.
17. Configure 301 redirects from pages like “index.php”, “index.html”, “default.html” to the same address without the file name (to the root of the folder, say, from “/dir/index.php” to “/dir/”).
18. Configure a 301 redirect from URLs without a trailing slash (“/”) to URLs with a trailing slash (or vice versa, depending on the CMS and server settings).
19. Configure a redirect between http and https. If the site is available over both protocols and the content is duplicated, set up a 301 redirect from the non-primary version to the primary one (from “https” to “http”, or vice versa). Search engines now check the “https” version for indexing by default, which can lead to duplicate content on different hosts.
20. Block CMS login pages such as “/bitrix”, “/login”, “/admin”, “/administrator”, “/wp-admin” from indexing in the robots.txt file.
Disallow directives like the following can help:
Disallow: /bitrix
Disallow: /login
Disallow: /admin
Disallow: /administrator
Disallow: /wp-admin
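To confirm that rules like these block what they should, they can be tested with Python’s standard urllib.robotparser module. This is only a minimal sketch: the “User-agent: *” line, the example.com URLs and the helper names build_parser and is_blocked are our assumptions for illustration. Note that robotparser does plain prefix matching, which may differ from how a particular search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The Disallow lines from the list above. The "User-agent: *" group line
# is an assumption added so that the fragment forms a complete rule group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bitrix
Disallow: /login
Disallow: /admin
Disallow: /administrator
Disallow: /wp-admin
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_blocked(parser: RobotFileParser, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the parsed rules forbid user_agent from fetching url."""
    return not parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    # Hypothetical URLs used only to exercise the rules.
    for url in (
        "https://example.com/wp-admin/",
        "https://example.com/administrator/index.php",
        "https://example.com/blog/post/",
    ):
        print(url, "blocked" if is_blocked(parser, url) else "allowed")
```

Because matching is by prefix, a rule such as “Disallow: /login” also covers any other path that begins with “/login”, so it is worth testing a few legitimate URLs as well.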
21. Create a sitemap.xml file that lists all pages of the site and check it for validity. If the number of pages exceeds 50,000, you will need to create several sitemaps. We recommend submitting the sitemap directly in the Google and Yandex webmaster panels rather than specifying it in the robots.txt file.
22. We recommend opening all external links in a new tab using target="_blank". If you need to prevent the transfer of static weight through certain links, hide them from the crawler using AJAX.
23. Open the search engine’s cached copy of several key site pages and check that it is correct. Pay attention to the encoding, the date of the cached copy, and the completeness of the code.
24. Working folders such as “cgi-bin”, “wp-includes”, “cache”, “backup” need to be blocked from indexing in the robots.txt file.
25. Non-informative files (like *.swf) and empty *.doc and *.pdf files need to be blocked from indexing in the robots.txt file. If *.doc and *.pdf files are useful and carry valuable information, they should not be blocked from indexing.
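The redirect items in this list (mirrors, index files, trailing slashes, http/https) all reduce to one rule: every variant of an address should 301-redirect to a single primary URL. The sketch below computes that target so it can be compared with what the server actually returns. It rests on several assumptions of ours: the primary version uses https, “www.sitechecker.pro” and a trailing slash; the mirror list is illustrative; and the function name redirect_target is hypothetical. Your CMS may require the opposite choices.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumed configuration, for illustration only.
PRIMARY_SCHEME = "https"
PRIMARY_HOST = "www.sitechecker.pro"
MIRROR_HOSTS = {"sitechecker.pro", "stchck.pro", "www.stchck.pro"}
INDEX_FILES = ("index.php", "index.html", "default.html")


def redirect_target(url: str) -> Optional[str]:
    """Return the URL a 301 redirect should lead to, or None if url is already primary."""
    parts = urlsplit(url)

    # Mirrors collapse to the primary host (ports are ignored in this sketch).
    host = (parts.hostname or "").lower()
    if host in MIRROR_HOSTS:
        host = PRIMARY_HOST

    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last in INDEX_FILES:
        # "/dir/index.php" becomes "/dir/"
        path = head + "/"
    elif last and "." not in last:
        # Directory-like path without a trailing slash: "/blog" becomes "/blog/"
        path += "/"

    target = urlunsplit((PRIMARY_SCHEME, host, path, parts.query, parts.fragment))
    return None if target == url else target


if __name__ == "__main__":
    for url in (
        "http://stchck.pro/dir/index.php",
        "http://www.sitechecker.pro/blog?page=2",
        "https://www.sitechecker.pro/dir/",
    ):
        print(url, "->", redirect_target(url))
```

During a crawl, any URL for which redirect_target returns a value should answer with a single 301 pointing at exactly that value, and internal links should already use the target form.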
Additional requirements for online stores and sites with search and authorization
For projects that are technically more complex, e.g., projects with authorization or an internal search across product types, there are a number of additional requirements for correct indexing:
26. We recommend setting rel="canonical" to eliminate duplicate pages and to attribute all behavioral and link metrics to the right document. This recommendation is also justified for small, simple sites, but because it can be difficult to implement, it often remains merely a recommendation.
27. Sorting and filtering pages need to be blocked from indexing unless they have been optimized and given friendly URLs in order to attract traffic from low-frequency queries. The links to them need to be hidden using AJAX.
28. Authorization, password change, order processing pages, etc. need to be blocked from indexing in the robots.txt file: “?basket&step=”, “register=”, “change_password=”, “logout=”.
29. Internal search results pages (“search”) need to be blocked from indexing in the robots.txt file.
30. Print versions of pages, such as “_print”, “version=print” and similar, need to be blocked from indexing in the robots.txt file.
31. Action pages such as “?action=ADD2BASKET” and “?action=BUY” should be blocked from indexing in the robots.txt file.
32. Sections with duplicate content need to be blocked from indexing in the robots.txt file: “feed”, “rss”, “wp-feed”.
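After a crawl, the service-page patterns from this list can be used to flag URLs that should not be indexable. The following is a rough sketch rather than a robots.txt parser: it does case-insensitive substring and path-segment matching, and the function name should_be_blocked and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit

# Markers taken from the checklist items above, lowercased for matching.
QUERY_MARKERS = (
    "basket&step=",
    "register=",
    "change_password=",
    "logout=",
    "version=print",
    "action=add2basket",
    "action=buy",
)
PATH_MARKERS = ("_print",)
PATH_SEGMENTS = {"search", "feed", "rss", "wp-feed"}


def should_be_blocked(url: str) -> bool:
    """Return True if url looks like a service page from the checklist."""
    parts = urlsplit(url)
    query = parts.query.lower()
    path = parts.path.lower()

    if any(marker in query for marker in QUERY_MARKERS):
        return True
    if any(marker in path for marker in PATH_MARKERS):
        return True

    # Match whole path segments so that "/research/" is not mistaken for "/search/".
    segments = {segment for segment in path.split("/") if segment}
    return bool(segments & PATH_SEGMENTS)


if __name__ == "__main__":
    crawled = [
        "https://example.com/catalog/item/?action=ADD2BASKET&id=5",
        "https://example.com/blog/feed/",
        "https://example.com/catalog/shoes/",
    ]
    for url in crawled:
        print(url, "-> block" if should_be_blocked(url) else "-> keep")
```

Every URL that the function flags should then be checked against the live robots.txt file or the page’s robots meta tag.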
A. As an alternative to the robots.txt file, you can use the robots meta tag: name="robots" with content="noindex, follow". Search engines observe this instruction more reliably, but it requires a slightly more complex configuration.
B. A correctly configured rel="canonical" attribute on the link tag helps:
- Substantially simplify the site indexing configuration.
- Correctly attribute and consolidate behavioral and other metrics across duplicate pages, say, pages with UTM codes. This matters most during advertising campaigns.
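For the UTM case, the canonical address is simply the page URL with the tracking parameters removed. A minimal sketch follows, assuming that every parameter whose name starts with “utm_” is tracking-only and that all other parameters are meaningful; the function names canonical_href and canonical_link_tag are ours.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_href(url: str) -> str:
    """Return url without utm_* query parameters and without a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the link tag to place in the head of the page served at url."""
    href = html.escape(canonical_href(url), quote=True)
    return '<link rel="canonical" href="%s">' % href


if __name__ == "__main__":
    landing = "https://www.sitechecker.pro/blog/?utm_source=newsletter&utm_medium=email&page=2"
    print(canonical_link_tag(landing))
```

With this in place, visits that arrive through campaign links are attributed to the same canonical document as organic visits.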