Understanding the basics of how Google's search engine works is essential for Google SEO, and here we focus on Google's crawling, indexing, and ranking. A search engine is an answer machine: it exists to discover, understand, and organize the Internet's content in order to provide the most relevant results for the questions searchers pose.
To show up in search results, our content first has to be visible to the search engine. This is arguably the most important piece of the SEO puzzle: if our site can't be found, it will never appear in the SERP (search engine results page).
Google's search engine has three primary functions:
Crawl: scour the Internet for content, looking over the code and content of each URL found.
Index: store and organize the content found during crawling. Once a page is in the index, it can be displayed as a result for relevant queries.
Rank: provide the pieces of content that best answer a searcher's query, ordered from most to least helpful for that particular query.
Crawling is the process by which a search engine sends out a team of robots (known as crawlers or spiders) to find new and updated content. The content can vary: it might be a web page, an image, a video, a PDF, and so on. But whatever the format, content is discovered through links.
The robots start by fetching a few web pages and then follow the links on those pages to find new URLs. By hopping along these links, crawlers find new content and add it to the index so that it can later be surfaced in search results.
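As a rough illustration of that link-following step, here is a minimal Python sketch, using only the standard library, that extracts the links a crawler would queue up from one fetched page (the sample HTML and URLs are made up):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page's URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links the way a crawler would.
                    self.links.append(urljoin(self.base_url, value))

html = '<a href="/about">About</a> <a href="https://other.example/page">Out</a>'
parser = LinkExtractor("https://example.com/")
parser.feed(html)
print(parser.links)
```

A real crawler would fetch each discovered URL in turn and repeat the process, which is exactly why unlinked pages are so hard for search engines to find.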
Search engines process and store the information they find during crawling in an index, a huge database of the content they have discovered.
When people search on Google, the search engine scours its index for highly relevant content and then orders that content in an attempt to solve the searcher's query. This ordering of search results by relevance is known as ranking. In general, we can assume that the higher a site ranks, the more relevant the search engine believes it is to the query.
Note: not all search engines matter equally for SEO
Many beginners wonder about the relative importance of particular search engines. Most people know Google has the largest market share, but how important is it to optimize for Bing or other engines? The truth is that although more than 30 search engines exist, SEO work usually targets only Google, because Google has by far the largest market share and user base. Counting Google Images, Google Maps, and YouTube, more than 90 percent of online searches happen on Google properties, nearly 20 times as many as on Bing and the other engines combined.
As we have just learned, making sure the site gets crawled and indexed is a prerequisite for showing up in the SERP (search results). We can use "site:yourdomain.com" (an advanced search operator) to see which pages of our site Google has included.
Enter "site:yourdomain.com" in the Google search bar to see the pages of our own website that are in the index.
The number of results Google displays isn't always exact, but it gives us a solid idea of which pages are indexed on the site and how they currently appear in search results.
To get more accurate results, we can check the index status in Google Search Console. If you don't have one yet, you can sign up for a free Google Search Console account. With this tool we can submit sitemaps for the site and monitor how its pages are indexed and ranking.
If the site isn't showing up anywhere in search results, there are several possible reasons:
1. The site is brand new and has not been crawled yet.
2. The site is not linked to from any external websites.
3. The site's architecture makes it hard for robots to crawl it effectively.
4. The site contains a robots.txt file (or similar directives) that blocks search engines from crawling it.
5. The site has been penalized by Google.
Search engines may be able to crawl some pages of a site while other pages remain out of reach for one reason or another. It is important to make sure search engines can discover everything we want indexed, not just the home page. If there is a crawling problem, check the following:
1. Is the site's content hidden behind login forms?
If we require users to log in, fill out a form, or answer a survey before accessing certain content, search engines will never see those protected pages.
2. Does the site rely on search forms?
Robots cannot use search forms. Some people believe that putting a search box on their website will let search engines find everything the site contains; this is wrong.
3. Is text hidden inside non-text content?
Important text should not be placed inside images or videos on the site. Although search engines are getting better at recognizing images, there is still no guarantee they can read and understand them. It is best to keep text within the HTML markup of the web page.
Just as crawlers need links to discover a site, they need links within our site to move between its pages. If there is a page you want search engines to find, it should be linked to from other pages. Many sites make the critical mistake of building their navigation in ways search engines cannot access, hindering their ability to rank in search results.
1. Mobile navigation that shows different results than the desktop navigation.
2. Navigation shown only to specific types of visitors, which may be hidden from search engine crawlers.
3. Forgetting to link to a primary page of the site through the navigation. Remember, links are the paths crawlers follow to new pages!
This is why the site must have clear navigation and helpful URL folder structures.
A good information architecture improves the efficiency with which users access the site and presents content more intuitively. The best information architecture is intuitive, meaning users shouldn't have to struggle to move through the site or to find what they need.
The site should also have a useful 404 (page not found) page for when a visitor clicks a dead link or mistypes a URL. The best 404 pages guide users back into the site so they don't leave just because they tried to reach a link that doesn't exist.
Beyond making sure crawlers can reach the most important pages, note that there will be pages on the site you don't want them to find. These might include old URLs, duplicate URLs with thin content (such as the sort and filter parameter variants common in e-commerce), special promo-code pages, login or test pages, and so on.
Blocking such pages from crawlers also helps them prioritize the most important pages and makes the most of the crawl budget (the average number of pages a search engine bot crawls on the site).
Through crawler directives, we can use robots.txt files, meta tags, sitemap.xml files, and Google Search Console to control what Googlebot crawls and indexes.
The robots.txt file lives in the root directory of a website (e.g., yourdomain.com/robots.txt) and tells search engines which parts of the site should and should not be crawled (not all search engines honor robots.txt).
1. If Googlebot can't find a robots.txt file for a site (a 40X HTTP status code), it proceeds to crawl the site.
2. If Googlebot finds a robots.txt file for the site (a 20X HTTP status code), it usually abides by its suggestions and proceeds to crawl the site.
3. If Googlebot gets neither a 20X nor a 40X status code (for example, a 501 server error), it can't determine whether the site has a robots.txt file, and it won't crawl the site.
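Python's standard library includes a robots.txt parser, so we can sanity-check how a well-behaved crawler is expected to read these rules. A small sketch (the rules and URLs are made up; a real crawler fetches the file from yourdomain.com/robots.txt):

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt body, parsed directly instead of fetched over the network.
rules = """
User-agent: *
Disallow: /login/
Disallow: /test/
Allow: /
""".strip().splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Public pages are crawlable; the blocked directories are not.
print(rp.can_fetch("Googlebot", "https://example.com/products/hats"))  # True
print(rp.can_fetch("Googlebot", "https://example.com/login/"))         # False
```

This mirrors the decision a compliant crawler makes before requesting each URL on the site.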
Meta directives are instructions we use frequently. They give crawlers detailed instructions on how to crawl and index a URL's content.
If you want to block search engines at scale, the X-Robots-Tag HTTP header offers more flexibility than the meta robots tag, because it lets us use regular expressions, block non-HTML files, and apply site-wide noindex tags.
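To check what directives a crawler would see on a given page, we can scan its HTML for a robots meta tag. A minimal sketch with Python's standard-library HTML parser (the sample page is made up):

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots" content="..."> tag."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if d.get("name", "").lower() == "robots" and d.get("content"):
                # Directives are comma-separated, e.g. "noindex, nofollow".
                self.directives.extend(
                    part.strip().lower() for part in d["content"].split(","))

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
parser = RobotsMetaParser()
parser.feed(page)
print("noindex" in parser.directives)  # True
```

A quick scan like this is useful for catching pages accidentally shipped with a noindex tag.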
* For very sensitive URLs, the best practice is to remove them from the site or to require a secure login to view the pages.
WordPress site tip: in Dashboard > Settings > Reading, make sure the "Search Engine Visibility" box is not checked. Checking it blocks search engines from the site!
A sitemap is a list of the URLs on a site that crawlers can use to discover and index content. We can create a sitemap file and submit it through Google Search Console. Submitting a sitemap doesn't replace good site navigation, but it can certainly help crawlers follow a path to all of the important pages.
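A sitemap is just an XML list of URLs, so generating one is straightforward. A minimal sketch using Python's standard library (the URLs are made up; a real sitemap can also carry optional fields such as lastmod):

```python
import xml.etree.ElementTree as ET

# The namespace required by the sitemaps.org protocol.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Build a minimal sitemap.xml document from a list of page URLs."""
    urlset = ET.Element("urlset", xmlns=NS)
    for page in urls:
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        loc.text = page
    return ET.tostring(urlset, encoding="unicode")

xml = build_sitemap(["https://example.com/", "https://example.com/about"])
print(xml)
```

The resulting string would be saved as sitemap.xml at the site root and submitted through Search Console.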
Some sites (e-commerce sites being the most common) make the same content available at multiple different URLs by appending certain parameters to the URL. If you have ever shopped online, you have probably narrowed a search with filters. For example, you might search Amazon for "shoes" and then refine the search by size, color, and style; each refinement changes the URL slightly. How does Google know which version of the URL to serve to searchers? We can use the URL Parameters feature in Google Search Console to tell Google how we want it to treat these pages.
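A common complement to that Search Console setting is canonicalizing such URLs ourselves. Here is a small Python sketch (the parameter names are hypothetical) that strips filter and sort parameters so duplicate variants collapse to a single URL:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that only filter/sort and don't change the content.
IGNORED_PARAMS = {"size", "color", "sort", "sessionid"}

def canonicalize(url):
    """Drop filter/sort parameters so duplicate URL variants collapse into one."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(canonicalize("https://shop.example/shoes?color=red&size=9&page=2"))
# https://shop.example/shoes?page=2
```

The same idea underlies the rel="canonical" tag: many parameterized variants, one preferred URL.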
Once you've made sure the site can be crawled, the next order of business is to make sure it gets indexed. Just because a site can be discovered and crawled by a search engine doesn't mean it will be indexed. After the crawler finds a page, the search engine renders it much as a browser would, analyzing the page's contents as it goes; all of that information is then stored in its index.
Can I see how the Googlebot crawler sees my pages?
Yes. The cached version of a page reflects a snapshot of the last time Googlebot crawled it.
Google crawls and caches web pages at different frequencies; generally, well-established, reputable sites are crawled more often than lesser-known ones.
We can view the cached version of a page by clicking the drop-down arrow next to the URL in the SERP and selecting "Cached":
Can pages be removed from the index?
Yes, pages can be removed from the index! The main reasons include:
1. The URL returns a "not found" error (4XX) or a server error (5XX). This could be accidental (the page was moved and no redirect was set up) or intentional (the page was deleted).
2. The URL has a noindex meta tag added. Site owners add this tag to instruct the search engine to omit the page from its index.
3. The URL has been manually penalized for violating the search engine's webmaster guidelines and has been removed from the index as a result.
4. The URL has been put behind a password, so spiders are blocked from crawling it.
If a page has not been crawled and indexed, you can manually submit its URL to Google via the Submit URL tool in Search Console.
How does the Google search engine rank websites?
To determine relevance, search engines use algorithms: formulas by which stored information is retrieved and ordered along many dimensions. These algorithms have undergone many changes over the years to improve the quality of search results. Google, for example, adjusts its algorithms every day; some changes are minor quality tweaks, while others are core/broad updates deployed to tackle a specific problem, such as the Google Penguin algorithm, which targets link spam.
Why do algorithms change so often? Although Google doesn't disclose the details of its algorithms, its ultimate goal in adjusting them is to improve overall search quality. So if your site is affected after an algorithm update, compare it against Google's webmaster quality guidelines and search quality rater guidelines, and improve the site accordingly.
What kind of websites do search engines want?
In search engines' eyes there has only ever been one kind of best site: one that delivers useful answers to searchers' questions in the most helpful way. If that's true, why does SEO seem different now from what it was a few years ago?
Think about it from the point of view of someone learning a new language.
At first their understanding of the language is very rudimentary. Over time it deepens, and they start to learn semantics: the meaning behind the language and the relationships between words and phrases. Eventually, with enough practice, they know the language well enough to understand even nuance, and can provide answers to vague or incomplete questions.
It's the same with search engines. When they were just beginning to learn our language, it was easy to fool them with cheap tricks. Take keyword stuffing as an example: if you wanted to rank for a particular keyword, say "wholesale hats," you could add the phrase "wholesale hats" to the page many times and bold it, and that alone would usually earn a good ranking. This tactic created terrible user experiences; it may have worked in the past, but search engines can now recognize it as cheating.
The role of links in SEO
Links on the web generally come in two types: internal links and backlinks. Backlinks, or "inbound links," are links from other websites that point to our site, while internal links are links on our own site that point to our other pages (on the same domain).
Links have historically played a major role in SEO. Early on, search engines relied mainly on backlinks to determine a site's ranking; today the ranking factors have become far more diverse.
Backlinks work much like real-life word-of-mouth referrals. Take a bakery as an example:
1. Referrals from others = a good sign of authority
Example: many different people have told you that Jenny's Bakery is the best in town.
2. Referrals from yourself = biased, so not a good sign of authority
Example: Jenny claims that Jenny's Bakery is the best in town.
3. Referrals from irrelevant or low-quality sources = not a good sign of authority, and they could even get you flagged for spammy links
Example: people who have never set foot in the bakery insist that its bread is good.
4. No referrals = unclear authority
Example: the bakery may be good, but you can't find anyone with an opinion about it, so you can't be sure.
This is why PageRank was created. PageRank, part of Google's core algorithm, is a link analysis algorithm named after Larry Page, one of Google's founders. PageRank estimates the importance of a web page by measuring the quality and quantity of links pointing to it. The assumption is that the more relevant, important, and trustworthy a web page is, the more links it will have earned. The more natural backlinks you have from high-authority (trusted) sites, the better your odds of ranking higher in search results.
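The core idea behind PageRank fits in a few lines of code. Here is a toy Python implementation on a made-up link graph (real PageRank includes many refinements this sketch ignores, such as handling pages with no outbound links):

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute PageRank scores for a small link graph.

    `links` maps each page to the list of pages it links to.
    """
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score...
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        # ...and passes the rest of its score along its outbound links.
        for page, outgoing in links.items():
            if not outgoing:
                continue
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank

# A toy web: several pages link to "home", so it earns the highest score.
graph = {
    "home": ["about"],
    "about": ["home"],
    "blog": ["home"],
    "shop": ["home"],
}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # home
```

The page with the most inbound links ends up with the highest score, which is exactly the "links as votes" intuition described above.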
Website content matters, too.
Content is more than just text; it's anything meant to be consumed by searchers: video, images, and, of course, text. If search engines are answer machines, content is the means by which the engines deliver those answers.
Every time someone searches, there are thousands of possible results, so how do search engines decide which pages a searcher will find valuable? A big part of it is how well the content on the page matches the query's intent. In other words, does the page match the words that were searched, and does it help the searcher complete the task they were trying to accomplish?
There is no strict benchmark for how long content should be, how many keywords it should include, or what should go in the title tag. All of these can matter, but the focus should be on the users who will actually read the content.
Google now has hundreds, perhaps thousands, of ranking factors, but three have stayed consistent: links to the site (serving as third-party credibility signals), on-page content (high-quality content that satisfies searcher intent), and RankBrain.
What is RankBrain?
RankBrain is the machine-learning component of Google's core algorithm. Machine learning is a computer program that continually improves its predictions through new observations and training data. In other words, it is always learning, and because it is always learning, search results keep improving.
For example, if RankBrain notices that a lower-ranking URL provides users with a better result than a higher-ranking one, it will adjust the results accordingly, moving the more relevant result higher and demoting the less relevant pages.
Because Google will keep using RankBrain to promote the most relevant and useful content, we need to focus on satisfying searcher intent: providing the best possible information and experience for the searchers who might land on the page.
What Google employees have said
Although Google has never used the term "direct ranking signal," it has made clear that it absolutely uses click data to modify the SERP for particular queries.
According to Udi Manber, Google's former head of search quality:
"The ranking itself is affected by the click data. If we discover that, for a particular query, 80% of people click on #2 and only 10% click on #1, after a while we figure out that #2 is probably the one people want, so we'll switch it."
Another comment, from former Google engineer Edmond Lau, confirms this:
"It's pretty clear that any reasonable search engine would use click data on its own results to feed back into ranking and improve the quality of search results. The actual mechanics of how click data is used are often proprietary, but Google makes it clear that it uses click data with its patents on systems such as rank-adjusted content items."
Because Google needs to maintain and improve search quality, it stands to reason that click-through metrics play a part.
Various tests have confirmed that Google adjusts a page's ranking according to searcher engagement (click-through rate and similar signals):
1. Rand Fishkin's 2014 test showed that the #7 result moved up to #1 after roughly 200 people clicked its URL from the SERP. Interestingly, the ranking improvement appeared to be tied to the location of the people visiting the link: rankings spiked on Google US, where the participants were located, but stayed lower on Google Canada, Google Australia, and other country versions.
2. Larry Kim's comparison of top pages and their average dwell time before and after RankBrain suggested that the machine-learning component of Google's algorithm demotes pages that people don't spend much time on.
3. Darren Shaw's testing has likewise shown user behavior's impact on local search and map pack results.
Since user engagement metrics are clearly used to adjust SERP quality, SEOs should pay even more attention to page quality and to the users themselves.