[SEO tutorial] # 1.3.2-Building the Foundation-Understanding Crawlers: Crawler Visibility

Reprinted from GoGo breaking into SEO

Tutorial address:

https://www.bilibili.com/video/av51705141 (www.bilibili.com)

Content introduction:

1) Sites with time-sensitive content are prone to pages that crawlers cannot see. For example:

E-commerce site: a merchant takes a product offline because it is no longer on sale.

Group-buying site: a merchant takes a deal offline because the discount has ended.

Recruitment site: a company takes a position offline because it is no longer hiring for it.

B2B site: a manufacturer takes a product offline because it is no longer available for wholesale.


When a user takes a product / position / activity offline, the corresponding front-end page generally ends up in one of three states:

A. After the product goes offline, the corresponding page immediately returns a 404 status.

If the page for the offline product happens to be in the search engine's crawl queue, the crawler will hit a dead link when it visits, so the page is invisible to it. SEO therefore needs to work with the engineering team to collect the links of offline products regularly (at least once a day) and submit them to the search engines as dead links promptly, to avoid the risk of a penalty.

B. After the product goes offline, the corresponding page 301-redirects to the home page, the parent page, or some other page.

C. After the product goes offline, the corresponding page still returns 200, and an "offline" notice is added to the page.

An offline product, such as an e-commerce item that is no longer on sale, has no value for users, yet the page still returns 200 and search engines keep spending resources crawling it. From the search engine's point of view this is not friendly.
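The daily dead-link collection described for state A could look something like the sketch below. It is only an illustration: `fetch_status` is a hypothetical callable supplied by the caller (for example a thin wrapper around an HTTP HEAD request), the set of "dead" status codes is an assumption, and so is the one-URL-per-line output. Check the target search engine's webmaster documentation for the format it actually accepts.

```python
from typing import Callable, Iterable, List

# Status codes treated as "dead" in this sketch (assumption: 404 and 410).
DEAD_STATUSES = {404, 410}


def collect_dead_links(urls: Iterable[str],
                       fetch_status: Callable[[str], int]) -> List[str]:
    """Return the URLs whose current HTTP status marks them as dead links."""
    dead = []
    for url in urls:
        if fetch_status(url) in DEAD_STATUSES:
            dead.append(url)
    return dead


def format_dead_link_file(dead_urls: Iterable[str]) -> str:
    """Render the dead links as plain text, one URL per line."""
    return "".join(url + "\n" for url in dead_urls)
```

A scheduled job could feed this function the URLs of products taken offline since the last run and upload the resulting text file.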

As for which approach to take, SEO should first understand how the product team handles offline items and then decide according to the actual situation. For example:

A. If the offline pages have a lot of historical traffic, returning 404 for all of them is clearly not good SEO practice. Consider whether pages that still receive traffic should stay at 200 while pages with no traffic return 404.

B. If the item was published by a user, it may be taken offline and later put back online. Is the URL after restoring the same as when it was first published? Decide how to handle the case where it is the same and the case where it differs.

In short, the decision has to balance user experience, search engine friendliness and SEO traffic according to the actual situation.
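One way to make that trade-off explicit is a small decision function. The sketch below is an assumption-heavy illustration, not a rule from the tutorial: the field names, the traffic threshold and the status chosen for each case are all hypothetical and would need tuning to the actual site.

```python
from dataclasses import dataclass


@dataclass
class OfflineItem:
    monthly_search_visits: int   # hypothetical metric: historical organic traffic
    can_be_restored: bool        # True if the user may put the item back online
    same_url_on_restore: bool    # True if a restored item reuses its old URL


def choose_status(item: OfflineItem, traffic_threshold: int = 100) -> int:
    """Pick an HTTP status for an offline item's page (illustrative policy only)."""
    # An item that may come back at the same URL keeps its page, marked as offline.
    if item.can_be_restored and item.same_url_on_restore:
        return 200
    # Pages that still attract search traffic stay reachable, with an offline notice.
    if item.monthly_search_visits >= traffic_threshold:
        return 200
    # Everything else is reported as gone so it can be submitted as a dead link.
    return 404
```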

2) Content that requires permission to view

If access restrictions apply to both users and search engines, for example the body text is only visible after logging in, SEO is bound to suffer badly, because a crawler cannot perform a login the way a person does.

The usual approaches are to make part of the content public and hide the rest, or to expose the whole content to crawlers while hiding it from users.
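The first approach, publishing part of the content and hiding the rest, can be sketched roughly as follows. The function name, the 200-character default and the wording of the notice are illustrative assumptions.

```python
def visible_body(body: str, logged_in: bool, preview_chars: int = 200) -> str:
    """Return the full text for logged-in users and a public preview otherwise."""
    if logged_in or len(body) <= preview_chars:
        return body
    # The preview is served to every visitor who is not logged in, crawlers
    # included, so part of the content stays crawlable.
    return body[:preview_chars].rstrip() + " ... (log in to read the rest)"
```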

3) Triggering an anti-crawler policy that returns an empty or fake page

The operations staff did not add the search engine to the whitelist, so the search engine was misjudged as a “bad guy”, triggered the anti-crawler policy, and was served a blank page.

Even worse is returning fake data, for example serving the crawler records picked at random from 100 prepared items. The crawler fetches tens of thousands of pages that in fact contain only those 100 items, which seriously harms SEO.

SEO therefore needs to stay in sync with operations at all times, keep up with the latest anti-crawler measures, and check whether they could accidentally hit search engines.
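One common way to keep genuine search engine crawlers out of the anti-crawler net is to verify them with a reverse DNS lookup followed by a forward lookup before any blocking rule applies. The sketch below assumes a short list of trusted hostname suffixes; that list is an assumption and should be taken from each search engine's own documentation. The resolver functions are passed in as parameters so the check can be exercised without network access.

```python
import socket
from typing import Callable, List, Sequence

# Assumed hostname suffixes; confirm them against each search engine's documentation.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".baidu.com", ".baidu.jp")


def reverse_lookup(ip: str) -> str:
    """Reverse DNS: IP address -> primary host name."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> List[str]:
    """Forward DNS: host name -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_crawler(ip: str,
                        reverse: Callable[[str], str] = reverse_lookup,
                        forward: Callable[[str], List[str]] = forward_lookup,
                        suffixes: Sequence[str] = TRUSTED_SUFFIXES) -> bool:
    """True if the IP reverse-resolves to a trusted host that resolves back to it."""
    try:
        host = reverse(ip).lower()
        if not host.endswith(tuple(suffixes)):
            return False
        # The forward lookup guards against spoofed reverse DNS records.
        return ip in forward(host)
    except OSError:
        # A failed DNS lookup means the visitor cannot be verified.
        return False
```

Requests that pass this check would skip the anti-crawler policy; everything else is handled as before.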

4) POST requests

Data returned through POST requests often appears behind multi-condition filter boxes, such as the KFC store locator: http://www.kfc.com.cn/kfccda/storelist/index.aspx

After the user performs an action (clicking a button, typing a string, and so on), JavaScript captures the action and sends a POST request to the web service, then displays the returned data in the currently open page rather than in a new tab, so the URL does not change.

Crawlers cannot simulate human behavior, so naturally they cannot see this data.


A. Usually a third-party browser kernel is used to traverse the pages that load their data through POST, render them as a browser would, and save the result as static pages. The web service then checks who the visitor is; if it is a search engine, the POST button (href="javascript:void(0);") is replaced with a link to the static page (href="{static page link}").

B. If there are too many POST pages, the engineering team is generally reluctant to take approach A, because each module would need two sets of code to maintain, which is troublesome. In that case a new set of pages is usually built to capture the keyword traffic that corresponds to this batch of POST pages.
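The link replacement in approach A might be sketched as below. Everything specific here is an assumption made for illustration: the user-agent substrings, the `/static/...` URL scheme and the function names. User-agent strings can also be forged, so a real deployment would want a stronger crawler check.

```python
from html import escape
from urllib.parse import quote

# Assumed user-agent substrings for the crawlers of interest.
CRAWLER_TOKENS = ("googlebot", "baiduspider", "bingbot")


def is_crawler(user_agent: str) -> bool:
    """Rough user-agent check for known search engine crawlers."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def static_page_path(filters: dict) -> str:
    """Map one filter combination to the path of its pre-rendered static page."""
    parts = [
        f"{quote(str(key), safe='')}-{quote(str(value), safe='')}"
        for key, value in sorted(filters.items())
    ]
    return "/static/" + "/".join(parts) + ".html"


def filter_button(label: str, filters: dict, user_agent: str) -> str:
    """Render the filter button: a static link for crawlers, a JS hook for users."""
    if is_crawler(user_agent):
        href = static_page_path(filters)
    else:
        href = "javascript:void(0);"
    return f'<a href="{escape(href)}">{escape(label)}</a>'
```

Sorting the filter keys keeps the static path stable regardless of the order in which the conditions were selected.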

5) The server responds slowly, resulting in incomplete content

When a crawler fetches a page, it waits only a limited time for the data to come back and then times out. If the site's access speed is not up to standard, the crawler often runs out of time before it has fetched the whole page. The telltale sign is a snapshot that contains only part of the page: the rest was not captured because the crawl timed out.
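The effect can be illustrated with a toy simulation. It does not model any real crawler: the page is assumed to arrive as a sequence of chunks, each with the number of seconds the server takes to send it, and the timeout value is arbitrary.

```python
from typing import Iterable, Tuple


def captured_before_timeout(chunks: Iterable[Tuple[float, str]],
                            timeout: float) -> str:
    """Concatenate the chunks that finish arriving within `timeout` seconds.

    Each chunk is (seconds the server takes to send it, text); times accumulate.
    """
    elapsed = 0.0
    captured = []
    for seconds, text in chunks:
        elapsed += seconds
        if elapsed > timeout:
            break  # the crawler gives up; the rest of the page is never seen
        captured.append(text)
    return "".join(captured)
```

With a slow final chunk, only the beginning of the page makes it into the snapshot, which matches the symptom described above.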

6) iframe / AJAX: not covered in detail here. For AJAX, see the POST section above; the principle is similar.

7) Content carried in images / Flash: the crawler cannot understand the information contained in images and videos.
