This is a problem that has existed for years, comes up often, and has never had a standard solution: search engine crawlers (especially Baidu's) fetch JS, CSS, and JSON files, and blocking them in robots.txt doesn't stop the crawling.
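For context, the kind of robots.txt rule typically tried in this situation looks like the following (the paths are illustrative, not the site's actual rules). Baidu's documentation claims support for the `*` and `$` wildcards, yet in practice rules like these are often ignored for JSON fetches:

```text
User-agent: Baiduspider
Disallow: /*.json$
Disallow: /api/
```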
This raises several questions:
1. Why do crawlers fetch JS and CSS at all?
2. Can crawlers execute JS?
3. What effect does crawler fetching of JS have on SEO?
Here are my views on each:
First, CSS is crawled to judge the importance of page elements and to make sure snapshots render completely; JS is crawled to discover new links and to detect cheating.
Second, JS is executed, though it's not clear whether all of it is. The claim you often see online that "search engines simply ignore JS, iframes, and the like, and only grab plain text" doesn't hold up against reality. If search engines truly paid no attention to JS and iframes, the black-hat crowd would be overjoyed (if you don't see why, read the two earlier articles on black-hat techniques and you will!).
Third, I honestly don't know. In some cases it may eat into the crawl quota, but on the sites I've run where spiders fetched JS, traffic showed nothing unusual.
Speaking of which, the site I currently work on hit exactly this situation in the first half of the year: Baidu was fetching JSON like crazy, and every robots.txt block we tried was useless. Traffic hadn't dropped and nothing else looked abnormal, so by temperament I wouldn't have cared at all, but checking the JSON share of crawls genuinely made me clench up: close to 40%. Yes, you read that right, 40%. If Baidu fetched 1 million pages a day, 400,000 of them were JSON.
I then noticed that the total crawl volume in our logs didn't match the crawl frequency reported by Baidu Webmaster Tools. After several checks it turned out that: total crawls in the log = crawl frequency reported by Baidu's tool + total JSON crawls in the log. In other words, the JSON fetches aren't counted in the crawl-frequency figure Baidu reports; they're crawled as a free extra on top. Seen that way, there should be no SEO impact and no quota being eaten, but the crawl ratio was still painful to look at, so I decided to fix it anyway.
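The ratio check above can be sketched as a small log-parsing function. The log format assumed here is the common combined access-log format, and the field positions and user-agent marker are assumptions; adjust the parsing to your server's actual format:

```javascript
// Compute the share of a spider's requests that hit .json URLs,
// given raw access-log lines (combined log format assumed).
function jsonCrawlRatio(logLines, uaMarker = "Baiduspider") {
  let total = 0;
  let json = 0;
  for (const line of logLines) {
    if (!line.includes(uaMarker)) continue; // keep only this spider's requests
    total += 1;
    // Extract the path from the quoted request field: "GET /path HTTP/1.1"
    const m = line.match(/"[A-Z]+ (\S+) HTTP/);
    if (m && m[1].split("?")[0].endsWith(".json")) json += 1;
  }
  return total === 0 ? 0 : json / total;
}
```

Running this over a day's logs and comparing `total` against the figure in Baidu Webmaster Tools is how you'd spot the mismatch described above.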
Investigation showed that some pages contain a feature: when a page is requested, it first checks whether the visitor is logged in. If so, it returns other products the user has viewed before; if not, it returns default content. The returned content is serialized into a JSON file (yes, the one Baidu was fetching like crazy), which is passed to a front-end JS that parses it and renders the data in the page.
It loads asynchronously, so in business-logic terms, if that JS never executes, any visitor to the page effectively sees an unloaded page.
The JSON path is written in plain text inside the JS, so I can't tell whether Baidu simply extracts the path from the JS or actually executes it. Either way, whenever it crawls a page containing this feature, it also fetches the corresponding JSON file.
That left two candidate solutions: first, delete the JS behind this feature outright; second, detect search-engine visits and withhold the JS from them, so that the spider never sees it and therefore never fetches the JSON.
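The second option could be sketched as a server-side user-agent check, something like the following. The marker strings and script path are assumptions; note that UA strings can be faked, so production code usually verifies spiders via reverse DNS as well, and serving spiders different markup than users carries its own cloaking risk:

```javascript
// Common spider user-agent substrings (illustrative, not exhaustive).
const SPIDER_MARKERS = ["Baiduspider", "Googlebot", "bingbot"];

function isSpider(userAgent) {
  return SPIDER_MARKERS.some(marker => userAgent.includes(marker));
}

// Only real visitors get the <script> tag that triggers the JSON request;
// spiders get the page without it and so never discover the JSON path.
function pageScripts(userAgent) {
  return isSpider(userAgent) ? "" : '<script src="/recommend.js"></script>';
}
```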
In the end, since the feature had been live for months with poor numbers and a low click-through rate, we simply cut it. Looking at the logs the next day, JSON crawling had fallen to 0.5%.
WeChat official account: traffic traffickers
Knowledge Planet (perks to come, such as a piece of Python code that writes dirty jokes)