An analysis of how search engines judge the similarity of website content

High duplication is bad: the site will get banned ("K'd"), lose ranking weight, or simply not be indexed. These warnings linger in the mind of every SEO beginner. But where does a new site get its content? Can you really write dozens of pieces every day? Clearly most people cannot, and that is how pseudo-original content came about. The common pseudo-original techniques are reordering an article's content, swapping in synonyms, and adding or removing passages. After a while, though, many people find that these pages are still not indexed. Why is that? Today I will analyze this in detail, and I hope this article answers your questions.



Look carefully at the results page and you will find that some of the titles are identical while most of the descriptions differ. So subtle changes to the title and description do nothing for pseudo-original content; Baidu can still identify it.



This is the source code of two similar pages found by searching Baidu News for "cnzz". The similarity dropped sharply, to 45.332%. Judged by source code alone, the two pages clearly could not be called similar, yet Baidu can still determine that the two articles are similar.
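
The gap between source-code similarity and content similarity can be reproduced with a small sketch. The tool used in this article and Baidu's actual algorithm are not known, so this example assumes Python's standard `difflib.SequenceMatcher` as the similarity measure; the two sample pages, `visible_text`, and `similarity_percent` are hypothetical and exist only for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html):
    """Return the page text with markup, scripts, and styles removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def similarity_percent(a, b):
    """Character-level similarity of two strings, as a percentage."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio() * 100


# Hypothetical sample data: the same article wrapped in two different templates.
ARTICLE = ("Search engines compare the article text itself, "
           "not the template around it. " * 3).strip()

PAGE_A = (
    "<html><head><title>Site A</title><style>.nav{color:red}</style></head>"
    "<body><div id='menu'>Home | News | About</div>"
    "<div class='post'><h1>Similarity</h1><p>" + ARTICLE + "</p></div></body></html>"
)
PAGE_B = (
    "<!DOCTYPE html><html lang='en'><head><meta charset='utf-8'>"
    "<script>var tracker = {id: 42, enabled: true};</script></head>"
    "<body><table width='100%'><tr><td class='sidebar'>Portal &gt; Tech</td>"
    "<td class='main'><h2>Similarity</h2><span>" + ARTICLE + "</span></td></tr></table>"
    "<div class='footer'>Copyright</div></body></html>"
)

raw_sim = similarity_percent(PAGE_A, PAGE_B)
text_sim = similarity_percent(visible_text(PAGE_A), visible_text(PAGE_B))

print(f"source code similarity: {raw_sim:.3f}%")
print(f"text similarity:        {text_sim:.3f}%")
```

With these sample pages the text similarity comes out higher than the source-code similarity, because the different templates dilute the raw comparison while the article itself is unchanged.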

Summary: the observations above show that search engines have become better at judging similarity. They are no longer confined to a site's source code; they can pick out the Chinese text of an article directly and compare it with other sites. So even if your site is different and your page layout is different, as long as the content was scraped, the search engine can still judge how similar the article is. That does not mean, however, that similar content is automatically left out of Baidu's index.
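
As a rough illustration of comparing only the Chinese text, here is a minimal sketch. It assumes a regular expression over the CJK Unified Ideographs range to isolate the Chinese characters and a Jaccard score over two-character shingles as the comparison. Both are stand-ins chosen for this example, not the search engine's real method, and `chinese_text`, `shingles`, `jaccard`, and the sample pages are hypothetical.

```python
import re

# Runs of CJK Unified Ideographs; tags, Latin text, and punctuation are dropped.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def chinese_text(html):
    """Keep only the Chinese characters found anywhere in the page source."""
    return "".join(CJK_RUN.findall(html))


def shingles(text, k=2):
    """Return the set of overlapping k-character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical pages: same two sentences, different layout, order reversed.
PAGE_A = "<div class='post'><p>搜索引擎会提取正文内容。</p><p>页面布局不同并不影响判断。</p></div>"
PAGE_B = "<table><tr><td>页面布局不同并不影响判断。</td><td>搜索引擎会提取正文内容。</td></tr></table>"

score = jaccard(shingles(chinese_text(PAGE_A)), shingles(chinese_text(PAGE_B)))
print(f"Chinese-text similarity: {score:.3f}")
```

In this sketch, changing the layout and reversing the sentence order barely lowers the score, since almost every two-character shingle still appears in both pages. This is consistent with simple reordering not being enough to disguise scraped content.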

We all know that the content of

Next, let's look at the main body. I found a tool that can measure the similarity of two articles; here is the similarity of the body text:

These are similar to

Baidu does not refuse to index sites that carry the same content. Anyone who writes promotional articles (soft articles) regularly knows that they are written to be reposted, in order to gain backlinks and related domains, and the reposts clearly can be indexed. An obvious example: search Baidu News for "video".

The figure is marked in red above. The compared content runs from the title to the end of the article, and the similarity is 96.973%. That is very high; the article was plainly scraped. But think about it carefully: search engine spiders access a page through its markup, so does the similarity judgment depend on the source code? To find out, I copied the source code of the two pages into the similarity checker. See below:


As you can see, these are the same news item, which shows that identical content can still be indexed. If you look carefully, you can click on the area circled in red in

