<noindex> tag for Google

Posted 2021-04-08


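I want to tell Google not to index certain parts of a page. Yandex (a Russian search engine) has a very useful tag for this called <noindex>. How can the same be done for Google?

Answer

You can keep Google from seeing parts of a page by putting those parts in iframes that are blocked by robots.txt.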

robots.txt

User-agent: *
Disallow: /iframes/

index.html

This text is crawlable, but now you'll see 
text that search engines can't see:
<iframe src="/iframes/hidden.html" width="100%" height="300" scrolling="no"></iframe>

/iframes/hidden.html

Search engines cannot see this text.

Instead of using an iframe, you can load the contents of the hidden file with AJAX. Here is an example that does this with jQuery's AJAX support:

This text is crawlable, but now you'll see 
text that search engines can't see:
<div id="hidden"></div>
<script>
    $.get(
        "/iframes/hidden.html",
        function(data){ $('#hidden').html(data); }
    );
</script>
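For pages that do not load jQuery, the same idea could be sketched with the standard fetch API. This is only a minimal sketch and not part of the original answer: the helper name loadHiddenContent and its injectable fetchFn parameter are assumptions made for illustration, and the approach still depends on /iframes/ being disallowed in robots.txt.

```javascript
// Hypothetical helper (not from the original answer): loads an HTML fragment
// and injects it into `target`. `fetchFn` is injectable so the function can be
// exercised outside a browser; in a page you would wrap the global `fetch`.
function loadHiddenContent(target, url, fetchFn) {
  return fetchFn(url)
    .then(function (response) {
      // Treat HTTP errors as failures instead of injecting an error page.
      if (!response.ok) {
        throw new Error("Request failed with status " + response.status);
      }
      return response.text();
    })
    .then(function (html) {
      target.innerHTML = html;
      return html;
    });
}

// Usage in a browser, assuming an element with id="hidden" exists:
// loadHiddenContent(
//   document.getElementById("hidden"),
//   "/iframes/hidden.html",
//   function (u) { return fetch(u); }
// );
```

Passing the fetch function in as a parameter is a design choice for testability; calling fetch directly inside the helper would work just as well in a browser.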

Another answer

According to Wikipedia [1], some spiders honor the following markup:

<!--googleoff: all-->
This should not be indexed by Google. Though its main spider, Googlebot,
might ignore that hint.
<!--googleon: all-->

<div class="robots-nocontent">Yahoo bots won't index this.</div>

<noindex>Yandex bots ignore this text.</noindex>
<!--noindex-->They will ignore this, too.<!--/noindex-->

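Unfortunately, they can't seem to agree on a single standard, and as far as I know there is nothing that keeps all spiders off...

The googleoff: comment seems to support different options, but I'm not sure where a complete list can be found. At least there are:

all: the block is ignored completely
index: the content does not go into Google's index
anchor: anchor text of links is not associated with the target page
snippet: the text is not used to create snippets for search results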
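To make the effect of these comments concrete, the following minimal sketch approximates what an indexer that honors a googleoff/googleon pair would keep. It is an illustration only, not Google's implementation: the function name stripGoogleOff is hypothetical, and the regular expression assumes well-formed, non-nested comment pairs and a plain option name such as all or index.

```javascript
// Hypothetical helper: removes every region wrapped in
// <!--googleoff: TAG--> ... <!--googleon: TAG--> from an HTML string.
// Assumes the pairs are well formed and not nested, and that `tag`
// contains no regular-expression metacharacters.
function stripGoogleOff(html, tag) {
  var option = tag || "all";
  var pattern = new RegExp(
    "<!--\\s*googleoff:\\s*" + option + "\\s*-->" +
      "[\\s\\S]*?" +
      "<!--\\s*googleon:\\s*" + option + "\\s*-->",
    "g"
  );
  return html.replace(pattern, "");
}

var page =
  "<p>Indexed text.</p>" +
  "<!--googleoff: all--><p>Skipped text.</p><!--googleon: all-->" +
  "<p>More indexed text.</p>";

console.log(stripGoogleOff(page));
// prints: <p>Indexed text.</p><p>More indexed text.</p>
```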

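Note that (at least for Google) this only affects the search index, not page rank and the like. Also, as Stephen Ostermiller correctly pointed out in the comments below, googleon and googleoff unfortunately only work with the Google Search Appliance and have no effect on normal Googlebot.

There is also an article on the Yahoo part [2] (and one describing that Yandex also honors <noindex> [6]). On the googleoff: part, also see this answer, and the article from which I took most of the related information [3].

In addition, Google Webmaster Tools recommends using the rel=nofollow attribute [4] for specific links (for example ads, or links to pages that are not accessible or not useful to bots, such as login/sign-up pages). This means the HTML a rel attribute should be honored by Google's bots, although that is mostly related to page rank rather than to the search index itself. Unfortunately, there seems to be no rel=noindex [5][7]. I am also not sure whether this attribute could be used on other elements (e.g. <DIV REL="noindex">); but unless crawlers honor "noindex", that would be pointless anyway.

Further references: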

How to Noindex parts of a web page?
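Excluding crawler from sections of pages (the Spiderline crawler; as you can see, other crawlers may use other proprietary markup, see also the AddSearch crawler. I wish they would just make REL="noindex" a standard, usable on any HTML tag such as DIV / SPAN / P / A!)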
Preventing Google from indexing the contents of a div by reversing the string
Methods for preventing search engines from indexing irrelevant content on a page

[1] Wikipedia: Noindex
[2] Which Sections of Your Web Pages Might Search Engines Ignore?
[3] Tell Google to Not Index Certain Parts of Your Page
[4] Use rel="nofollow" for specific links
[5] Is it a good idea to use <a href="http://name.com" rel="noindex, nofollow">name</a>?
[6] Using HTML tags - Yandex.Help. Webmaster
[7] Existing REL values

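Another answer

No, Google does not support the <noindex> tag. Virtually no one does.

Another answer

Create a robots.txt file at the root level of the site and insert something like the following.

Block Google: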

User-agent: Googlebot
Disallow: /myDisallowedDir1/
Disallow: /myDisallowedPage.html
Disallow: /myDisallowedDir2/

Block all bots:

User-agent: *
Disallow: /myDisallowedDir1/
Disallow: /myDisallowedPage.html
Disallow: /myDisallowedDir2/
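
To sanity-check rules like these before deploying them, a simplified matcher can help. The sketch below is an illustration under stated assumptions rather than a full robots.txt implementation: it only understands User-agent and Disallow lines, uses plain prefix matching without wildcards or Allow, matches agent names exactly, and the function names parseRobots and isBlocked are hypothetical.

```javascript
// Parses robots.txt text into groups: [{ agents: [...], disallow: [...] }].
// Simplified: only User-agent and Disallow lines are recognized.
function parseRobots(text) {
  var groups = [];
  var current = null;
  var lastWasAgent = false;
  text.split(/\r?\n/).forEach(function (rawLine) {
    var line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    var sep = line.indexOf(":");
    if (sep === -1) return; // blank or malformed line
    var field = line.slice(0, sep).trim().toLowerCase();
    var value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group.
      if (!lastWasAgent) {
        current = { agents: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      if (field === "disallow" && current && value !== "") {
        current.disallow.push(value);
      }
      lastWasAgent = false;
    }
  });
  return groups;
}

// Returns true when `path` is disallowed for `agent`.
// A group naming the agent takes precedence over the "*" group.
function isBlocked(groups, agent, path) {
  var name = agent.toLowerCase();
  var specific = groups.filter(function (g) {
    return g.agents.indexOf(name) !== -1;
  });
  var chosen = specific.length > 0
    ? specific
    : groups.filter(function (g) { return g.agents.indexOf("*") !== -1; });
  return chosen.some(function (g) {
    return g.disallow.some(function (prefix) {
      return path.indexOf(prefix) === 0;
    });
  });
}

var robots = [
  "User-agent: Googlebot",
  "Disallow: /myDisallowedDir1/",
  "Disallow: /myDisallowedPage.html"
].join("\n");

var rules = parseRobots(robots);
console.log(isBlocked(rules, "Googlebot", "/myDisallowedDir1/page.html")); // true
console.log(isBlocked(rules, "Googlebot", "/index.html"));                 // false
console.log(isBlocked(rules, "Bingbot", "/myDisallowedDir1/page.html"));   // false
```

The last call returns false because this sample file has no "User-agent: *" group, so only Googlebot is restricted.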

A handy robots.txt generator:

http://www.mcanerin.com/EN/search-engine/robots-txt.asp
