Nginx block robots.txt file

Posted 2023-03-25


Asked: 2013-02-14 20:02:10

Question:

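I'm running nginx 1.1.19 on Ubuntu Server 12.04, but I can't get Googlebot to see my robots.txt file. I followed the example in this post, without success. To test it, I went to Webmaster Tools and clicked "Health > Fetch as Googlebot", but all I get are "Not found", "Page unavailable" and "robots.txt file not accessible" messages.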

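I'd also like to confirm whether this configuration belongs in nginx.conf or in the "default" file in /etc/nginx/sites-enabled, since I've noticed this may differ in newer versions. Here is my basic setup: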

root /usr/share/nginx/www;
index index.php;

# Rewrite the URLs.
location / {
    try_files $uri $uri/ /index.php;
}
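A minimal sketch of an explicit rule for robots.txt, assuming the fragment above sits inside a `server` block and that the file exists at /usr/share/nginx/www/robots.txt. nginx checks exact-match (`=`) locations before prefix and regex locations and stops searching as soon as one matches, so no other rule in the config (for example a regex location that denies certain file types) can intercept the request or hand it to index.php. This only makes sure the file itself is served; it does not change any redirect behaviour.

```nginx
# Exact match: takes priority over "location /" and over any regex locations.
location = /robots.txt {
    # Serve the static file from the inherited root, or return 404 if missing.
    try_files $uri =404;
    # Make sure no access rule inherited from the server level blocks crawlers.
    allow all;
    # Keep the logs quiet for this frequently requested file.
    log_not_found off;
    access_log off;
}
```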


Answer 1:

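I managed to solve my problem by adding a rewrite rule in a server block, using the code below. After that I went back to Google Webmaster Tools, ran Fetch as Googlebot again, and it worked. I'm taking the opportunity to leave my code here; it redirects port 80 to 443 (HTTP to HTTPS) and non-www to www.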

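# Redirect HTTP to HTTPS and NON-WWW to WWW
server {
    listen 80;
    server_name domain.com.br;
    # The capture group makes $1 hold the requested path; with a bare "^"
    # there is no capture, $1 is empty and every request lands on the home page.
    rewrite ^(.*)$ https://www.domain.com.br$1 permanent;

    # Rewrite the URLs. (Not reached in practice: the server-level
    # rewrite above redirects every request first.)
    location / {
        try_files $uri $uri/ /index.php;
    }
}

server {
    # "ssl" enables TLS on this port; the certificate directives are
    # assumed to be in the part of the config elided below.
    listen 443 ssl;
    server_name www.domain.com.br;

    # Rewrite the URLs.
    location / {
        try_files $uri $uri/ /index.php;
    }

    root /usr/share/nginx/www;
    index index.php;

    # [...] the code continued here
}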
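For comparison, a sketch of the same redirects written with `return 301` instead of `rewrite`. `$request_uri` carries the original path and query string unchanged, and a second server block also sends https://domain.com.br to the www host, a case the configuration above does not redirect. The certificate paths are hypothetical placeholders, and the certificate used there must be valid for domain.com.br. The main `www.domain.com.br` HTTPS server stays as in the answer.

```nginx
# Plain HTTP, bare or www host: send everything to https://www.
server {
    listen 80;
    server_name domain.com.br www.domain.com.br;
    return 301 https://www.domain.com.br$request_uri;
}

# HTTPS on the bare domain: also send it to the www host.
server {
    listen 443 ssl;
    server_name domain.com.br;
    # Hypothetical certificate paths; adjust to your own files.
    ssl_certificate     /etc/nginx/ssl/domain.com.br.crt;
    ssl_certificate_key /etc/nginx/ssl/domain.com.br.key;
    return 301 https://www.domain.com.br$request_uri;
}
```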


Answer 2:

Take a look at my answer here.

As for whether to add it to your main nginx.conf file or to your /etc/nginx/sites-available file, that's up to you: it depends on whether you want it to be global or site-specific.
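To make the global-versus-site-specific choice concrete: a `location` block cannot sit directly in the `http` context of nginx.conf; it has to live inside a `server` block. On Ubuntu's packaged nginx, nginx.conf includes the files in /etc/nginx/sites-enabled (conventionally symlinks to files in /etc/nginx/sites-available) from inside its `http` block, so a site file ends up in the same configuration either way. Below is a sketch of one way to write a rule once and reuse it across sites, assuming a hypothetical snippet file /etc/nginx/snippets/robots.conf that each server block includes.

```nginx
# /etc/nginx/nginx.conf (excerpt): site files are pulled into the http block.
http {
    # ...other http-level directives...
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

# /etc/nginx/snippets/robots.conf (hypothetical path): the shared rule.
# Kept outside conf.d/ so it is never loaded directly at http level,
# where a location block would be rejected.
location = /robots.txt {
    try_files $uri =404;
}

# /etc/nginx/sites-available/default: a site that opts in to the shared rule.
server {
    listen 80;
    root /usr/share/nginx/www;
    index index.php;
    include /etc/nginx/snippets/robots.conf;
}
```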

