URL规范化器
Posted
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了URL规范化器相关的知识,希望对你有一定的参考价值。
# A spider middleware to canonicalize the urls of all requests generated from a spider. from scrapy.http import Request from scrapy.utils.url import canonicalize_url class UrlCanonicalizerMiddleware(object): def process_spider_output(self, response, result, spider): for r in result: if isinstance(r, Request): curl = canonicalize_url(r.url) if curl != r.url: r = r.replace(url=curl) yield r # Snippet imported from snippets.scrapy.org (which no longer works) # author: pablo # date : Sep 07, 2010
以上是关于URL规范化器的主要内容,如果未能解决你的问题,请参考以下文章