scrapy_redis notes

Posted my8100


0. References

https://redis.io/topics/data-types-intro  An introduction to Redis data types and abstractions

http://redisdoc.com/  Redis command reference

1. scrapy_redis

 

2. Inspecting data with redis-cli

2.1 Match all keys in the database

redis-cli

127.0.0.1:6379> KEYS *
1) "mycrawler_redis:dupefilter"
2) "mycrawler_redis:requests"
6) "mycrawler_redis:items"
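The three key names above follow a `<spider name>:<role>` pattern. As a minimal sketch, assuming the key templates below match the scrapy_redis defaults (they can be overridden in settings.py, so treat them as an assumption), the names for a spider can be derived like this. `KEY_TEMPLATES` and `keys_for` are hypothetical helpers for illustration only.

```python
# Key templates assumed to mirror the scrapy_redis defaults;
# check your own settings.py, because they can be overridden.
KEY_TEMPLATES = {
    "requests": "%(spider)s:requests",      # scheduler queue (a sorted set in this crawl)
    "dupefilter": "%(spider)s:dupefilter",  # fingerprints of requests already seen (set)
    "items": "%(spider)s:items",            # serialized scraped items (list)
}


def keys_for(spider_name):
    """Return the Redis key names a spider would use under these templates."""
    return {role: tpl % {"spider": spider_name} for role, tpl in KEY_TEMPLATES.items()}


spider_keys = keys_for("mycrawler_redis")
for role, key in spider_keys.items():
    print(role, "->", key)
```

On a large or shared database, `KEYS *` walks the whole keyspace in one blocking call, so `SCAN` is usually the safer way to list keys.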

2.2 List

127.0.0.1:6379> type mycrawler_redis:items
list
127.0.0.1:6379> llen mycrawler_redis:items
(integer) 701
127.0.0.1:6379> LRANGE mycrawler_redis:items 0 1
1) "{\"text\": \"\\u201cA woman is like a tea bag; you never know how strong it is until it's in hot water.\\u201d\", \"crawled\": \"2018-02-21 03:38:17\", \"spider\": \"mycrawler_redis\", \"author\": \"Eleanor Roosevelt\"}"
2) "{\"text\": \"\\u201cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\\u201d\", \"crawled\": \"2018-02-21 03:38:17\", \"spider\": \"mycrawler_redis\", \"author\": \"Albert Einstein\"}"
127.0.0.1:6379> LRANGE mycrawler_redis:items -2 -1
1) "{\"text\": \"\\u201cThe opposite of love is not hate, it's indifference. The opposite of art is not ugliness, it's indifference. The opposite of faith is not heresy, it's indifference. And the opposite of life is not death, it's indifference.\\u201d\", \"crawled\": \"2018-02-21 03:43:34\", \"spider\": \"mycrawler_redis\", \"author\": \"Elie Wiesel\"}"
2) "{\"text\": \"\\u201cIt is not a lack of love, but a lack of friendship that makes unhappy marriages.\\u201d\", \"crawled\": \"2018-02-21 03:43:34\", \"spider\": \"mycrawler_redis\", \"author\": \"Friedrich Nietzsche\"}"
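Each list element is a JSON string. redis-cli shows it with escaped quotes and backslashes, but the stored value is plain JSON. A minimal sketch of decoding one element, using a copy of the first item above as a literal (`raw_item` is a stand-in for what a client would get back from `LRANGE`; no Redis connection is made here):

```python
import json

# One list element as a client would receive it: plain JSON text.
# This is a copy of the first item shown in the LRANGE output above.
raw_item = (
    '{"text": "\\u201cA woman is like a tea bag; you never know how strong '
    'it is until it\'s in hot water.\\u201d", '
    '"crawled": "2018-02-21 03:38:17", '
    '"spider": "mycrawler_redis", '
    '"author": "Eleanor Roosevelt"}'
)

item = json.loads(raw_item)  # \u201c and \u201d become real curly quotes
print(item["author"], "|", item["crawled"])
print(item["text"])
```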

 

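2.3 Set

PS: "size" suggests capacity, whereas "cardinality" is the set-theory term for the number of elements in a set, which is why the command is named SCARD.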

127.0.0.1:6379> type mycrawler_redis:dupefilter
set
127.0.0.1:6379> SCARD mycrawler_redis:dupefilter
(integer) 18603
127.0.0.1:6379> SRANDMEMBER mycrawler_redis:dupefilter
"5faa874e145528c84d636d5a95959583301e18f2"
127.0.0.1:6379> SRANDMEMBER mycrawler_redis:dupefilter
"68f9f6842efcd0392236b953ba6cf5c4616d4c91"
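The members of the dupefilter set are 40-character hex strings, which is the length of a SHA-1 digest. The sketch below shows the general idea of fingerprint-based deduplication under that assumption. It is a simplification and not the library's actual fingerprint function (which, among other things, normalizes the URL first); `fingerprint`, `is_duplicate` and the in-memory `seen` set are hypothetical stand-ins, with `seen` playing the role of the Redis set.

```python
import hashlib


def fingerprint(method, url, body=b""):
    """Simplified request fingerprint: SHA-1 over method, URL and body."""
    h = hashlib.sha1()
    h.update(method.encode("utf-8"))
    h.update(url.encode("utf-8"))
    h.update(body)
    return h.hexdigest()  # 40 hex characters


seen = set()  # stands in for the Redis set "mycrawler_redis:dupefilter"


def is_duplicate(method, url, body=b""):
    """Return True if this request was seen before, else record it."""
    fp = fingerprint(method, url, body)
    if fp in seen:
        return True
    seen.add(fp)
    return False


first = is_duplicate("GET", "https://www.goodreads.com/quotes")
second = is_duplicate("GET", "https://www.goodreads.com/quotes")
print(first, second, len(seen))
```

With a real Redis set the check and the insert can be a single `SADD`, since it reports how many members were newly added.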

 

2.4 Sorted Set

127.0.0.1:6379> type mycrawler_redis:requests
zset
127.0.0.1:6379> ZLEXCOUNT mycrawler_redis:requests - +
(integer) 18199
127.0.0.1:6379> ZRANGE mycrawler_redis:requests 0 1 WITHSCORES
1) "\x80\x02}q\x01(U\x04bodyq\x02U\x00U\t_encodingq\x03U\x05utf-8q\x04U\acookiesq\x05}q\x06U\x04metaq\a}q\b(U\x05depthq\tK\x02U\tlink_textq\nclxml.etree\n_ElementStringResult\nq\x0bU\x0cspiritualityq\x0c\x85\x81q\r}q\x0e(U\a_parentq\x0fNU\x0cis_attributeq\x10\x89U\battrnameq\x11NU\ais_textq\x12\x89U\ais_tailq\x13\x89ubU\x04ruleq\x14K\x00uU\aheadersq\x15}q\x16U\aRefererq\x17]q\x18U https://www.goodreads.com/quotesq\x19asU\x03urlq\x1aX1\x00\x00\x00https://www.goodreads.com/quotes/tag/spiritualityU\x0bdont_filterq\x1b\x89U\bpriorityq\x1cK\x00U\bcallbackq\x1dU\x14_response_downloadedq\x1eU\x05flagsq\x1f]q U\x06methodq!U\x03GETq\"U\aerrbackq#Nu."
2) "0"
3) "\x80\x02}q\x01(U\x04bodyq\x02U\x00U\t_encodingq\x03U\x05utf-8q\x04U\acookiesq\x05}q\x06U\x04metaq\a}q\b(U\x05depthq\tK\x02U\tlink_textq\nclxml.etree\n_ElementStringResult\nq\x0bU\rChoice Awardsq\x0c\x85\x81q\r}q\x0e(U\a_parentq\x0fNU\x0cis_attributeq\x10\x89U\battrnameq\x11NU\ais_textq\x12\x89U\ais_tailq\x13\x89ubU\x04ruleq\x14K\x00uU\aheadersq\x15}q\x16U\aRefererq\x17]q\x18U https://www.goodreads.com/quotesq\x19asU\x03urlq\x1aX&\x00\x00\x00https://www.goodreads.com/choiceawardsU\x0bdont_filterq\x1b\x89U\bpriorityq\x1cK\x00U\bcallbackq\x1dU\x14_response_downloadedq\x1eU\x05flagsq\x1f]q U\x06methodq!U\x03GETq\"U\aerrbackq#Nu."
4) "0"
127.0.0.1:6379> ZRANGE mycrawler_redis:requests -2 -1 WITHSCORES
1) "\x80\x02}q\x01(U\x04bodyq\x02U\x00U\t_encodingq\x03U\x05utf-8q\x04U\acookiesq\x05}q\x06U\x04metaq\a}q\b(U\tlink_textq\tX\x00\x00\x00\x00U\x04ruleq\nK\x00U\x10download_timeoutq\x0bG@f\x80\x00\x00\x00\x00\x00U\x05depthq\x0cK\x02U\x0bretry_timesq\rK\x01U\rdownload_slotq\x0eU\x0fwww.youtube.comq\x0fuU\aheadersq\x10}q\x11(U\x0fAccept-Languageq\x12]q\x13U\x02enq\x14aU\aRefererq\x15]q\x16U\x17https://scrapinghub.comq\x17aU\x0fAccept-Encodingq\x18]q\x19U\x0cgzip,deflateq\x1aaU\x06Acceptq\x1b]q\x1cU?text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8q\x1daU\nUser-Agentq\x1e]q\x1fU7scrapy-redis (+https://github.com/rolando/scrapy-redis)q auU\x03urlq!X#\x00\x00\x00https://www.youtube.com/scrapinghubU\x0bdont_filterq\"\x88U\bpriorityq#J\xff\xff\xff\xffU\bcallbackq$U\x14_response_downloadedq%U\x05flagsq&]q'U\x06methodq(U\x03GETq)U\aerrbackq*Nu."
2) "1"
3) "\x80\x02}q\x01(U\x04bodyq\x02U\x00U\t_encodingq\x03U\x05utf-8q\x04U\acookiesq\x05}q\x06U\x04metaq\a}q\b(U\tlink_textq\tX\x00\x00\x00\x00U\x04ruleq\nK\x00U\x10download_timeoutq\x0bG@f\x80\x00\x00\x00\x00\x00U\x05depthq\x0cK\x02U\x0bretry_timesq\rK\x01U\rdownload_slotq\x0eU\x10www.facebook.comq\x0fuU\aheadersq\x10}q\x11(U\x0fAccept-Languageq\x12]q\x13U\x02enq\x14aU\aRefererq\x15]q\x16U\x17https://scrapinghub.comq\x17aU\x0fAccept-Encodingq\x18]q\x19U\x0cgzip,deflateq\x1aaU\x06Acceptq\x1b]q\x1cU?text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8q\x1daU\nUser-Agentq\x1e]q\x1fU7scrapy-redis (+https://github.com/rolando/scrapy-redis)q auU\x03urlq!X%\x00\x00\x00https://www.facebook.com/ScrapingHub/U\x0bdont_filterq\"\x88U\bpriorityq#J\xff\xff\xff\xffU\bcallbackq$U\x14_response_downloadedq%U\x05flagsq&]q'U\x06methodq(U\x03GETq)U\aerrbackq*Nu."
4) "1"
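Two things can be read off the output above. Each member starts with `\x80\x02}`, which is how a dict pickled with pickle protocol 2 begins, and the scores line up with the negated request priority (priority 0 gives score 0, priority -1 gives score 1). The sketch below reproduces that layout with the standard library only. It assumes the queue stores `-priority` as the score, which matches the sample data but should be checked against your scrapy_redis version. `encode_request` and the plain Python list are hypothetical stand-ins for the real queue class and the Redis sorted set.

```python
import pickle


def encode_request(request_dict):
    """Serialize a request dict and compute its sorted-set score.

    Assumption: score = -priority, so higher-priority requests get
    lower scores and come first in ascending order.
    """
    member = pickle.dumps(request_dict, protocol=2)
    score = -request_dict["priority"]
    return score, member


# A plain list of (score, member) pairs stands in for the Redis sorted set.
queue = [
    encode_request({"url": "https://www.youtube.com/scrapinghub",
                    "method": "GET", "priority": -1}),
    encode_request({"url": "https://www.goodreads.com/choiceawards",
                    "method": "GET", "priority": 0}),
]

# ZRANGE returns members in ascending score order; sort to mimic that.
queue.sort(key=lambda pair: pair[0])

for score, member in queue:
    request = pickle.loads(member)
    print(score, request["url"], member[:3])
```

Only unpickle data from a Redis instance you trust, because `pickle.loads` can execute arbitrary code while loading.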

 
