31 Dangdang Book Bestseller List Crawler
Posted by www1707
Exercise overview
Task:
Use Scrapy to crawl the first 3 pages of Dangdang's 2018 book bestseller list and extract each book's title, author, and price.
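Link to Dangdang's 2018 book bestseller list (page 1, as used in the spider in step 2): http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1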
Goals:
Practice defining an item
Practice writing the spider file
Practice modifying the settings file
1. Create the Dangdang crawler project
    D:\USERDATA\python>scrapy startproject dangdang
    New Scrapy project 'dangdang', using template directory 'c:\users\www1707\appdata\local\programs\python\python37\lib\site-packages\scrapy\templates\project', created in:
        D:\USERDATA\python\dangdang

    You can start your first spider with:
        cd dangdang
        scrapy genspider example example.com

    D:\USERDATA\python>
2. Create the spider file D:\USERDATA\python\dangdang\dangdang\spiders\dangdang.py
    import scrapy
    import bs4
    from ..items import DangdangItem


    class DangdangSpider(scrapy.Spider):
        name = 'dangdang'
        # Domains only, without the scheme. The original entry
        # 'http://bang.dangdang.com' is ignored by Scrapy and causes the
        # URLWarning visible in the log in step 5.
        allowed_domains = ['bang.dangdang.com']
        # Pages 1 to 3 of the 2018 bestseller list.
        start_urls = []
        for x in range(1, 4):
            url = 'http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-' + str(x)
            start_urls.append(url)

        def parse(self, response):
            bs = bs4.BeautifulSoup(response.text, 'html.parser')
            # Each <li> inside the list holds one book.
            datas = bs.find('ul', class_='bang_list_mode').find_all('li')
            for data in datas:
                item = DangdangItem()
                item['bang_num'] = data.find('div', class_='list_num').text
                item['book_name'] = data.find('div', class_='name').text
                item['book_author'] = data.find('div', class_='publisher_info').text
                item['price'] = data.find('span', class_='price_n').text
                yield item
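The spider above relies on two string conventions: the page number is the last part of the list URL, and `allowed_domains` expects bare host names rather than URLs. A minimal standard-library sketch of both follows. `BASE_URL`, `page_urls`, and `domain_of` are hypothetical helper names used only for illustration; they are not part of Scrapy or of the original project.

```python
from urllib.parse import urlparse

# URL prefix taken from the spider above; the page number is appended to it.
BASE_URL = 'http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-'


def page_urls(first_page, last_page):
    """Return the list URLs for pages first_page..last_page (inclusive)."""
    return [BASE_URL + str(page) for page in range(first_page, last_page + 1)]


def domain_of(url):
    """Return the bare host name of a URL, the form allowed_domains expects."""
    return urlparse(url).hostname


urls = page_urls(1, 3)
for url in urls:
    print(url)
print(domain_of(urls[0]))
```

Deriving the domain from the URL this way avoids typing the scheme into `allowed_domains` by hand.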
3. Edit D:\USERDATA\python\dangdang\dangdang\items.py
    import scrapy


    class DangdangItem(scrapy.Item):
        bang_num = scrapy.Field()     # rank on the list
        book_name = scrapy.Field()    # book title
        book_author = scrapy.Field()  # author information
        price = scrapy.Field()        # price
4. Edit D:\USERDATA\python\dangdang\dangdang\settings.py
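    BOT_NAME = 'dangdang'
    SPIDER_MODULES = ['dangdang.spiders']
    NEWSPIDER_MODULE = 'dangdang.spiders'
    USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
    # Note: the log in step 5 lists neither ROBOTSTXT_OBEY under
    # "Overridden settings" nor RobotsTxtMiddleware, which suggests the run
    # shown there used the default value (False).
    ROBOTSTXT_OBEY = True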
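Beyond the settings listed above, a few other standard Scrapy settings are often adjusted for a small crawl like this one. The fragment below is only a sketch; the values are illustrative assumptions and were not part of the original project.

```python
# Possible additions to settings.py (illustrative values, not from the
# original project).

# Wait between consecutive requests to the same site, in seconds.
DOWNLOAD_DELAY = 1

# Write exported feeds (for example JSON) as UTF-8 so that Chinese titles
# are stored as readable text instead of \uXXXX escapes.
FEED_EXPORT_ENCODING = 'utf-8'
```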
5. Run the command scrapy crawl dangdang in D:\USERDATA\python\dangdang
    D:\USERDATA\python\dangdang>scrapy crawl dangdang
    2019-05-08 17:00:28 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: dangdang)
    2019-05-08 17:00:28 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 19.2.0, Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b 26 Feb 2019), cryptography 2.6.1, Platform Windows-10-10.0.17134-SP0
    2019-05-08 17:00:28 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'dangdang', 'NEWSPIDER_MODULE': 'dangdang.spiders', 'SPIDER_MODULES': ['dangdang.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
    2019-05-08 17:00:28 [scrapy.extensions.telnet] INFO: Telnet Password: f05741387a33a05e
    2019-05-08 17:00:28 [scrapy.middleware] INFO: Enabled extensions:
    ['scrapy.extensions.corestats.CoreStats',
     'scrapy.extensions.telnet.TelnetConsole',
     'scrapy.extensions.logstats.LogStats']
    2019-05-08 17:00:28 [scrapy.middleware] INFO: Enabled downloader middlewares:
    ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
     'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
     'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
     'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
     'scrapy.downloadermiddlewares.retry.RetryMiddleware',
     'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
     'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
     'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
     'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
     'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
     'scrapy.downloadermiddlewares.stats.DownloaderStats']
    2019-05-08 17:00:28 [scrapy.middleware] INFO: Enabled spider middlewares:
    ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
     'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
     'scrapy.spidermiddlewares.referer.RefererMiddleware',
     'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
     'scrapy.spidermiddlewares.depth.DepthMiddleware']
    2019-05-08 17:00:28 [scrapy.middleware] INFO: Enabled item pipelines:
    []
    2019-05-08 17:00:28 [scrapy.core.engine] INFO: Spider opened
    2019-05-08 17:00:28 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
    2019-05-08 17:00:28 [py.warnings] WARNING: c:\users\www1707\appdata\local\programs\python\python37\lib\site-packages\scrapy\spidermiddlewares\offsite.py:61: URLWarning: allowed_domains accepts only domains, not URLs. Ignoring URL entry http://bang.dangdang.com in allowed_domains.
      warnings.warn(message, URLWarning)


    2019-05-08 17:00:28 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
    2019-05-08 17:00:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2> (referer: None)
    2019-05-08 17:00:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3> (referer: None)
    2019-05-08 17:00:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1> (referer: None)
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '21.',
     'book_author': '[日]东野圭吾 著,新经典 出品',
     'book_name': '东野圭吾:白夜行(2017版,易烊千玺、韩雪推荐,东野圭吾无冕之...',
     'price': '¥41.10'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '22.',
     'book_author': '(法)安托万·德·圣埃克苏佩里 著,李继宏 译,果麦文化 出品',
     'book_name': '小王子(畅销300万册,作者基金会官方认证简体中文版)【果麦经典...',
     'price': '¥17.60'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '23.',
     'book_author': '刘慈欣',
     'book_name': '三体:全三册 刘慈欣代表作,亚洲首部“雨果奖”获奖作品!',
     'price': '¥55.80'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '24.',
     'book_author': '钱钟书\u3000著',
     'book_name': '围城',
     'price': '¥24.90'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '25.',
     'book_author': '黄仁宇',
     'book_name': '万历十五年 一本好书 腾讯视频栏目推荐',
     'price': '¥24.70'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '41.',
     'book_author': '海明威 罗曼·罗兰 塞尔玛·拉格洛夫 等,张荣梅 策划,小当当童书馆 出品',
     'book_name': '诺奖少年版(全套30册)2018当当童书畅销书,日销售ZUI高达50000...',
     'price': '¥248.10'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '1.',
     'book_author': '余华',
     'book_name': '活着(2017年新版)',
     'price': '¥28.00'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '26.',
     'book_author': '(美)加·泽文 Gabrielle Zevin 著;孙仲旭、李玉瑶 译;读客文化 出品',
     'book_name': '岛上书店(每个人的生命中,都有无比艰难的那一年,将人生变得美...',
     'price': '¥29.80'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '27.',
     'book_author': '贾平凹 著 时代华语 出品',
     'book_name': '自在独行 贾平凹的独行世界',
     'price': '¥28.00'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '28.',
     'book_author': '姜自霞',
     'book_name': '魔法拼音国(套装 共7册)',
     'price': '¥49.00'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '29.',
     'book_author': '张嘉佳 著,博集天卷 出品',
     'book_name': '云边有个小卖部',
     'price': '¥21.00'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '30.',
     'book_author': '路遥 著,新经典 出品',
     'book_name': '平凡的世界:全三册(朱一龙推荐,八年级下册自主阅读推荐)',
     'price': '¥74.50'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '31.',
     'book_author': '戴尔·卡耐基 著,陶曚 译,果麦文化 出品',
     'book_name': '人性的弱点(薛之谦推荐,畅销100万册)【果麦经典】',
     'price': '¥28.50'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '32.',
     'book_author': '大冰 著,博集天卷 出品',
     'book_name': '我不(大冰作品。十个月狂销200万册,不容错过的奇书!)',
     'price': '¥21.50'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '33.',
     'book_author': '〔英〕毛姆 著 苏福忠 译',
     'book_name': '月亮和六便士(全新导读无删节详注版! 半年创当当110000名读者五...',
     'price': '¥24.30'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '34.',
     'book_author': '陈磊(二混子) 著;读客文化 出品',
     'book_name': '半小时漫画中国史(修订版)(看半小时漫画,通五千年历史!《半...',
     'price': '¥35.90'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '35.',
     'book_author': '(美)怀特\u3000著,任溶溶\u3000译',
     'book_name': '夏洛的网(新)',
     'price': '¥19.50'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '36.',
     'book_author': '克莱儿·麦克福尔,白马时光 出品',
     'book_name': '摆渡人2:重返荒原(系列畅销千万册。每一个镌刻着爱与善意的灵魂...',
     'price': '¥38.80'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '37.',
     'book_author': '东野圭吾 著,娄美莲 译,新经典 出品',
     'book_name': '东野圭吾:恶意(2016版,东野圭吾四大杰作之一)',
     'price': '¥27.30'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '38.',
     'book_author': '(哥伦)马尔克斯\u3000 著,杨玲 译,新经典 出品',
     'book_name': '霍乱时期的爱情(2015版) 一本好书 腾讯视频栏目推荐',
     'price': '¥34.20'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '39.',
     'book_author': '埃德加·斯诺 著;董乐山 译',
     'book_name': '红星照耀中国(青少版)人民文学出版社',
     'price': '¥20.50'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-2>
    {'bang_num': '40.',
     'book_author': '蔡崇达 著,果麦文化 出品',
     'book_name': '皮囊(畅销300万册的国民读本,刘德华、李敬泽作序。繁体版面世即...',
     'price': '¥21.80'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '42.',
     'book_author': '(日)山下英子',
     'book_name': '断舍离',
     'price': '¥27.10'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '43.', 'book_author': '杨绛', 'book_name': '我们仨', 'price': '¥23.00'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '44.',
     'book_author': '老杨的猫头鹰',
     'book_name': '好看的皮囊千篇一律,有趣的灵魂万里挑一(老杨的猫头鹰最新作品...',
     'price': '¥31.00'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '45.',
     'book_author': '(美)玛兹丽施\u3000著,安燕玲\u3000译',
     'book_name': '如何说孩子才会听 怎么听孩子才肯说(全新修订版)',
     'price': '¥36.80'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '46.',
     'book_author': '王小波 著,新经典 出品',
     'book_name': '一只特立独行的猪',
     'price': '¥22.80'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '47.',
     'book_author': '(美)莱曼·弗兰克·鲍姆,(德)格林兄弟,(丹)安徒生等著,张荣梅 策划,小当当童书馆 出品',
     'book_name': '百年童话绘本·典藏版(全套30册)当当2018年度常青藤畅销书奖,...',
     'price': '¥208.60'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '48.',
     'book_author': '杨绛',
     'book_name': '我们仨(新版)',
     'price': '¥15.80'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '49.',
     'book_author': '著 (日)莳田晋至,译 吴佳芬,绘 (日)长谷川知子',
     'book_name': '在教室说错了没关系',
     'price': '¥18.00'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '50.',
     'book_author': '高春香,邵敏 著,许明振,李婧 绘',
     'book_name': '这就是二十四节气(中国二十四节气彩绘版,文津图书奖获奖绘本,...',
     'price': '¥50.00'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '51.',
     'book_author': '慕颜歌 著,文通天下 出品',
     'book_name': '你的善良必须有点锋芒',
     'price': '¥29.20'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '52.',
     'book_author': '余华',
     'book_name': '许三观卖血记(新版)',
     'price': '¥32.00'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '53.',
     'book_author': '陈磊(笔名:二混子) 著;读客文化 出品',
     'book_name': '半小时漫画世界史(看半小时漫画,通五千年历史!其实是一本严谨...',
     'price': '¥35.90'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '54.',
     'book_author': '高铭 著,磨铁图书 出品',
     'book_name': '天才在左 疯子在右(2018全新完整版)',
     'price': '¥44.10'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '55.',
     'book_author': '[日]稻盛和夫 著,曹岫云 译',
     'book_name': '阿米巴经营——畅销十周年纪念版,当当全国独家(团购,请致电40...',
     'price': '¥27.30'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '56.',
     'book_author': '东野圭吾 著,刘子倩 译,新经典 出品',
     'book_name': '东野圭吾:嫌疑人X的献身(王凯、张鲁一推荐,至为纯粹的爱情,绝...',
     'price': '¥26.30'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '57.',
     'book_author': '曹文轩 著',
     'book_name': '曹文轩文集典藏版(全7册)',
     'price': '¥84.00'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '58.',
     'book_author': '史蒂芬·霍金',
     'book_name': '时间简史(插图本)(央视《朗读者》推荐)',
     'price': '¥32.60'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '2.',
     'book_author': '周国平',
     'book_name': '我喜欢生命本来的样子(周国平经典散文作品集)',
     'price': '¥40.50'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '3.',
     'book_author': '乔安娜柯尔\u3000著 布鲁斯迪根 图\u3000施芳\u3000译',
     'book_name': '神奇校车·桥梁书版(全20册)',
     'price': '¥75.00'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '4.',
     'book_author': '郑利强 段虹(绘) 步印童书 出品',
     'book_name': '我的第一本地理启蒙书',
     'price': '¥24.90'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '59.',
     'book_author': '〔英〕安东尼·布朗',
     'book_name': '我爸爸+我妈妈(全2册)',
     'price': '¥46.50'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-3>
    {'bang_num': '60.',
     'book_author': '陈卫平、陈雨岚等 步印童书 出品',
     'book_name': '写给儿童的中国地理(全14册)',
     'price': '¥196.00'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '5.',
     'book_author': '(日)太宰治\u3000著,杨伟\u3000译',
     'book_name': '人间失格(日本小说家太宰治的自传体小说)',
     'price': '¥22.50'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '6.',
     'book_author': '(荷)丹姆 著,漆仰平,爱桐 译',
     'book_name': '小熊和最好的爸爸(全7册)',
     'price': '¥17.50'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '7.',
     'book_author': '戴维·伽特森',
     'book_name': '雪落香杉树 (福克纳奖得主,全球畅销500万册)',
     'price': '¥46.80'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '8.',
     'book_author': '张嘉骅',
     'book_name': '少年读史记(套装全5册)',
     'price': '¥50.00'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '9.',
     'book_author': '(美)乔安娜柯尔 著 ,(美)布鲁斯·迪根 图',
     'book_name': '神奇校车·图画书版(全12册,新增《科学博览会》1册)',
     'price': '¥99.00'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '10.',
     'book_author': '大冰 著,博集天卷 出品',
     'book_name': '你坏(大冰2018作品!预售10分钟8.6万册+,30分钟突破11.8万册,...',
     'price': '¥27.30'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '11.',
     'book_author': '(日)东野圭吾 著,新经典 出品',
     'book_name': '东野圭吾:解忧杂货店(王俊凯、迪丽热巴主演,这家店帮你找回内...',
     'price': '¥27.30'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '12.',
     'book_author': '沈复 著 , 张佳玮 译,果麦文化 出品',
     'book_name': '浮生六记(汪涵、胡歌推荐,畅销250万册。沈复给芸娘的绝美情书)...',
     'price': '¥15.20'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '13.',
     'book_author': '[美] 简·尼尔森',
     'book_name': '《正面管教》修订版',
     'price': '¥18.90'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '14.',
     'book_author': '陈卫平著 步印童书 出品',
     'book_name': '写给儿童的中国历史(全14册)',
     'price': '¥177.50'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '15.',
     'book_author': '毛姆 著,徐淳刚 译,大星文化 出品,作家榜经典文库,高更 绘',
     'book_name': '月亮与六便士(新版未删节!当当名著销量桂冠!豆瓣阅读桂冠!上...',
     'price': '¥29.90'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '16.',
     'book_author': '[英]克莱儿·麦克福尔,白马时光 出品',
     'book_name': '摆渡人(系列畅销千万册。如果命运是一条孤独的河流,谁会是你灵...',
     'price': '¥32.60'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '17.',
     'book_author': '加西亚·马尔克斯 著,新经典 出品',
     'book_name': '马尔克斯:百年孤独(50周年纪念版)',
     'price': '¥41.30'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '18.',
     'book_author': '[美]卡勒德·胡赛尼(Khaled Hosseini) 著,李继宏 译',
     'book_name': '追风筝的人(2018年新版)',
     'price': '¥18.00'}
    2019-05-08 17:00:28 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '19.',
     'book_author': '佐佐木圭一 著 程亮 译 时代华语 出品',
     'book_name': '所谓情商高,就是会说话',
     'price': '¥23.00'}
    2019-05-08 17:00:29 [scrapy.core.scraper] DEBUG: Scraped from <200 http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-1>
    {'bang_num': '20.',
     'book_author': '李思圆 著,文通天下 出品',
     'book_name': '生活需要仪式感 (把温暖和感动带给你在乎的人)',
     'price': '¥32.10'}
    2019-05-08 17:00:29 [scrapy.core.engine] INFO: Closing spider (finished)
    2019-05-08 17:00:29 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
    {'downloader/request_bytes': 1044,
     'downloader/request_count': 3,
     'downloader/request_method_count/GET': 3,
     'downloader/response_bytes': 354134,
     'downloader/response_count': 3,
     'downloader/response_status_count/200': 3,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2019, 5, 8, 9, 0, 29, 1579),
     'item_scraped_count': 60,
     'log_count/DEBUG': 63,
     'log_count/INFO': 9,
     'log_count/WARNING': 1,
     'response_received_count': 3,
     'scheduler/dequeued': 3,
     'scheduler/dequeued/memory': 3,
     'scheduler/enqueued': 3,
     'scheduler/enqueued/memory': 3,
     'start_time': datetime.datetime(2019, 5, 8, 9, 0, 28, 449885)}
    2019-05-08 17:00:29 [scrapy.core.engine] INFO: Spider closed (finished)
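The log above shows that the scraped values are raw strings (a rank such as '21.', a price such as '¥41.10', author fields containing the full-width space U+3000) and that items arrive in response order rather than rank order. A minimal post-processing sketch in plain Python follows, assuming the field formats seen in the log. `clean_item` is a hypothetical helper name, not part of the original project or of Scrapy.

```python
from decimal import Decimal


def clean_item(raw):
    """Convert the raw strings of one scraped item into typed values."""
    return {
        # '21.' -> 21
        'bang_num': int(raw['bang_num'].rstrip('.')),
        'book_name': raw['book_name'].strip(),
        # Replace full-width spaces (U+3000) with ordinary spaces.
        'book_author': raw['book_author'].replace('\u3000', ' ').strip(),
        # '¥41.10' -> Decimal('41.10')
        'price': Decimal(raw['price'].lstrip('¥')),
    }


# Two items copied from the log, in the order they were scraped.
raw_items = [
    {'bang_num': '24.', 'book_author': '钱钟书\u3000著',
     'book_name': '围城', 'price': '¥24.90'},
    {'bang_num': '1.', 'book_author': '余华',
     'book_name': '活着(2017年新版)', 'price': '¥28.00'},
]

# Sort by rank so the result follows the order of the list on the site.
books = sorted((clean_item(item) for item in raw_items),
               key=lambda book: book['bang_num'])
for book in books:
    print(book['bang_num'], book['book_name'], book['book_author'], book['price'])
```

In a Scrapy project, logic like this would typically live in an item pipeline; it is kept as a standalone function here so it can be run without Scrapy installed.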