I can't scrape data
I am using Scrapy to scrape data from a website. Here is my code:
import scrapy


class ShopSpider(scrapy.Spider):
    name = 'shop'
    allowed_domains = ['https://www.shopclues.com/mobiles-smartphones.html?sort_by=bestsellers']
    start_urls = ['http://https://www.shopclues.com/mobiles-smartphones.html?sort_by=bestsellers/']
    custom_settings = {
        'FEED_URI': 'tmp/shop.csv'
    }

    def parse(self, response):
        titles = response.css('img::attr(title)').extract()
        images = response.css('img::attr(data-img)').extract()
        prices = response.css('.p_price::text').extract()
        discounts = response.css('.prd_discount::text').extract()
        for item in zip(titles, prices, images, discounts):
            scraped_info = {
                'title': item[0],
                'price': item[1],
                'image_urls': [item[2]],  # Sets the URL for Scrapy to download images
                'discount': item[3]
            }
        yield scraped_info
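Could you check what I am doing wrong? I also want to scrape all the data that loads as the page is scrolled, so the spider should keep collecting items for as long as scrolling loads more. How can I do that?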
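Answer
Your code has three problems:
- an incorrect allowed_domains (it needs only the domain name, not a full URL);
- a broken start_urls entry (the http scheme appears twice, and there is a slash at the end);
- wrong indentation where the parse function yields items.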
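The first two points can be checked without running the spider. The snippet below is a minimal sketch using only the standard library: `host_of` and `is_allowed` are hypothetical helpers written for this illustration, and `is_allowed` is a simplified stand-in for domain filtering (exact host or subdomain match), not Scrapy's actual implementation.

```python
from urllib.parse import urlparse

# The two URLs from the question and from the corrected spider.
BROKEN_URL = 'http://https://www.shopclues.com/mobiles-smartphones.html?sort_by=bestsellers/'
FIXED_URL = 'https://www.shopclues.com/mobiles-smartphones.html?sort_by=bestsellers'


def host_of(url):
    """Return the hostname that urllib parses out of a URL (hypothetical helper)."""
    return urlparse(url).hostname


def is_allowed(url, allowed_domains):
    """Simplified domain check: the host equals an entry or is a subdomain of it."""
    host = host_of(url) or ''
    return any(host == d or host.endswith('.' + d) for d in allowed_domains)


if __name__ == '__main__':
    # With the doubled scheme, the parsed host is the literal string "https".
    print(host_of(BROKEN_URL))
    # The corrected URL parses to the real host.
    print(host_of(FIXED_URL))
    # A bare domain matches the host; a full URL used as a "domain" never does.
    print(is_allowed(FIXED_URL, ['shopclues.com']))
    print(is_allowed(FIXED_URL, [FIXED_URL]))
```

Under these assumptions, the doubled scheme makes the request target a host named `https` rather than `www.shopclues.com`, and a full URL placed in allowed_domains can never match any real hostname.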
Here is the corrected code:
import scrapy


class ShopSpider(scrapy.Spider):
    name = 'shop'
    allowed_domains = ['shopclues.com']
    start_urls = ['https://www.shopclues.com/mobiles-smartphones.html?sort_by=bestsellers']

    def parse(self, response):
        titles = response.css('img::attr(title)').extract()
        images = response.css('img::attr(data-img)').extract()
        prices = response.css('.p_price::text').extract()
        discounts = response.css('.prd_discount::text').extract()
        for item in zip(titles, prices, images, discounts):
            scraped_info = {
                'title': item[0],
                'price': item[1],
                'image_urls': [item[2]],  # Sets the URL for Scrapy to download images
                'discount': item[3]
            }
            yield scraped_info
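Why the indentation of yield matters can be seen in isolation, without Scrapy. The sketch below assumes one common form of the mistake, where yield sits after the loop instead of inside it; the function names and sample rows are made up for illustration.

```python
def yield_inside(rows):
    """Yield one item per row: the yield is part of the loop body."""
    for title, price in rows:
        item = {'title': title, 'price': price}
        yield item


def yield_outside(rows):
    """Yield only once: the yield runs after the loop has finished."""
    for title, price in rows:
        item = {'title': title, 'price': price}
    yield item  # only the last item built by the loop is emitted


if __name__ == '__main__':
    sample = [('Phone A', '100'), ('Phone B', '200')]  # made-up data
    print(list(yield_inside(sample)))   # two items
    print(list(yield_outside(sample)))  # one item, the last row
```

Note also that zip stops at the shortest input, so if the four selectors in parse return lists of different lengths, the extra entries are dropped silently and titles may pair with the wrong prices.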