How can I read Scrapy start_urls from a MySQL database?
[Posted]: 2020-12-13 03:02:33
[Question]: I am trying to read from MySQL and write all of my output to it. When my spider starts scraping, I want to fetch all the URLs from a MySQL database, so I tried to write a function that reads the data.
readdata.py:
import mysql.connector
from mysql.connector import Error
from itemadapter import ItemAdapter

def dataReader(marketName):
    try:
        connection = mysql.connector.connect(host='localhost',
                                             database='test',
                                             user='root',
                                             port=3306,
                                             password='1234')
        sql_select_Query = "SELECT shop_URL FROM datatable.bot_markets WHERE shop_name='"+marketName+"';"
        cursor = connection.cursor()
        cursor.execute(sql_select_Query)
        records = cursor.fetchall()
        return records
    except Error as e:
        print("Error reading data from MySQL table", e)
    finally:
        if (connection.is_connected()):
            connection.close()
            cursor.close()
            print("MySQL connection is closed")
I want to call this function from my spider, as shown below.
My spider:
import scrapy
import re
import mysql.connector
from ..items import FirstBotItem
from scrapy.utils.project import get_project_settings
from first_bot.readdata import dataReader

class My_Spider(scrapy.Spider):
    name = "My_Spider"
    allowed_domains = ["quotes.toscrape.com/"]
    start_urls = dataReader(name)

    def parse(self, response):
        location = "quotes"
        for product in response.xpath('.//div[@class="product-card product-action "]'):
            product_link = response.url
            prices = product.xpath('.//div[@class="price-tag"]/span[@class="value"]/text()').get()
            if prices != None:prices = re.sub(r"[\s]", "", prices)
            title = product.xpath('.//h5[@class="title product-card-title"]/a/text()').get()
            unit = product.xpath('.//div[@class="select single-select"]//i/text()').get()
            if unit != None: unit = re.sub(r"[\s]", "", unit)
            item = FirstBotItem()
            item['LOKASYON'] = location
            item['YEAR'] = 2020
            item['MONTH'] = 8
            yield item
I am doing something wrong with start_urls, but I cannot figure out what. I get this error.
_set_url
raise TypeError('Request url must be str or unicode, got %s:' % type(url).__name__)
TypeError: Request url must be str or unicode, got tuple:
2020-08-24 15:46:31 [scrapy.core.engine] INFO: Closing spider (finished)
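My main goal is to fetch all the URLs from the database, because someone will add URLs for the same website and the spider should crawl them automatically.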
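[Answer 1]: You can try changing the logic in the dataReader method from:
    records = cursor.fetchall()
    return records
to:
    records = cursor.fetchall()
    records_list = []
    for rec in records:
        # each row is a one-element tuple; keep only the URL string
        records_list.append(rec[0])
    return records_list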
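For context, a DB-API cursor's fetchall() returns one tuple per row, even when only a single column is selected, which is why Scrapy reports that it got a tuple instead of a str. A minimal sketch of flattening such rows into plain strings follows. The helper name rows_to_urls and the sample rows are hypothetical and only serve as illustration.

```python
def rows_to_urls(rows):
    """Turn single-column rows such as [('http://a',)] into ['http://a']."""
    return [row[0] for row in rows]


# Hypothetical rows, shaped like the result of cursor.fetchall()
sample_rows = [
    ("http://quotes.toscrape.com/page/1/",),
    ("http://quotes.toscrape.com/page/2/",),
]
start_urls = rows_to_urls(sample_rows)
```

The resulting list contains only str values, which is the shape Scrapy expects for start_urls.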
[Comments]:
When I check the type of records before returning, it returns
You should write return list(records) instead of return records in the dataReader function.
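Whichever variant is used, the reader has to hand Scrapy plain strings. Below is a sketch of a reader that does this and also passes the shop name as a query parameter instead of concatenating it into the SQL text. Several things here are assumptions rather than part of the original post: the function name fetch_shop_urls, the unqualified table name, the sample rows, and the use of sqlite3 as a stand-in for MySQL so that the sketch runs without a server. The placeholder differs per driver: mysql.connector uses %s, sqlite3 uses ?.

```python
import sqlite3


def fetch_shop_urls(connection, market_name, placeholder="%s"):
    """Return the shop URLs of one market as a list of plain strings.

    placeholder is the driver's parameter marker: "%s" for
    mysql.connector, "?" for sqlite3.
    """
    query = "SELECT shop_URL FROM bot_markets WHERE shop_name = " + placeholder
    cursor = connection.cursor()
    try:
        # The value is sent as a parameter; it is never pasted into the SQL text.
        cursor.execute(query, (market_name,))
        # Each row is a one-element tuple, so keep only the URL itself.
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()


# Stand-in database (assumption: sqlite3 instead of MySQL, made-up rows).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE bot_markets (shop_name TEXT, shop_URL TEXT)")
demo.executemany(
    "INSERT INTO bot_markets VALUES (?, ?)",
    [
        ("My_Spider", "http://quotes.toscrape.com/page/1/"),
        ("My_Spider", "http://quotes.toscrape.com/page/2/"),
        ("Other", "http://example.com/"),
    ],
)
urls = fetch_shop_urls(demo, "My_Spider", placeholder="?")
```

With a mysql.connector connection, the same function could be called with the default placeholder and its result assigned to start_urls in the spider.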