How to save multiple scraped data to MySQL using Python

Posted: 2022-01-22 19:56:43

I have data scraped from cars.com that I am trying to save into a MySQL database, but I cannot get it to work. Here is my full code:

# scrapeData.py

import requests
from bs4 import BeautifulSoup

URL = "https://www.cars.com/shopping/results/?dealer_id=&keyword=&list_price_max=&list_price_min=&makes[]=&maximum_distance=all&mileage_max=&page=1&page_size=100&sort=best_match_desc&stock_type=cpo&year_max=&year_min=&zip="
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html.parser')
cars = soup.find_all('div', class_='vehicle-card')

name = []
mileage = []
dealer_name = []
rating = []
rating_count = []
price = []


for car in cars:
    # name
    name.append(car.find('h2').get_text())
    # mileage
    mileage.append(car.find('div', {'class': 'mileage'}).get_text())
    # dealer name
    dealer_name.append(car.find('div', {'class': 'dealer-name'}).get_text())
    # rating
    try:
        rating.append(car.find('span', {'class': 'sds-rating__count'}).get_text())
    except:
        rating.append("n/a")
    # rating count
    rating_count.append(car.find('span', {'class': 'sds-rating__link'}).get_text())
    # price
    price.append(car.find('span', {'class': 'primary-price'}).get_text())

#save_to_mysql.py

import pymysql
import scrapeData
import mysql.connector

connection = pymysql.connect(
    host='localhost',
    user='root',
    password='',
    db='cars',
)

name = scrapeData.name
mileage = scrapeData.mileage
dealer_name = scrapeData.dealer_name
rating = scrapeData.rating
rating_count = scrapeData.rating_count
price = scrapeData.price

try:
    mySql_insert_query = """INSERT INTO cars_details (name, mileage, dealer_name, rating, rating_count, price) 
                           VALUES (%s, %s, %s, %s, %s, %s) """

    records_to_insert = [(name, mileage, dealer_name, rating, rating_count, price)]

    print(records_to_insert)

    cursor = connection.cursor()
    cursor.executemany(mySql_insert_query, records_to_insert)
    connection.commit()
    print(cursor.rowcount, "Record inserted successfully into cars_details table")

except mysql.connector.Error as error:
    print("Failed to insert record into MySQL table: {}".format(error))
finally:
    connection.close()

Whenever I run this code, I get the following error:

Traceback (most recent call last):
  File "c:\scraping\save_to_mysql.py", line 28, in <module>
    cursor.executemany(mySql_insert_query, records_to_insert)
  File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\site-packages\pymysql\cursors.py", line 173, in executemany     
    return self._do_execute_many(
  File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\site-packages\pymysql\cursors.py", line 211, in _do_execute_many    rows += self.execute(sql + postfix)
  File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\site-packages\pymysql\cursors.py", line 148, in execute
    result = self._query(query)
  File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\site-packages\pymysql\cursors.py", line 310, in _query
    conn.query(q)
  File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\site-packages\pymysql\connections.py", line 548, in query       
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\site-packages\pymysql\connections.py", line 775, in _read_query_result
    result.read()
  File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\site-packages\pymysql\connections.py", line 1156, in read       
    first_packet = self.connection._read_packet()
  File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\site-packages\pymysql\connections.py", line 725, in _read_packet    packet.raise_for_error()
  File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\site-packages\pymysql\protocol.py", line 221, in raise_for_error    err.raise_mysql_exception(self._data)
  File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\site-packages\pymysql\err.py", line 143, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.OperationalError: (1241, 'Operand should contain 1 column(s)')

Does anyone know how to fix this? I want to insert multiple scraped rows into MySQL in a single execution. I would appreciate your help.

【Comments】:

【Solution #1】:

First, instead of keeping a separate list for each field, I would use a single list and collect all the information about one car together, nested inside it. So instead of:

mileage = []
dealer_name = []

I would create a single list called cars:

cars = []

Then, inside the loop, I would create a separate variable for each piece of information scraped about the car, like this:

# brand
brand = car.find('h2').get_text()
# mileage
mileage = car.find('div', {'class': 'mileage'}).get_text()

Then I would build the record to append and add it to the list:

toAppend = brand, mileage, dealer_name, rating, rating_count, price
cars.append(toAppend)

The output will then look like this:

[('2018 Mercedes-Benz CLA 250 Base', '21,326 mi.', '\nMercedes-Benz of South Bay\n', '4.6', '(1,020 reviews)', '$33,591'), ('2021 Toyota Highlander Hybrid XLE', '9,529 mi.', '\nToyota of Gastonia\n', '4.6', '(590 reviews)', '$47,869')]
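As an aside, the original error 1241 ('Operand should contain 1 column(s)') most likely occurred because each %s placeholder was bound to an entire Python list, which pymysql expands into a parenthesized multi-value operand rather than a single scalar. If you would rather keep the separate lists, the same list-of-tuples shape can also be built with zip — a minimal sketch, not part of the answer's code:

# Hypothetical alternative: zip the parallel lists into one row tuple
# per car, which is the shape executemany() expects.
records_to_insert = list(zip(name, mileage, dealer_name, rating, rating_count, price))

Either way, executemany() then receives one tuple per row, with one scalar value per column.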

I made one small change on the MySQL side: I moved the insert into a function and imported that function into the main script, passing the list in as an argument. Works like a charm. I know this is not an exhaustive answer about why and how it works, but it is a solution.

import requests
from bs4 import BeautifulSoup
from scrapertestsql import insertScrapedCars

URL = "https://www.cars.com/shopping/results/?dealer_id=&keyword=&list_price_max=&list_price_min=&makes[]=&maximum_distance=all&mileage_max=&page=1&page_size=100&sort=best_match_desc&stock_type=cpo&year_max=&year_min=&zip="
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html.parser')
scrapedCars = soup.find_all('div', class_='vehicle-card')

cars = []
# mileage = []
# dealer_name = []
# rating = []
# rating_count = []
# price = []



for car in scrapedCars:
    # name
    brand = car.find('h2').get_text()
    # mileage
    mileage = car.find('div', {'class': 'mileage'}).get_text()
    # dealer name
    dealer_name = car.find('div', {'class': 'dealer-name'}).get_text()
    # rating
    try:
        rating = car.find('span', {'class': 'sds-rating__count'}).get_text()
    except:
        rating = "n/a"
    # rating count
    rating_count = car.find('span', {'class': 'sds-rating__link'}).get_text()
    # price
    price = car.find('span', {'class': 'primary-price'}).get_text()
    toAppend = brand, mileage, dealer_name, rating, rating_count, price
    cars.append(toAppend)

insertScrapedCars(cars)
    
print(cars)

Next, in scrapertestsql.py, I would have:

import pymysql

connection = pymysql.connect(
    host='127.0.0.1',
    user='test',
    password='123',
    db='cars',
    port=8889
)


def insertScrapedCars(CarsToInsert):
    try:
        mySql_insert_query = """INSERT INTO cars_details (name, mileage, dealer_name, rating, rating_count, price) 
                            VALUES (%s, %s, %s, %s, %s, %s) """

        cursor = connection.cursor()
        cursor.executemany(mySql_insert_query, CarsToInsert)
        connection.commit()
        print(cursor.rowcount, "Record inserted successfully into cars_details table")

    except pymysql.Error as error:
        print("Failed to insert record into MySQL table: {}".format(error))

    finally:
        # Note: this closes the module-level connection, so the function
        # can only be called once per run.
        connection.close()
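
For completeness, the thread never shows the schema of the cars_details table. Below is a minimal setup sketch: the column names come from the INSERT statement above, but the types are assumptions, since the scraper stores every value as a raw string:

import pymysql

connection = pymysql.connect(
    host='127.0.0.1',
    user='test',
    password='123',
    db='cars',
    port=8889
)

# Assumed schema: six text columns matching the %s placeholders, plus a
# surrogate key. Mileage, rating and price stay as strings because the
# scraper does not parse them into numbers.
create_table_query = """CREATE TABLE IF NOT EXISTS cars_details (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    mileage VARCHAR(64),
    dealer_name VARCHAR(255),
    rating VARCHAR(16),
    rating_count VARCHAR(64),
    price VARCHAR(64)
)"""

with connection.cursor() as cursor:
    cursor.execute(create_table_query)
connection.commit()
connection.close()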

【Discussion】:

Thanks, man. That explains it very well. I see my mistake now.
