Scraping and writing the table into a dataframe shows TypeError
【Title】: Scraping and writing the table into dataframe shows me TypeError 【Posted】: 2022-01-06 12:11:56 【Question】: I am trying to scrape a table and write it into a DataFrame, but I get a TypeError. How can I resolve this error?
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select
from selenium import webdriver
import pandas as pd
temp=[]
driver = webdriver.Chrome(r'C:\Program Files (x86)\chromedriver.exe')  # raw string keeps the backslashes literal
driver.get("https://www.fami-qs.org/certified-companies-6-0.html")
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[title='Inline Frame Example']")))
headers=WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@id='sites']//thead"))).text
rows=WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@id='sites']//tbody"))).text
temp.append(rows)
df = pd.DataFrame(temp,columns=headers)
print(df)
In headers I am passing the data:
FAMI-QS Number ... Expiry date
while in rows I am passing:
FAM-0694 ... 2022-09-04
【Comments】:
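The TypeError most likely comes from columns=headers in the question's code: .text returns one single string, while pd.DataFrame expects a list-like of column labels. Below is a minimal sketch of the cause and one possible fix. It uses hard-coded stand-in values instead of a live browser session; the names header_text, header_cells and row_cells, and the reduction to two columns, are assumptions made for illustration.

```python
import pandas as pd

# Stand-in for the single string that `.text` returns for the <thead> element.
header_text = "FAMI-QS Number Expiry date"

# Passing a plain string as `columns` raises TypeError, because pandas
# expects a list-like of column labels, not a scalar.
try:
    pd.DataFrame(["FAM-0694 2022-09-04"], columns=header_text)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print("TypeError:", error_message)

# Collecting the text of each cell separately gives pandas what it needs:
# one list of column labels and one list of values per table row.
header_cells = ["FAMI-QS Number", "Expiry date"]
row_cells = [["FAM-0694", "2022-09-04"], ["FAM-1491", "2022-10-17"]]

df = pd.DataFrame(row_cells, columns=header_cells)
print(df)
```

With Selenium, the per-cell texts could be gathered by calling find_elements on the th and td elements and reading .text from each one, rather than reading .text from the whole thead or tbody.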
【Answer 1】: You can get all of the table data using only pandas, from the HTML response of the API call, as follows:
Code:
import requests
import pandas as pd
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'}
url = "https://famiqs.viasyst.net/certified-sites"
req = requests.get(url,headers=headers)
table = pd.read_html(req.text)
df = table[0]#.to_csv('info.csv',index = False)
print(df)
Output:
FAMI-QS Number ... Expiry date
0 FAM-0694 ... 2022-09-04
1 FAM-1491 ... 2022-10-17
2 FAM-ISFSF-003 ... 2022-10-27
3 FAM-1533 ... 2022-10-31
4 FAM-1090 ... 2022-11-13
... ... ... ...
1472 FAM-1761-01 ... 2024-10-27
1473 FAM-1796 ... 2024-09-29
1474 FAM-1427-01 ... 2023-12-01
1475 FAM-1861 ... 2024-11-22
1476 FAM-0005-07 ... 2024-11-25
[1477 rows x 7 columns]
【Comments】:
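The commented-out .to_csv('info.csv', index=False) call in the code above hints at saving the result. A minimal sketch of that step follows, using a two-row stand-in DataFrame instead of the live response and writing to an in-memory buffer so it runs without network or disk access. The names buffer and csv_text are illustrative.

```python
import io

import pandas as pd

# Two-row stand-in for the table returned by pd.read_html(...)[0].
df = pd.DataFrame(
    {
        "FAMI-QS Number": ["FAM-0694", "FAM-1491"],
        "Expiry date": ["2022-09-04", "2022-10-17"],
    }
)

# index=False leaves out the 0, 1, 2, ... row labels, so the CSV
# contains only the table's own columns.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a file path such as 'info.csv' instead of the buffer writes the same content to disk.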
【Answer 2】: To scrape the FAMI QS Number and Site Name columns, you need to create lists of the desired texts using a list comprehension, inducing WebDriverWait for visibility_of_all_elements_located(). You can use the following locator strategies:
Code block:
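# s (a Service instance) and options (a ChromeOptions instance) are assumed to be defined beforehand
driver = webdriver.Chrome(service=s, options=options)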
driver.get("https://www.fami-qs.org/certified-companies-6-0.html")
FAMI_QS_Numbers = []
Site_Names = []
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[title='Inline Frame Example']")))
FAMI_QS_Numbers.extend([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@id='sites']//tbody//tr/descendant::td[1]")))])
Site_Names.extend([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@id='sites']//tbody//tr//td/p")))])
df = pd.DataFrame(data=list(zip(FAMI_QS_Numbers, Site_Names)), columns=['FAMI QS Number', 'Site Name'])
print(df)
driver.quit()
Console output:
FAMI QS Number Site Name
0 FAM-1293 AmTech Ingredients
1 FAM-0841 3F FEED & FOOD S L
2 FAM-1361 5N Plus Additives GmbH
3 FAM-1301-01 A & V Corp. Limited
4 FAM-1146 A. + E. Fischer-Chemie GmbH & Co. KG
5 FAM-1589 A.M FOOD CHEMICAL CO LIMITED
6 FAM-0613-01 A.W.P. S.r.l
7 FAM-0867 AB AGRI POLSKA Sp. z o.o.
8 FAM-1510-02 AB Vista
9 FAM-1510-01 AB Vista *
【Comments】:
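Because zip() stops at the shorter input, the two lists in the code above must have the same length, otherwise unmatched values are silently dropped. A hedged sketch of a length check before building the DataFrame follows, using hard-coded stand-in lists instead of a live browser session. The helper name build_frame is hypothetical.

```python
import pandas as pd


def build_frame(numbers, names):
    """Pair the two columns, refusing to silently drop unmatched values."""
    if len(numbers) != len(names):
        raise ValueError(
            f"column lengths differ: {len(numbers)} numbers vs {len(names)} names"
        )
    return pd.DataFrame(
        list(zip(numbers, names)), columns=["FAMI QS Number", "Site Name"]
    )


# Stand-ins for the texts collected by the two list comprehensions.
fami_qs_numbers = ["FAM-1293", "FAM-0841"]
site_names = ["AmTech Ingredients", "3F FEED & FOOD S L"]

df = build_frame(fami_qs_numbers, site_names)
print(df)
```

Raising an error is only one option. If partial rows are acceptable, itertools.zip_longest would keep them and fill the gaps with None.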
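【Answer 3】: To scrape all of the data from all of the columns, you need to induce WebDriverWait for the visibility_of_element_located() of the <table> element, extract its outerHTML, and read the outerHTML with read_html(). You can use the following locator strategies:
Code block: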
driver.get("https://www.fami-qs.org/certified-companies-6-0.html")
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[title='Inline Frame Example']")))
data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#sites"))).get_attribute("outerHTML")
df = pd.read_html(data)
print(df)
driver.quit()
Console output:
[ FAMI-QS Number Site Name City ... Status Certified from Expiry date
0 FAM-1293 AmTech Ingredients albert lea ... Valid 2020-10-08 2023-10-07
1 FAM-0841 3F FEED & FOOD S L vizcolozano ... Valid 2020-04-17 2023-04-16
2 FAM-1361 5N Plus Additives GmbH eisenhüttenstadt ... Valid 2020-10-01 2023-09-30
3 FAM-1301-01 A & V Corp. Limited xiamen ... Valid 2020-09-09 2023-09-08
4 FAM-1146 A. + E. Fischer-Chemie GmbH & Co. KG wiesbaden ... Valid 2020-06-05 2023-06-04
5 FAM-1589 A.M FOOD CHEMICAL CO LIMITED jinan ... Valid 2020-01-07 2023-01-06
6 FAM-0613-01 A.W.P. S.r.l crevalcore ... Valid 2020-02-27 2023-02-07
7 FAM-0867 AB AGRI POLSKA Sp. z o.o. smigiel ... Valid 2020-08-03 2023-03-19
8 FAM-1510-02 AB Vista marlborough ... Valid 2020-04-16 2023-04-15
9 FAM-1510-01 AB Vista * rotterdam ... Valid 2020-04-16 2023-04-15
[10 rows x 7 columns]]
【Comments】: