Web scraping with selenium returns empty list

Posted 2023-02-23

【Title】: Web scraping with selenium returns empty list 【Posted】: 2022-01-12 22:07:36 【Question】:

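I have done some web scraping before, but I don't know JavaScript. I want to scrape the "Company name" and "Company description" from https://www.ces.tech/Show-Floor/Exhibitor-Directory.aspx. I am using selenium for the scraping, but I don't want a browser window to open while it runs, so Chrome is started headless. Here is the code I wrote: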

from selenium.webdriver.common.by import By
from selenium import webdriver
import os
op = webdriver.ChromeOptions()
op.add_argument('headless')
driver = webdriver.Chrome(options=op)
driver.get('https://www.ces.tech/Show-Floor/Exhibitor-Directory.aspx')
company = []
items = driver.find_elements(By.CLASS_NAME, "exhibitorCardModal")
for item in items:
    comp=item.find_elements(By.CLASS_NAME, "company-name")
    desc = item.find_elements(By.CLASS_NAME, "description")
    result_dict = {
        "company":comp.text,
        "description":desc.text
    }
    company.append(result_dict)

But I got an empty list. Can someone tell me what is wrong here? I also tried calling the site's API directly at https://www.ces.tech/api/Exhibitors?searchTerm=&sortBy=alpha&alpha=&state=&country=&venue=&exhibitorType=&pageNo=1&pageSize=30 but got this error:

{"error":{"code":"ApiVersionUnspecified","message":"An API version is required, but was not specified."}}
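The error body above has the shape typically produced by ASP.NET API versioning when a request carries no version. A minimal sketch of how the same request URL could be built with a version attached, assuming the site reads it from an `api-version` query parameter and that `1.0` is an accepted value. Both are unverified guesses, and the helper name `build_exhibitors_url` is hypothetical. The browser's network tab on the directory page would show what the page itself really sends.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ces.tech/api/Exhibitors"


def build_exhibitors_url(page_no=1, page_size=30, api_version="1.0"):
    """Return the exhibitor API URL from the question plus a version parameter."""
    params = {
        # Same parameters, in the same order, as the URL in the question
        "searchTerm": "",
        "sortBy": "alpha",
        "alpha": "",
        "state": "",
        "country": "",
        "venue": "",
        "exhibitorType": "",
        "pageNo": page_no,
        "pageSize": page_size,
        # ASSUMPTION: parameter name and value are guesses, not confirmed
        "api-version": api_version,
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_exhibitors_url())
```

If the server expects the version in a request header instead, the same value would go into the headers of whatever HTTP client is used rather than into the query string.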

【Comments】:

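Let me know if my answer solved your issue?

Thanks for your effort. But I get the error "TimeoutException: Message:" on this line: "wait.until(EC.presence_of_element_located((By.CLASS_NAME, "exhibitorCardModal")))"

I see.. please try the updated version.

"items" still comes back blank, meaning it is empty.

@SarlaDevi items is a list of web elements, not text. You can't print it. Is that list empty?

【Answer 1】: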

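You should use find_element instead of find_elements on these two lines, because find_elements returns a list, and a list has no .text attribute:

comp=item.find_elements(By.CLASS_NAME, "company-name")
desc = item.find_elements(By.CLASS_NAME, "description")

It also helps to wait until the exhibitor cards are present before collecting them, rather than querying right after driver.get.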

So your code should look like this:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

# Run Chrome without opening a visible window
op = webdriver.ChromeOptions()
op.add_argument('headless')
driver = webdriver.Chrome(options=op)
wait = WebDriverWait(driver, 20)

driver.get('https://www.ces.tech/Show-Floor/Exhibitor-Directory.aspx')

# Wait up to 20 seconds for the first exhibitor card to appear
wait.until(EC.presence_of_element_located((By.CLASS_NAME, "exhibitorCardModal")))
time.sleep(0.5)  # short pause so the remaining cards can render
company = []
items = driver.find_elements(By.CLASS_NAME, "exhibitorCardModal")
for item in items:
    # find_element returns a single WebElement, which has .text
    comp = item.find_element(By.CLASS_NAME, "company-name")
    desc = item.find_element(By.CLASS_NAME, "description")
    result_dict = {
        "company": comp.text,
        "description": desc.text
    }
    company.append(result_dict)
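
One caveat with the loop above: find_element raises NoSuchElementException when a card has no matching child, which would stop the whole loop at the first incomplete card. A minimal sketch of a more tolerant extraction step, assuming the same class names as in the answer. The helper names `first_text` and `extract_cards` are hypothetical, and the locator strategy is passed as the plain string "class name", which is the value of Selenium's By.CLASS_NAME, so the helpers can be tried without a browser.

```python
CLASS_NAME = "class name"  # same string value as selenium's By.CLASS_NAME


def first_text(parent, class_name, default=""):
    """Return the stripped text of the first matching child, or `default`."""
    # find_elements returns an empty list instead of raising when nothing matches
    matches = parent.find_elements(CLASS_NAME, class_name)
    return matches[0].text.strip() if matches else default


def extract_cards(items):
    """Build one dict per card element; missing fields become empty strings."""
    return [
        {
            "company": first_text(item, "company-name"),
            "description": first_text(item, "description"),
        }
        for item in items
    ]
```

With a live driver this would be called as company = extract_cards(driver.find_elements(By.CLASS_NAME, "exhibitorCardModal")). It does not help if items itself is empty; in that case the cards have not loaded or the class name does not match the page.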

