Scraping dynamic data with Selenium - Unable to locate element
Posted: 2022-01-09 08:54:26
Question:
I am very new to scraping and have a question. I am scraping the worldometers COVID data. Because the page is dynamic, I am using Selenium.
The code is as follows:
from selenium import webdriver
import time
URL = "https://www.worldometers.info/coronavirus/"
# Start the Driver
driver = webdriver.Chrome(executable_path = r"C:\Webdriver\chromedriver.exe")
# Hit the url and wait for 10 seconds.
driver.get(URL)
time.sleep(10)
#find class element
data= driver.find_elements_by_class_name("odd" and "even")
#for loop
for d in data:
    country = d.find_element_by_xpath(".//*[@id='main_table_countries_today']").text
    print(country)
Current output:
NoSuchElementException: Message: no such element: Unable to locate element: "method":"xpath","selector":".//*[@id='main_table_countries_today']"
(Session info: chrome=96.0.4664.45)
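A note on the call `find_elements_by_class_name("odd" and "even")` in the code above: Python evaluates the `and` expression before Selenium receives anything, so only one class name is actually passed. The minimal sketch below shows that behaviour in plain Python. The combined selector `tr.odd, tr.even` is an assumption about the page's row markup, not something confirmed in this thread. A likely cause of the exception itself is that the XPath looks for the table's id among the descendants of each row, while the table is the rows' ancestor.

```python
# `and` returns its last operand when the first operand is truthy,
# so this expression is just the string "even".
class_name = "odd" and "even"
print(class_name)  # even

# Assumed alternative: a CSS selector with a comma matches either class.
# It would be passed to driver.find_elements(By.CSS_SELECTOR, selector).
selector = "tr.odd, tr.even"
print(selector)
```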
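Comments:
***.com/questions/62302639/… I am tempted to close this as a duplicate, although you seem to want a Selenium answer.
Answer 1:
To scrape the table within the worldometers covid data page, you need to induce WebDriverWait for visibility_of_element_located() and load the result into a Pandas DataFrame. You can use the following locator strategy:
Code block: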
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
options = Options()
options.add_argument("start-maximized")
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
driver.get("https://www.worldometers.info/coronavirus/")
# Wait up to 20 seconds for the table to become visible, then grab its full HTML.
data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#main_table_countries_today"))).get_attribute("outerHTML")
df = pd.read_html(data)
print(df)
driver.quit()
Console output:
[ # Country,Other TotalCases NewCases ... Deaths/1M pop TotalTests Tests/ 1M pop Population
0 NaN World 264359298 632349.0 ... 673.3 NaN NaN NaN
1 1.0 USA 49662381 89259.0 ... 2415.0 756671013.0 2267182.0 3.337495e+08
2 2.0 India 34609741 3200.0 ... 336.0 643510926.0 459914.0 1.399198e+09
3 3.0 Brazil 22118782 12910.0 ... 2865.0 63776166.0 297051.0 2.146975e+08
4 4.0 UK 10329074 53945.0 ... 2124.0 364875273.0 5335159.0 6.839070e+07
.. ... ... ... ... ... ... ... ... ...
221 221.0 Samoa 3 NaN ... NaN NaN NaN 2.002800e+05
222 222.0 Saint Helena 2 NaN ... NaN NaN NaN 6.103000e+03
223 223.0 Micronesia 1 NaN ... NaN NaN NaN 1.167290e+05
224 224.0 Tonga 1 NaN ... NaN NaN NaN 1.073890e+05
225 NaN Total: 264359298 632349.0 ... 673.3 NaN NaN NaN
[226 rows x 15 columns]]
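The outer square brackets in the console output above are there because `pd.read_html` returns a list of DataFrames, one per table it finds. The sketch below shows how the single table could be pulled out of that list. `sample_html` is a hypothetical two-row stand-in for the HTML string returned by `get_attribute`, and the example assumes an HTML parser such as `lxml` is installed. Wrapping the string in `StringIO` is a precaution, since recent pandas releases warn about passing literal HTML strings.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the table HTML obtained through Selenium.
sample_html = """
<table id="main_table_countries_today">
  <thead><tr><th>Country,Other</th><th>TotalCases</th></tr></thead>
  <tbody>
    <tr><td>USA</td><td>49662381</td></tr>
    <tr><td>India</td><td>34609741</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> element.
tables = pd.read_html(StringIO(sample_html))
df = tables[0]

print(df.shape)  # (2, 2)
print(df["Country,Other"].tolist())
```

With the real page, `df = tables[0]` would hold the 226-row table shown above, and columns can then be selected by header name.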
Comments:
It works! Thanks for your help.