Scraping dynamic data selenium - Unable to locate element

Posted: 2022-01-09 08:54:26

Question:

I am very new to scraping and have a question. I am scraping the worldometers covid data. Because the page is dynamic, I am using selenium.

Here is the code:

from selenium import webdriver
import time

URL = "https://www.worldometers.info/coronavirus/"

# Start the Driver
driver = webdriver.Chrome(executable_path = r"C:\Webdriver\chromedriver.exe")
# Hit the url and wait for 10 seconds.
driver.get(URL)
time.sleep(10)
#find class element
data= driver.find_elements_by_class_name("odd" and "even")
#for loop
for d in data:
    country=d.find_element_by_xpath(".//*[@id='main_table_countries_today']").text
    print(country)

Current output:

NoSuchElementException: Message: no such element: Unable to locate element: "method":"xpath","selector":".//*[@id='main_table_countries_today']"
  (Session info: chrome=96.0.4664.45)
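
A note on why the code above likely fails, as far as can be read from the snippet itself. The expression `"odd" and "even"` is evaluated by Python before Selenium sees it, so only the class name `even` is searched. Each matched element is then presumably a table row, and the relative XPath `.//*[@id='main_table_countries_today']` looks for the table among that row's descendants, while the table is an ancestor of the row. The sketch below shows the first point in plain Python; `row_selector` is a hypothetical CSS selector string, not something taken from the question.

```python
# Python resolves the `and` expression first: a non-empty string is truthy,
# so the result is the right-hand operand.
class_arg = "odd" and "even"
print(class_arg)

# Hypothetical selector that would match rows of either class inside the table.
# It is only a string here; with Selenium it would be passed to
# driver.find_elements(By.CSS_SELECTOR, row_selector).
row_selector = (
    "table#main_table_countries_today tr.odd, "
    "table#main_table_countries_today tr.even"
)
print(row_selector)
```

With rows selected this way, the text of each row can be read directly from the row element, with no further lookup of the table id inside it.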

Comments:

***.com/questions/62302639/… I'm tempted to close this as a duplicate, although you seem to want a Selenium answer.

Answer 1:

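To scrape the table on the worldometers covid data page, you need to induce WebDriverWait for visibility_of_element_located(), read the table's HTML, and load it into a Pandas DataFrame. You can use the following locator strategy:

Code block: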

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

options = Options()
options.add_argument("start-maximized")
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
driver.get("https://www.worldometers.info/coronavirus/")
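# Wait up to 20 seconds for the table to become visible, then read its full HTML.
# Note the capitalization of "outerHTML".
data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#main_table_countries_today"))).get_attribute("outerHTML")
# pd.read_html returns a list of DataFrames, one per table found in the HTML.
df = pd.read_html(data)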
print(df)
driver.quit()

Console output:

[         # Country,Other  TotalCases  NewCases  ...  Deaths/1M pop   TotalTests  Tests/ 1M pop    Population
0      NaN         World   264359298  632349.0  ...          673.3          NaN            NaN           NaN
1      1.0           USA    49662381   89259.0  ...         2415.0  756671013.0      2267182.0  3.337495e+08
2      2.0         India    34609741    3200.0  ...          336.0  643510926.0       459914.0  1.399198e+09
3      3.0        Brazil    22118782   12910.0  ...         2865.0   63776166.0       297051.0  2.146975e+08
4      4.0            UK    10329074   53945.0  ...         2124.0  364875273.0      5335159.0  6.839070e+07
..     ...           ...         ...       ...  ...            ...          ...            ...           ...
221  221.0         Samoa           3       NaN  ...            NaN          NaN            NaN  2.002800e+05
222  222.0  Saint Helena           2       NaN  ...            NaN          NaN            NaN  6.103000e+03
223  223.0    Micronesia           1       NaN  ...            NaN          NaN            NaN  1.167290e+05
224  224.0         Tonga           1       NaN  ...            NaN          NaN            NaN  1.073890e+05
225    NaN        Total:   264359298  632349.0  ...          673.3          NaN            NaN           NaN

[226 rows x 15 columns]]
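
The outer square brackets in the output above are there because `pd.read_html` returns a list of DataFrames rather than a single one. The following is a minimal sketch of picking out the table itself. The `sample_html` string is a hypothetical stand-in for the `outerHTML` returned by Selenium, reduced to two rows, and wrapping it in `StringIO` is an addition that avoids the deprecation warning newer pandas versions raise for literal HTML strings.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the table's outerHTML string (two rows only).
sample_html = """
<table id="main_table_countries_today">
  <thead><tr><th>Country,Other</th><th>TotalCases</th></tr></thead>
  <tbody>
    <tr><td>USA</td><td>49662381</td></tr>
    <tr><td>India</td><td>34609741</td></tr>
  </tbody>
</table>
"""

# read_html returns one DataFrame per <table> element found in the input.
tables = pd.read_html(StringIO(sample_html))

# The snippet holds a single table, so the first list entry is the DataFrame.
df = tables[0]
print(df.shape)
print(df["Country,Other"].tolist())
```

In the answer's code the same idea would amount to indexing the result with `[0]` before printing or further processing.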

Comments:

It works! Thanks for your help.
