Scrape website with dynamic mouseover event

【Title】: Scrape website with dynamic mouseover event 【Posted】: 2020-01-13 23:55:37 【Question】:

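I am trying to scrape data that is generated dynamically by mouseover events. I want to capture the information in the Hash Rate Distribution chart at https://slushpool.com/stats/?c=btc, which is generated as you hover over each circle.

The code below fetches the page HTML and returns the table that gets populated once the mouse passes over a circle. However, I cannot figure out how to trigger the mouseover event for each circle so that the table is populated.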

from lxml import etree
from xml.etree import ElementTree
from selenium import webdriver

driver_path = "#Firefox web driver"
browser = webdriver.Firefox(executable_path=driver_path)
browser.get("https://slushpool.com/stats/?c=btc") 


page = browser.page_source #Get page html 
tree = etree.HTML(page) #create etree

table_Xpath = '/html/body/div[1]/div/div/div/div/div[5]/div[1]/div/div/div[2]/div[2]/div[2]/div/table'

table = tree.xpath(table_Xpath)  # get the table using XPath

print(ElementTree.tostring(table[0]))  # returns an empty table
# Should return the data from each mouseover event

Is there a way to trigger the mouseover event for each circle and then extract the generated data?

Thanks in advance for your help!

【Comments】:

【Answer 1】:

To trigger the mouseover event on each circle, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use the following Locator Strategies:

Code block:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("start-maximized")
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=chrome_options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://slushpool.com/stats/?c=btc")
driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1//span[text()='Distribution']"))))
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//h1//span[text()='Distribution']//following::div[1]/*[name()='svg']//*[name()='g']//*[name()='g' and @class='paper']//*[name()='circle']")))
for element in elements:
    ActionChains(driver).move_to_element(element).perform()

Browser snapshot: [image not preserved]

【Comments】:

【Answer 2】:

This is the locator for the circle you mean:

.find_element_by_css_selector('._1p0PmxVw._3GzjmWLG')

But the mouseover effect changes it to:

.find_element_by_css_selector('._1p0PmxVw._3GzjmWLG._1suU9Mx1')

So after each move you need to wait until the hovered element has changed.
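The "wait until it has changed" step can be sketched as a small polling helper. This is only a minimal sketch and not part of the original answer: `wait_for_change`, `read_text`, and the timeout values are hypothetical names picked for illustration. With Selenium, `read_text` could be a callable that returns the tooltip element's `.text`.

```python
import time


def wait_for_change(read_text, previous, timeout=5.0, poll=0.1):
    """Poll read_text() until it returns a non-empty value different from previous.

    read_text is any zero-argument callable (hypothetical; with Selenium it
    might return the tooltip element's .text). Raises TimeoutError if the
    value has not changed by the time the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = read_text()
        if current and current != previous:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError("text did not change within %.1f s" % timeout)
        time.sleep(poll)


# Stand-in for a tooltip whose text only updates on the third read.
_reads = iter(["A", "A", "B"])
result = wait_for_change(lambda: next(_reads), previous="A", poll=0.01)
print(result)
```

One caveat with this approach: if two neighbouring circles genuinely produce identical text, the helper will time out, so the caller would need to treat that case separately.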

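The most important part is knowing how to inspect a hover element. Once you can do that, you will see that hovering makes the element holding the data you want appear: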

xpath: //div[@class="_3jGHi0co _1zbokARu" and contains(@style,"display: block")]

You can use ActionChains to move the mouse to each element.

Finally, you can try the following code:

browser.get('https://slushpool.com/stats/?c=btc')
browser.maximize_window()

# wait until all circles are visible
elements = WebDriverWait(browser, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, '._1p0PmxVw._3GzjmWLG')))
table = browser.find_element_by_class_name('paper')

# scroll the chart into view
browser.execute_script("arguments[0].scrollIntoView(true);", table)

data = []
for circle in elements:
    # move the mouse over each circle
    ActionChains(browser).move_to_element(circle).perform()
    # wait for the tooltip triggered by the mouseover to become visible
    mouseover = WebDriverWait(browser, 5).until(EC.visibility_of_element_located((By.XPATH, '//div[@class="_3jGHi0co _1zbokARu" and contains(@style,"display: block")]')))
    data.append(mouseover.text)

print(data[0])
print(data)

With the following imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains

Console output:

First entry > data[0]
536.9 Ph/s - 1.074 Eh/s
User Count 2
Average Hash Rate 546.1 Ph/s
Group Hash Rate 1.092 Eh/s

All data > data
[u'536.9 Ph/s - 1.074 Eh/s\nUser Count 2\nAverage Hash Rate 546.9 Ph/s\nGroup Hash Rate 1.094 Eh/s',
 u'67.11 Ph/s - 134.2 Ph/s\nUser Count 14\nAverage Hash Rate 91.27 Ph/s\nGroup Hash Rate 1.278 Eh/s',
 u'67.11 Ph/s - 134.2 Ph/s\nUser Count 14\nAverage Hash Rate 91.27 Ph/s\nGroup Hash Rate 1.278 Eh/s',
 u'16.78 Ph/s - 33.55 Ph/s\nUser Count 23\nAverage Hash Rate 23.36 Ph/s\nGroup Hash Rate 537.2 Ph/s',
 u'8.389 Ph/s - 16.78 Ph/s\nUser Count 33\nAverage Hash Rate 11.80 Ph/s\nGroup Hash Rate 389.4 Ph/s',
 u'4.194 Ph/s - 8.389 Ph/s\nUser Count 67\nAverage Hash Rate 5.704 Ph/s\nGroup Hash Rate 382.2 Ph/s',
 u'2.097 Ph/s - 4.194 Ph/s\nUser Count 137\nAverage Hash Rate 2.959 Ph/s\nGroup Hash Rate 405.3 Ph/s',
 u'1.049 Ph/s - 2.097 Ph/s\nUser Count 233\nAverage Hash Rate 1.475 Ph/s\nGroup Hash Rate 343.7 Ph/s',
 u'1.049 Ph/s - 2.097 Ph/s\nUser Count 233\nAverage Hash Rate 1.475 Ph/s\nGroup Hash Rate 343.7 Ph/s',
 u'524.3 Th/s - 1.049 Ph/s\nUser Count 397\nAverage Hash Rate 731.4 Th/s\nGroup Hash Rate 290.4 Ph/s',
 u'262.1 Th/s - 524.3 Th/s\nUser Count 745\nAverage Hash Rate 360.3 Th/s\nGroup Hash Rate 268.4 Ph/s',
 u'131.1 Th/s - 262.1 Th/s\nUser Count 1479\nAverage Hash Rate 182.7 Th/s\nGroup Hash Rate 270.1 Ph/s',
 u'65.54 Th/s - 131.1 Th/s\nUser Count 2351\nAverage Hash Rate 92.47 Th/s\nGroup Hash Rate 217.4 Ph/s',
 u'32.77 Th/s - 65.54 Th/s\nUser Count 3107\nAverage Hash Rate 47.23 Th/s\nGroup Hash Rate 146.8 Ph/s',
 u'16.38 Th/s - 32.77 Th/s\nUser Count 3380\nAverage Hash Rate 25.24 Th/s\nGroup Hash Rate 85.30 Ph/s',
 u'8.192 Th/s - 16.38 Th/s\nUser Count 4276\nAverage Hash Rate 13.00 Th/s\nGroup Hash Rate 55.57 Ph/s',
 u'4.096 Th/s - 8.192 Th/s\nUser Count 540\nAverage Hash Rate 5.953 Th/s\nGroup Hash Rate 3.215 Ph/s',
 u'2.048 Th/s - 4.096 Th/s\nUser Count 284\nAverage Hash Rate 3.193 Th/s\nGroup Hash Rate 906.8 Th/s',
 u'1.024 Th/s - 2.048 Th/s\nUser Count 226\nAverage Hash Rate 1.368 Th/s\nGroup Hash Rate 309.1 Th/s',
 u'512.0 Gh/s - 1.024 Th/s\nUser Count 136\nAverage Hash Rate 774.4 Gh/s\nGroup Hash Rate 105.3 Th/s',
 u'256.0 Gh/s - 512.0 Gh/s\nUser Count 116\nAverage Hash Rate 401.5 Gh/s\nGroup Hash Rate 46.57 Th/s',
 u'128.0 Gh/s - 256.0 Gh/s\nUser Count 75\nAverage Hash Rate 186.4 Gh/s\nGroup Hash Rate 13.98 Th/s',
 u'64.00 Gh/s - 128.0 Gh/s\nUser Count 78\nAverage Hash Rate 96.39 Gh/s\nGroup Hash Rate 7.518 Th/s',
 u'32.00 Gh/s - 64.00 Gh/s\nUser Count 70\nAverage Hash Rate 45.68 Gh/s\nGroup Hash Rate 3.198 Th/s',
 u'16.00 Gh/s - 32.00 Gh/s\nUser Count 48\nAverage Hash Rate 23.37 Gh/s\nGroup Hash Rate 1.122 Th/s',
 u'8.000 Gh/s - 16.00 Gh/s\nUser Count 62\nAverage Hash Rate 11.91 Gh/s\nGroup Hash Rate 738.5 Gh/s',
 u'4.000 Gh/s - 8.000 Gh/s\nUser Count 153\nAverage Hash Rate 3.078 Gh/s\nGroup Hash Rate 471.0 Gh/s']
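Note that the console output above contains a few repeated entries (for example the 67.11 Ph/s - 134.2 Ph/s bucket appears twice). Whether that comes from overlapping circles or from reading the tooltip before it updated is not clear from the output alone. If only the distinct buckets are wanted, one possible cleanup is an order-preserving de-duplication. This is a minimal sketch; `dedupe_keep_order` is a hypothetical helper name, and the sample holds three entries copied from the output.

```python
def dedupe_keep_order(items):
    """Return items with repeated values removed, keeping first-seen order."""
    # dict preserves insertion order (Python 3.7+), so its keys act as an ordered set.
    return list(dict.fromkeys(items))


# Three entries copied from the console output; the second one is repeated.
sample = [
    "536.9 Ph/s - 1.074 Eh/s\nUser Count 2\nAverage Hash Rate 546.9 Ph/s\nGroup Hash Rate 1.094 Eh/s",
    "67.11 Ph/s - 134.2 Ph/s\nUser Count 14\nAverage Hash Rate 91.27 Ph/s\nGroup Hash Rate 1.278 Eh/s",
    "67.11 Ph/s - 134.2 Ph/s\nUser Count 14\nAverage Hash Rate 91.27 Ph/s\nGroup Hash Rate 1.278 Eh/s",
]
unique = dedupe_keep_order(sample)
print(len(sample), "->", len(unique))
```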

【Comments】:

To display all the data in a more readable way you can use pprint: try pprint(data)! Just add from pprint import pprint ... have fun!
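Building on the pprint suggestion, each tooltip string could also be split into named fields before printing. The following is a rough sketch under the assumption that every entry has the four-line layout seen in the console output; `parse_tooltip` and the `"range"` key are hypothetical names, not something from the answers.

```python
from pprint import pprint

# Labels as they appear in the tooltip text of the console output.
LABELS = ("User Count", "Average Hash Rate", "Group Hash Rate")


def parse_tooltip(text):
    """Split one tooltip string into a dict keyed by its labels."""
    lines = text.split("\n")
    record = {"range": lines[0]}  # the first line is the hash rate bucket
    for line in lines[1:]:
        for label in LABELS:
            if line.startswith(label):
                record[label] = line[len(label):].strip()
    return record


sample = "536.9 Ph/s - 1.074 Eh/s\nUser Count 2\nAverage Hash Rate 546.9 Ph/s\nGroup Hash Rate 1.094 Eh/s"
record = parse_tooltip(sample)
pprint(record)
```

Applied to the whole list, this would be something like `[parse_tooltip(t) for t in data]`.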
