Parse BeautifulSoup element into Selenium

Posted 2023-02-23

Question (originally asked 2016-10-25 01:56:32):

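I want to fetch the source of a website with Selenium, find a specific element with BeautifulSoup, and then convert that element back into a Selenium object (a selenium.webdriver.remote.webelement.WebElement). Something like this: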

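driver.get("http://www.google.com")
soup = BeautifulSoup(driver.page_source, "html.parser")
element = soup.find(title="Search")

# Pseudocode for the desired conversion; Selenium.webelement is not a real API
element = Selenium.webelement(element)
element.click()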

How can I do this?

Answer 1:

I don't know of any way to go from bs4 to Selenium, but you can use Selenium itself to find the element:

driver.find_element("xpath", '//input[@title="Search"]').click()

Or, to match on the title text alone, as your bs4 find does:

driver.find_element("xpath", '//*[@title="Search"]').click()

Answer 2:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get("http://www.google.com")
# Parse the HTML that Selenium has loaded
soup = BeautifulSoup(driver.page_source, 'html.parser')
search_soup_element = soup.find(title="Search")
# These class names are specific to Google's markup and may no longer match
input_element = soup.select('input.gsfi.lst-d-f')[0]

# Re-locate the same element in Selenium through its name attribute
search_box = driver.find_element(by='name', value=input_element.attrs['name'])
search_box.send_keys('Hello World!')
search_box.send_keys(Keys.RETURN)

This works very well. I can see reasons to use webdriver and BeautifulSoup together, though not necessarily for this example.
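The lookup by name above can be extended to other attributes. The following is only a minimal sketch: xpath_literal and xpath_from_attrs are hypothetical helpers (they are not part of Selenium or bs4), and the choice of id, name and title as candidate attributes is an assumption. The function takes a plain tag name and attribute dict, which is what bs4 exposes as tag.name and tag.attrs, and returns an XPath string.

```python
def xpath_literal(value):
    """Quote a string as an XPath 1.0 literal (XPath has no escape character)."""
    if '"' not in value:
        return '"%s"' % value
    if "'" not in value:
        return "'%s'" % value
    # Both quote types occur: split on double quotes and glue with concat().
    parts = value.split('"')
    return 'concat(%s)' % ', \'"\', '.join('"%s"' % p for p in parts)


def xpath_from_attrs(tag_name, attrs, keys=("id", "name", "title")):
    """Build an XPath such as //input[@title="Search"] from a tag's attributes.

    Hypothetical helper: only the attributes listed in `keys` are used, and
    list-valued attributes (bs4 stores `class` as a list) are skipped.
    """
    predicates = []
    for key in keys:
        value = attrs.get(key)
        if isinstance(value, str):
            predicates.append('@%s=%s' % (key, xpath_literal(value)))
    if not predicates:
        raise ValueError("no usable attribute among %r" % (keys,))
    return '//%s[%s]' % (tag_name, ' and '.join(predicates))


example = xpath_from_attrs("input", {"title": "Search", "class": ["gsfi"]})
print(example)
```

With a bs4 tag in hand, the result could then be passed to Selenium, for example driver.find_element("xpath", xpath_from_attrs(input_element.name, input_element.attrs)). The match is only as unique as the attributes chosen.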

Answer 3:

A general solution that worked for me is to compute the XPath of the bs4 element and then use it to find the element in Selenium:

xpath = xpath_soup(soup_element)
selenium_element = driver.find_element("xpath", xpath)

...

def xpath_soup(element):
    """
    Generate xpath of soup element
    :param element: bs4 text or node
    :return: xpath as string
    """
    components = []
    # A text node has no tag name, so start from its parent tag.
    child = element if element.name else element.parent
    for parent in child.parents:  # each parent is a bs4.element.Tag
        # 1-based position of child among siblings with the same tag name.
        # Siblings are compared by identity: list.index() uses ==, which bs4
        # defines by content, so identical siblings would resolve to the first.
        xpath_index = 1
        for sibling in parent.children:
            if sibling is child:
                break
            if sibling.name == child.name:
                xpath_index += 1
        components.append(child.name if xpath_index == 1 else '%s[%d]' % (child.name, xpath_index))
        child = parent
    components.reverse()
    return '/%s' % '/'.join(components)
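To check the positional-index idea without a browser, the same counting can be reproduced with the standard library. This is only an illustrative sketch: it swaps bs4 for xml.etree.ElementTree, xpath_etree is a hypothetical name, and the HTML snippet is made up. It builds the path for the second of two identical li elements and resolves it again to confirm it points at the same node.

```python
import xml.etree.ElementTree as ET


def xpath_etree(root, target):
    """Return an absolute path such as /html/body/ul/li[2] for target."""
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    components = []
    node = target
    while node in parent_of:
        parent = parent_of[node]
        # 1-based position among siblings with the same tag, compared by identity.
        index = 1
        for sibling in parent:
            if sibling is node:
                break
            if sibling.tag == node.tag:
                index += 1
        components.append(node.tag if index == 1 else '%s[%d]' % (node.tag, index))
        node = parent
    components.append(root.tag)
    components.reverse()
    return '/' + '/'.join(components)


html = "<html><body><ul><li>a</li><li>a</li></ul><p>x</p></body></html>"
root = ET.fromstring(html)
second_li = root.findall("./body/ul/li")[1]

path = xpath_etree(root, second_li)
print(path)  # /html/body/ul/li[2]

# ElementTree only accepts relative paths on an element, so drop the root step.
resolved = root.find('.' + path[len('/' + root.tag):])
print(resolved is second_li)  # True
```

The two li elements have identical content, which is the case where matching siblings by equality rather than identity would yield the wrong index.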

Comments:

Worked for me! Thanks.
