Locating a specific instance of a class located using XPath with Selenium

Posted: 2022-01-05 08:24:49

Question:

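I am trying to use Selenium to click the PDF icon (shown in screenshot 2) of every element (each container shown in screenshot 1).

The problem is that the PDF icons have very few identifiers, so I can only locate them by class with an XPath expression. On every iteration of the for elem in issues_numb: statement, the script clicks the first PDF icon it finds on the page, because that is the first element matching the XPath given to the script.

Is there a way to create a nested loop so that each instance of one class (the article title) clicks the instance of another class (the PDF icon) associated with it? So for the first article, click the first PDF icon, and so on.

HTML code: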

<section aria-label="Metadata for Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Ar*** Sea" class="article-list-item-content-block ">
    <div class="title " data-ember-action="" data-ember-action-1069="1069">
        <div id="ember1070" class="ember-view"><a target="_blank" href="/libraries/1374/articles/504204400" id="ember1071" class="ember-view" tabindex="0"> Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Ar*** Sea
            </a>
        </div>
    </div>

    <!---->

    <div class="metadata">

        <!---->

        <span tabindex="0" class="pages ">
            p. 489
        </span>

        <!---->

        <span class="authors" data-ember-action="" data-ember-action-1082="1082">
            <span tabindex="0" class="preview tabindex">
                Iqbal, Sajid; Vohra, Muhammad Sufyan; Janjua, Hussnain Ahmed
            </span>
        </span>

        <div class="abstract" data-ember-action="" data-ember-action-1083="1083">
            <div tabindex="0" class="preview tabindex">
                <div id="ember1088" class="ember-view">
                    <span class="lt-line-clamp__line">In the current study, strain MW-6 isolated from Ar*** seawater exhibited broad-spectrum antibacterial activity</span>
                   <span class="lt-line-clamp__line">against indicator bacterial pathogens. The partially extracted antibacterial metabolites with ethyl acetate revealed</span>
                   <span class="lt-line-clamp__line lt-line-clamp__line--last">
                       promising activity against, and. The minimum inhibitory concentrations (MICs) were determined against indicator stra<span class="lt-line-clamp__ellipsis"><div class="lt-line-clamp__dummy-element">…</div>

                       <!---->
                    </span></span>

                    <!----><span class="lt-line-clamp__ellipsis lt-line-clamp__ellipsis--dummy">…</span></div>
                    </div>
                </div>
            </div>

            <!---->

            <div class="content-overflow " data-ember-action="" data-ember-action-1089="1089">
                <span class="chevron icon flaticon solid down-2"></span>
            </div>

            <div class="tools ">
              <div class="buttons noselect">
                    <div class="button invisible download-pdf" data-ember-action="" data-ember-action-1090="1090">
                        <div id="ember1091" class="ember-view"><a aria-label="Download PDF" target="_blank" href="/libraries/1374/articles/504204400/pdf" id="ember1092" class="tooltip ember-view" tabindex="0">
                            <span aria-hidden="true" class="icon fal fa-file-pdf"></span>
                            <span class="aria-hidden">Download PDF - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Ar*** Sea</span>
                        </a>
                    </div>
                </div>

                <div class="button invisible read-full-text" data-ember-action="" data-ember-action-1097="1097">
                    <div id="ember1098" class="ember-view"><a aria-label="Link to Article" target="_blank" href="/libraries/1374/articles/504204400" id="ember1099" class="tooltip ember-view" tabindex="0">
                        <span aria-hidden="true" class="icon fal fa-link"></span>
                        <span class="aria-hidden">Link to Article - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Ar*** Sea</span>
                    </a>
                </div>
            </div>

            <div class="button invisible add-to-my-articles" data-ember-action="" data-ember-action-1100="1100">
              <a aria-label="Save to My Articles" class="tabindex tooltip" tabindex="0">
                <span aria-hidden="true" class="icon fal fa-folder"></span>
                <span class="aria-hidden">Save to My Articles - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Ar*** Sea</span>
              </a>
            </div>

            <div class="button invisible citation-services" data-ember-action="" data-ember-action-2165="2165">
              <a tabindex="0" aria-label="Export Citation" class="tabindex tooltip">
                <span aria-hidden="true" class="icon fal fa-graduation-cap"></span>
                <span class="aria-hidden">Export Citation - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Ar*** Sea</span>
              </a>
            </div>

            <div class="button invisible social-media-services" data-ember-action="" data-ember-action-2166="2166">
              <a tabindex="0" aria-label="Share" class="tabindex tooltip">
                <span aria-hidden="true" class="icon fal fa-share-alt"></span>
                <span class="aria-hidden">Share - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Ar*** Sea</span>
              </a>
            </div>
        </div>
    </div>
</section>

My code:

issues_numb = driver.find_elements(By.XPATH, "//section[@class='article-list-item-content-block ']")
parent_tab = driver.current_window_handle


for elem in issues_numb:
    title_article = elem.get_attribute("aria-label")
    print(title_article[13:])
    try:
        check_buttons = driver.find_element(By.XPATH, ".//span[@class='icon fal fa-file-pdf']")
        print("pdf object found for", str(elem))
        checking_size_buttons = len(str(check_buttons))
        if checking_size_buttons > 0:
            pdf_icon = driver.find_element(By.XPATH, ".//span[@class='icon fal fa-file-pdf']")
            click_pdf = ActionChains(driver).move_to_element(pdf_icon).click(pdf_icon).perform()
            WebDriverWait(driver, timeout).until(element_present)
            check_need_to_sign_in()
            driver.switch_to.window(parent_tab)
        else:
            print("No PDF available")
    except NoSuchElementException:
        get_article_name()

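The issues_numb variable refers to this element:

The tools_box variable refers to this element:

Comments:

What do you mean by "following instance"? Do you want to find the PDF button inside the current elem? If so, see: ***.com/questions/24795198/get-all-child-elements (you are currently searching the whole document).

Does this answer your question? How do I find elements inside of elements using Selenium with Python?

Try using .// instead of // inside the loop. // means the search always starts from the root, not from the context. The . means start from the context.

@double_wizz Here is some documentation I found: Find Element From Element

Answer 1: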

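When the only identifier you have is shared by several elements (in my case, a class name shared by several PDF icons), the solution is to specify a context to search in.

That way, the driver only looks at the HTML that belongs to the specific search area you are after. More about this here. Here as well, although the correct Selenium syntax has changed since then. This is the updated syntax: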

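# 'targeted_context' and 'targeted_class' are placeholders for your own predicates.
elements = driver.find_elements(By.XPATH, "//tag['targeted_context']")
for elem in elements:
    # Calling find_element on elem (not driver) limits the search to elem's descendants.
    targeted_element = elem.find_element(By.XPATH, ".//tag[@class='targeted_class']")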

(Answered by @AbdulAzizBarkat in the comments.)
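Applied to the HTML posted in the question, the context-scoped search could look like the sketch below. It is only an illustration: collect_pdf_links, SECTION_XPATH and PDF_LINK_XPATH are made-up names, the aria-label values are taken from the posted markup, and contains(@class, ...) is used as an assumption to sidestep the trailing space in the section's class attribute.

```python
try:
    from selenium.webdriver.common.by import By
    XPATH = By.XPATH
except ImportError:
    # Fallback so the sketch can be read without Selenium installed;
    # "xpath" is the string value behind By.XPATH.
    XPATH = "xpath"

# Hypothetical constants based on the HTML shown in the question.
SECTION_XPATH = "//section[contains(@class, 'article-list-item-content-block')]"
PDF_LINK_XPATH = ".//a[@aria-label='Download PDF']"
TITLE_PREFIX = "Metadata for "


def collect_pdf_links(driver):
    """Return a (title, pdf_link_or_None) pair for every article section."""
    results = []
    for section in driver.find_elements(XPATH, SECTION_XPATH):
        title = section.get_attribute("aria-label") or ""
        if title.startswith(TITLE_PREFIX):
            title = title[len(TITLE_PREFIX):]
        # Called on the section, so the relative XPath only sees its descendants.
        links = section.find_elements(XPATH, PDF_LINK_XPATH)
        results.append((title, links[0] if links else None))
    return results
```

Each returned link belongs to its own article, so it can be passed to the ActionChains(driver).move_to_element(...).click(...).perform() call from the question; a None entry corresponds to the "No PDF available" branch.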

Comments:

Answer 2:

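When an XPath expression starts with a double slash (//), the engine searches the entire document, starting from the root.

So you should change the XPath expressions inside the loop by putting a . in front of the //. That tells the engine to use the current context instead of the root. Note that the context is the object the search is called on, so the relative expression has to be evaluated on the element itself (elem.find_element), not on driver.

Just to give you an idea, your code should look something like this.

By the way: it is good practice to share the actual HTML content, so that your code and your question are easier to understand.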

issues_numb = driver.find_elements(By.XPATH, "//div[@class='issue ember-view']")

for elem in issues_numb:
    ActionChains(driver).move_to_element(elem).click(elem).perform()
    # Search from elem, not driver: the leading "." is relative to the object the search is called on.
    check_buttons = elem.find_elements(By.XPATH, ".//span[@class='icon fal fa-file-pdf']")
    if check_buttons:
        tools_box = elem.find_elements(By.XPATH, ".//div[@class='buttons noselect']")

        for box in tools_box:
            element_present = EC.presence_of_element_located((By.XPATH, ".//span[@class='icon fal fa-file-pdf']"))
            WebDriverWait(driver, timeout).until(element_present)
            # Take the icon that belongs to this box only.
            pdf_icon = box.find_element(By.XPATH, ".//span[@class='icon fal fa-file-pdf']")
            parent_tab = driver.current_window_handle
            ActionChains(driver).move_to_element(pdf_icon).click(pdf_icon).perform()
            time.sleep(10)
            print(driver.current_url)
            check_need_to_sign_in()
            driver.switch_to.window(parent_tab)
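The difference between searching from the root and searching from a context node can be checked without a browser. The sketch below uses Python's standard xml.etree.ElementTree as a stand-in for Selenium (it supports only a subset of XPath), and the tiny PAGE document and its class names are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Invented two-article page: each section contains one PDF icon.
PAGE = """
<body>
  <section id="first"><span class="pdf">first.pdf</span></section>
  <section id="second"><span class="pdf">second.pdf</span></section>
</body>
"""

root = ET.fromstring(PAGE)
ICON = ".//span[@class='pdf']"

# Evaluated on the root, the expression matches every icon in the document,
# so taking the first match yields the first section's icon on every iteration.
from_root = [root.find(ICON).text for _ in root.findall("section")]

# Evaluated on each section, the same expression only sees that section's descendants.
from_context = [section.find(ICON).text for section in root.findall("section")]

print(from_root)     # ['first.pdf', 'first.pdf']
print(from_context)  # ['first.pdf', 'second.pdf']
```

The first list reproduces the symptom described in the question (the first icon is picked every time); the second shows what changes once the search is run on the current element.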

Comments:

Could you please take a look at the updated version of the question? I have added the HTML code, which I hope makes my question clearer.
