Finding href link with Python selenium

Posted 2021-12-20 04:12:47

Question:

I'm working on a project that uses Selenium with Python to navigate and scrape a website.

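from selenium import webdriver
from selenium.webdriver.chrome.service import Service
hemnet_path = "https://www.hemnet.se/"
# path (chromedriver location) and options are defined elsewhere in the script
driver = webdriver.Chrome(service=Service(path), options=options)
driver.get(hemnet_path)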

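The script navigates to this site and loops over all the listings to get the price, location, hyperlink, and so on, by loading the result elements like this: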

result = driver.find_elements(By.CLASS_NAME,"sold-property-listing")

It then reads the information like this:

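from selenium.common.exceptions import NoSuchElementException

for r in result:
    try:
        location = r.find_element(By.CLASS_NAME, "sold-property-listing__location")
    except NoSuchElementException:
        # This listing has no location element
        location = 'Missing'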

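The problem I'm having is that the hyperlink string (screenshot) doesn't "behave" like the other elements.

I can't seem to select it by class or CSS selector and read its href, because I get the following error:

I have managed to get the link using XPath, like this:

The problem is that when I iterate over the results, the XPath always refers to the same ad, in this case number 2. I considered looping over the integer in the XPath, but that turned out to be very unreliable.

I'm not sure what I'm missing here or why I'm getting this strange error.

Any help with this would be greatly appreciated. The HTML of the ... element is provided below: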

<li class="sold-results__normal-hit">
        <a class="hcl-card" data-target-blank="true" href="https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-karlslundsvagen-26-1492088" rel="noopener" target="_blank">
  <div class="sold-property-listing">
    <div class="sold-property-listing__location">
      <h2 class="sold-property-listing__heading">
        <span class="property-icon property-icon--result"><svg   viewBox="0 0 14 16" xmlns="http://www.w3.org/2000/svg"><title>Lägenhet</title><desc><span class="svg-icon__fallback-text">Lägenhet</span></desc><path class="svg-icon__shape" d="M0 1.333v13.334C0 15.403.597 16 1.333 16h4v-5.333H8V16h4c.737 0 1.333-.597 1.333-1.333V1.333C13.333.597 12.737 0 12 0H1.333C.597 0 0 .597 0 1.333" fill="#C1569D" fill-rule="evenodd"></path></svg>
</span>
        <span class="item-result-meta-attribute-is-bold item-link qa-selling-price-title">Karlslundsvägen 26</span>
      </h2>

      <div>
        <span class="hide-element">
          Lägenhet
        </span>
          <span class="item-link">
            Barkarbystaden,
</span>        Järfälla kommun
      </div>
    </div>

    <div class="sold-property-listing__size">
        <div class="clear-children">
          <div class="sold-property-listing__subheading sold-property-listing--left">
              78&nbsp;m²
            &nbsp;
              3&nbsp;rum
          </div>

            <div class="sold-property-listing__fee">
              4&nbsp;233&nbsp;kr/mån
            </div>
        </div>


    </div>

    <div class="sold-property-listing__price">
      <div class="clear-children">
        <span class="sold-property-listing__subheading sold-property-listing--left">
            Slutpris 3&nbsp;295&nbsp;000&nbsp;kr
</span>      </div>

      <div class="clear-children">
          <div class="sold-property-listing__sold-date sold-property-listing--left">
            Såld 5 november 2021
          </div>

          <div class="sold-property-listing__price-per-m2 sold-property-listing--left">
            42&nbsp;244&nbsp;kr/m²
          </div>
      </div>
    </div>

      <div class="sold-property-listing__price-change">
        ±0&nbsp;%
      </div>

    <div class="sold-property-listing__broker">
      Länsförsäkringar Fastighetsförmedling Järfälla
    </div>

      <div class="sold-property-listing__labels">
        <div class="hcl-labels-list hcl-labels-list--row-direction">
            <span class="hcl-labels-list__label-item">
              <span class="hcl-label hcl-label--feature hcl-label--on-white-background">
                Uteplats
              </span>
            </span>
            <span class="hcl-labels-list__label-item">
              <span class="hcl-label hcl-label--feature hcl-label--on-white-background">
                Hiss
              </span>
            </span>
        </div>
      </div>
  </div>

</a>
    </li>

Answer 1:

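First, there is an accept-cookies button that you have to click.

Second, you need to locate all the anchor tags with the CSS selector below, then scroll to each one so that Selenium can see it.

Code: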

driver = webdriver.Chrome(service=Service(driver_path))
driver.maximize_window()
#driver.implicitly_wait(30)
wait = WebDriverWait(driver, 30)

driver.get("https://www.hemnet.se/salda/bostader?location_ids%5B%5D=473241&item_types%5B%5D=bostadsratt")
try:
    # Dismiss the cookie-consent banner before touching the results
    cookies = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.consent__button-wrapper .hcl-button--primary")))
    driver.execute_script("arguments[0].scrollIntoView(true);", cookies)
    driver.execute_script("arguments[0].click();", cookies)
except TimeoutException:
    # Keep the driver open here: the loop below still needs it
    print("Could not click")

all_hrefs = driver.find_elements(By.CSS_SELECTOR, "a.hcl-card")

for href in all_hrefs:
    # Scroll each anchor into view, then read its href attribute
    driver.execute_script("arguments[0].scrollIntoView(true);", href)
    link = href.get_attribute('href')
    print(link)

Imports:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

Output:

https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-karlslundsvagen-26-1492088
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-fanriksvagen-41c-1490860
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-flygfaltsvagen-19-1490628
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-gripengatan-11,-van-2-1492409
https://www.hemnet.se/salda/lagenhet-2rum-barkarby-jarfalla-kommun-karlslundsvagen-24-1488804
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-kalvshallavagen-42-1488444
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-drakengatan-5-1488052
https://www.hemnet.se/salda/lagenhet-4rum-jarfalla-barkarbystaden-jarfalla-kommun-kalvshallavagen-36-1488008
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-barkarbyvagen-34-1487737
https://www.hemnet.se/salda/lagenhet-3rum-barkarby-jarfalla-kommun-flygarvagen-31-1487734
https://www.hemnet.se/salda/lagenhet-1rum-barkarby-jarfalla-kommun-jaktvagen-2-1487278
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-viggengatan-8-1487099
https://www.hemnet.se/salda/lagenhet-5rum-jarfalla-barkarbystaden-jarfalla-kommun-karlslundsvagen-3,-4-tr-1485991
https://www.hemnet.se/salda/lagenhet-2rum-jarfalla-jarfalla-kommun-barkarbyvagen-69-1485845
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-flygfaltsvagen-23-1485339
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-barkarbyvagen-36-1485338
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-majorsvagen-16-1484118
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-parkvagen-8,-2tr,-1482755
https://www.hemnet.se/salda/lagenhet-3rum-barkarby-jarfalla-kommun-flygarvagen-43-1482418
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-kalvshallavagen-32-1482219
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-barkarbyvagen-66-1481399
https://www.hemnet.se/salda/lagenhet-4rum-barkarbystaden-jarfalla-kommun-barkarbyvagen-34-1480551
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-parkvagen-8-1479755
https://www.hemnet.se/salda/lagenhet-3,5rum-jarfalla-barkarbystaden-jarfalla-kommun-flygfaltsvagen-19-1478682
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-kalvshallavagen-30-1477736
https://www.hemnet.se/salda/lagenhet-4rum-barkarby-jarfalla-kommun-skalbyvagen-17-1477210
https://www.hemnet.se/salda/lagenhet-2rum-jarfalla-jarfalla-kommun-viggengatan-11-1477547
https://www.hemnet.se/salda/lagenhet-2rum-jarfalla-barkarbystaden-jarfalla-kommun-majorsvagen-18-1478670
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-viggengatan-11-1476409
https://www.hemnet.se/salda/lagenhet-1rum-barkarbystaden-jarfalla-kommun-drakengatan-10-1475228
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-gripengatan-20-1474602
https://www.hemnet.se/salda/lagenhet-2rum-jarfalla-barkarbystaden-jarfalla-kommun-drakengatan-3,-2tr-1473715
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-viggengatan-10-1472189
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-karlslundsvagen-12-1472092
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-kalvshallavagen-44-van-4-4-1471582
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-karlslundsvagen-17-1471427
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-parkvagen-8,-3tr,-1471250
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-karlslundsvagen-26-utan-balkong-1471234
https://www.hemnet.se/salda/lagenhet-3rum-barkarby-jarfalla-kommun-flygarvagen-33-1471126
https://www.hemnet.se/salda/lagenhet-3rum-jarfalla-barkarbystaden-jarfalla-kommun-parkvagen-5-1471961
https://www.hemnet.se/salda/lagenhet-4rum-barkarbystaden-jarfalla-kommun-kalvshallavagen-36-1468708
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-majorsvagen-18-1467460
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-majorsvagen-16-1467443
https://www.hemnet.se/salda/lagenhet-2rum-jarfalla-barkarbystaden-jarfalla-kommun-parkvagen-8-1467324
https://www.hemnet.se/salda/lagenhet-2rum-jarfalla-barkarby-jarfalla-kommun-pilotvagen-11-1465674
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-viggengatan-12-1465518
https://www.hemnet.se/salda/lagenhet-2rum-barkarby-jarfalla-kommun-attackvagen-4-1465147
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-viggengatan-8-1463893
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-karlslundsvagen-18-1461358
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-viggengatan-10,-van-4-1460508
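The question also wants the location and price next to each link, so it may help to keep the link and the other fields tied to the same listing. In the HTML from the question, the `<a class="hcl-card">` is the parent of the `sold-property-listing` div, not a child of it, which would explain why searching inside the div never finds the link. The XPath from the question is not shown, but a likely cause of it always matching the same ad is an expression that starts with `//`: Selenium evaluates that against the whole page even when it is called on an element. The following is a minimal sketch under those assumptions. The function names are hypothetical, and the locator strings are written out as plain text (they equal selenium's `By.CSS_SELECTOR` and `By.XPATH`) so the helpers carry no import of their own.

```python
# Locator strategy strings. They have the same values as selenium's
# By.CSS_SELECTOR and By.XPATH (an illustrative choice, By works equally well).
CSS = "css selector"
XPATH = "xpath"


def link_for_listing(listing):
    """Return the href of the <a> that wraps one 'sold-property-listing' div.

    The leading './' keeps the XPath relative to `listing`. An expression
    starting with '//' would search the whole page instead and return the
    same ad on every iteration.
    """
    anchor = listing.find_element(XPATH, "./ancestor::a[1]")
    return anchor.get_attribute("href")


def scrape_card(card):
    """Read the link and the location text from one 'a.hcl-card' element."""
    found = card.find_elements(CSS, ".sold-property-listing__location")
    # find_elements returns an empty list instead of raising when nothing matches
    location = found[0].text.strip() if found else "Missing"
    return {"href": card.get_attribute("href"), "location": location}


def scrape_all(driver):
    """Collect one record per listing card on the current page."""
    return [scrape_card(card) for card in driver.find_elements(CSS, "a.hcl-card")]
```

With a live driver, `scrape_all(driver)` could be called once the cookie banner has been dismissed, while `link_for_listing(r)` would slot into the question's existing `for r in result:` loop. Starting from the anchor tends to be simpler, because every other field is a descendant of it.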

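Comments:

When I try to execute this line, I get the same error as shown in the OP: all_hrefs = driver.find_elements(By.CSS_SELECTOR, "a.hcl-card") I guess it's not a syntax problem. Edit: never mind. It worked when I ran it in debug mode. Thanks a lot! :)

Could you share the error stack trace as text in the comments section? Also, as you can see in the answer above, I was able to get all the links, so ideally it should work for you too.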
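A lookup that fails in a normal run but works in debug mode often points to timing: stepping through the code gives the page extra time to render the cards. That is only a guess here, since the error itself was never posted. If it is the cause, waiting for the anchors before reading them may help. The sketch below assumes a `WebDriverWait` object like the answer's `wait = WebDriverWait(driver, 30)`; the helper names are hypothetical.

```python
def wait_for_cards(wait, selector="a.hcl-card"):
    """Block until at least one element matches `selector`, then return the matches.

    `wait` is expected to be a selenium WebDriverWait. Its until() method keeps
    calling the given function with the driver until the result is truthy.
    An empty list is falsy, so polling continues until a card appears or the
    wait times out.
    """
    # "css selector" has the same value as selenium's By.CSS_SELECTOR
    return wait.until(lambda d: d.find_elements("css selector", selector))


def collect_hrefs(wait):
    """Return the href of every listing card once the cards are present."""
    return [card.get_attribute("href") for card in wait_for_cards(wait)]
```

In the answer's script this would replace the direct `driver.find_elements(By.CSS_SELECTOR, "a.hcl-card")` call, for example `for link in collect_hrefs(wait): print(link)`.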
