Finding href link with Python selenium
【Title】Finding href link with Python selenium 【Posted】2021-12-20 04:12:47 【Question】I am working on a project that navigates and scrapes a website using Selenium and Python.
hemnet_path = "https://www.hemnet.se/"
# `options=` replaces the deprecated `chrome_options=` keyword
driver = webdriver.Chrome(path, options=options)
driver.get(hemnet_path)
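The script navigates to this site and loops through all the listings to get the price, location, hyperlink and so on, by loading the result elements like this: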
result = driver.find_elements(By.CLASS_NAME,"sold-property-listing")
It then reads the information like this:
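from selenium.common.exceptions import NoSuchElementException

for r in result:
    try:
        location = r.find_element(By.CLASS_NAME, "sold-property-listing__location")
    except NoSuchElementException:
        # Only a missing element falls back to the placeholder
        location = 'Missing'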
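The problem I am having is that the hyperlink string (screenshot) does not "behave" like the other elements.
I cannot seem to select it by class or by CSS selector and href, because I get the following error:
I have managed to get the link using XPATH, as follows:
But the problem is that when I iterate over the results, the XPATH always refers to the same ad, in this case number 2. I considered looping over the integer in the XPATH, but that turned out to be very unreliable.
I am not sure what I am missing here or why I get this strange error.
Any help with this would be much appreciated. The HTML of the ... element is provided below.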
<li class="sold-results__normal-hit">
<a class="hcl-card" data-target-blank="true" href="https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-karlslundsvagen-26-1492088" rel="noopener" target="_blank">
<div class="sold-property-listing">
<div class="sold-property-listing__location">
<h2 class="sold-property-listing__heading">
<span class="property-icon property-icon--result"><svg viewBox="0 0 14 16" xmlns="http://www.w3.org/2000/svg"><title>Lägenhet</title><desc><span class="svg-icon__fallback-text">Lägenhet</span></desc><path class="svg-icon__shape" d="M0 1.333v13.334C0 15.403.597 16 1.333 16h4v-5.333H8V16h4c.737 0 1.333-.597 1.333-1.333V1.333C13.333.597 12.737 0 12 0H1.333C.597 0 0 .597 0 1.333" fill="#C1569D" fill-rule="evenodd"></path></svg>
</span>
<span class="item-result-meta-attribute-is-bold item-link qa-selling-price-title">Karlslundsvägen 26</span>
</h2>
<div>
<span class="hide-element">
Lägenhet
</span>
<span class="item-link">
Barkarbystaden,
</span> Järfälla kommun
</div>
</div>
<div class="sold-property-listing__size">
<div class="clear-children">
<div class="sold-property-listing__subheading sold-property-listing--left">
78 m²
3 rum
</div>
<div class="sold-property-listing__fee">
4 233 kr/mån
</div>
</div>
</div>
<div class="sold-property-listing__price">
<div class="clear-children">
<span class="sold-property-listing__subheading sold-property-listing--left">
Slutpris 3 295 000 kr
</span> </div>
<div class="clear-children">
<div class="sold-property-listing__sold-date sold-property-listing--left">
Såld 5 november 2021
</div>
<div class="sold-property-listing__price-per-m2 sold-property-listing--left">
42 244 kr/m²
</div>
</div>
</div>
<div class="sold-property-listing__price-change">
±0 %
</div>
<div class="sold-property-listing__broker">
Länsförsäkringar Fastighetsförmedling Järfälla
</div>
<div class="sold-property-listing__labels">
<div class="hcl-labels-list hcl-labels-list--row-direction">
<span class="hcl-labels-list__label-item">
<span class="hcl-label hcl-label--feature hcl-label--on-white-background">
Uteplats
</span>
</span>
<span class="hcl-labels-list__label-item">
<span class="hcl-label hcl-label--feature hcl-label--on-white-background">
Hiss
</span>
</span>
</div>
</div>
</div>
</a>
</li>
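In the markup above, the `href` sits on the `<a class="hcl-card">` element that wraps `<div class="sold-property-listing">`, so a search that starts inside the div cannot reach it. The following is a minimal sketch of that nesting using only the standard library's `html.parser`, not Selenium. The class name `ListingLinkParser` and the shortened `snippet` are illustrative assumptions, not part of the original post.

```python
from html.parser import HTMLParser


class ListingLinkParser(HTMLParser):
    """Record, for each listing <div>, the href of the <a> that encloses it."""

    def __init__(self):
        super().__init__()
        self._open_href = None   # href of the a.hcl-card we are currently inside
        self.listing_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "hcl-card" in classes:
            self._open_href = attrs.get("href")
        elif tag == "div" and "sold-property-listing" in classes:
            # The listing div is a child of the anchor, not its parent.
            self.listing_links.append(self._open_href)

    def handle_endtag(self, tag):
        if tag == "a":
            self._open_href = None


# Shortened copy of the markup from the question (illustrative only).
snippet = """
<li class="sold-results__normal-hit">
  <a class="hcl-card" href="https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-karlslundsvagen-26-1492088">
    <div class="sold-property-listing">
      <div class="sold-property-listing__location">Karlslundsvägen 26</div>
    </div>
  </a>
</li>
"""

parser = ListingLinkParser()
parser.feed(snippet)
print(parser.listing_links)
```

The link therefore has to be read from the anchor itself, or from the listing by walking up to its ancestor.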
【Question discussion】:
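【Answer 1】: There is an accept-cookies button that you have to click first.
Second, you need to locate all the anchor tags with the CSS selector below, then scroll to each one so that Selenium can see it.
Code: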
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
# driver.implicitly_wait(30)
wait = WebDriverWait(driver, 30)
driver.get("https://www.hemnet.se/salda/bostader?location_ids%5B%5D=473241&item_types%5B%5D=bostadsratt")

try:
    # Dismiss the cookie-consent dialog before touching the listings.
    cookies = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.consent__button-wrapper .hcl-button--primary")))
    driver.execute_script("arguments[0].scrollIntoView(true);", cookies)
    driver.execute_script("arguments[0].click();", cookies)
except Exception:
    # Do not quit the driver here: the loop below still needs the session.
    print("Could not click")

all_hrefs = driver.find_elements(By.CSS_SELECTOR, "a.hcl-card")
for href in all_hrefs:
    # Scroll each anchor into view, then read its href attribute.
    driver.execute_script("arguments[0].scrollIntoView(true);", href)
    link = href.get_attribute('href')
    print(link)
Imports:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Output:
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-karlslundsvagen-26-1492088
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-fanriksvagen-41c-1490860
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-flygfaltsvagen-19-1490628
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-gripengatan-11,-van-2-1492409
https://www.hemnet.se/salda/lagenhet-2rum-barkarby-jarfalla-kommun-karlslundsvagen-24-1488804
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-kalvshallavagen-42-1488444
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-drakengatan-5-1488052
https://www.hemnet.se/salda/lagenhet-4rum-jarfalla-barkarbystaden-jarfalla-kommun-kalvshallavagen-36-1488008
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-barkarbyvagen-34-1487737
https://www.hemnet.se/salda/lagenhet-3rum-barkarby-jarfalla-kommun-flygarvagen-31-1487734
https://www.hemnet.se/salda/lagenhet-1rum-barkarby-jarfalla-kommun-jaktvagen-2-1487278
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-viggengatan-8-1487099
https://www.hemnet.se/salda/lagenhet-5rum-jarfalla-barkarbystaden-jarfalla-kommun-karlslundsvagen-3,-4-tr-1485991
https://www.hemnet.se/salda/lagenhet-2rum-jarfalla-jarfalla-kommun-barkarbyvagen-69-1485845
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-flygfaltsvagen-23-1485339
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-barkarbyvagen-36-1485338
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-majorsvagen-16-1484118
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-parkvagen-8,-2tr,-1482755
https://www.hemnet.se/salda/lagenhet-3rum-barkarby-jarfalla-kommun-flygarvagen-43-1482418
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-kalvshallavagen-32-1482219
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-barkarbyvagen-66-1481399
https://www.hemnet.se/salda/lagenhet-4rum-barkarbystaden-jarfalla-kommun-barkarbyvagen-34-1480551
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-parkvagen-8-1479755
https://www.hemnet.se/salda/lagenhet-3,5rum-jarfalla-barkarbystaden-jarfalla-kommun-flygfaltsvagen-19-1478682
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-kalvshallavagen-30-1477736
https://www.hemnet.se/salda/lagenhet-4rum-barkarby-jarfalla-kommun-skalbyvagen-17-1477210
https://www.hemnet.se/salda/lagenhet-2rum-jarfalla-jarfalla-kommun-viggengatan-11-1477547
https://www.hemnet.se/salda/lagenhet-2rum-jarfalla-barkarbystaden-jarfalla-kommun-majorsvagen-18-1478670
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-viggengatan-11-1476409
https://www.hemnet.se/salda/lagenhet-1rum-barkarbystaden-jarfalla-kommun-drakengatan-10-1475228
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-gripengatan-20-1474602
https://www.hemnet.se/salda/lagenhet-2rum-jarfalla-barkarbystaden-jarfalla-kommun-drakengatan-3,-2tr-1473715
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-viggengatan-10-1472189
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-karlslundsvagen-12-1472092
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-kalvshallavagen-44-van-4-4-1471582
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-karlslundsvagen-17-1471427
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-parkvagen-8,-3tr,-1471250
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-karlslundsvagen-26-utan-balkong-1471234
https://www.hemnet.se/salda/lagenhet-3rum-barkarby-jarfalla-kommun-flygarvagen-33-1471126
https://www.hemnet.se/salda/lagenhet-3rum-jarfalla-barkarbystaden-jarfalla-kommun-parkvagen-5-1471961
https://www.hemnet.se/salda/lagenhet-4rum-barkarbystaden-jarfalla-kommun-kalvshallavagen-36-1468708
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-majorsvagen-18-1467460
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-majorsvagen-16-1467443
https://www.hemnet.se/salda/lagenhet-2rum-jarfalla-barkarbystaden-jarfalla-kommun-parkvagen-8-1467324
https://www.hemnet.se/salda/lagenhet-2rum-jarfalla-barkarby-jarfalla-kommun-pilotvagen-11-1465674
https://www.hemnet.se/salda/lagenhet-3rum-barkarbystaden-jarfalla-kommun-viggengatan-12-1465518
https://www.hemnet.se/salda/lagenhet-2rum-barkarby-jarfalla-kommun-attackvagen-4-1465147
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-viggengatan-8-1463893
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-karlslundsvagen-18-1461358
https://www.hemnet.se/salda/lagenhet-2rum-barkarbystaden-jarfalla-kommun-viggengatan-10,-van-4-1460508
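If you would rather keep the per-listing loop from the question, a relative XPath may be enough. The question's XPath is not shown, so this assumes it was an absolute path or started with `//`; Selenium evaluates such expressions against the whole document even when they are called on an element, which would explain why every iteration returned the same ad. A minimal sketch follows. The helper name `link_for_listing` is hypothetical, and the literal `"xpath"` is used in place of `By.XPATH` (it is the same string) only so the snippet has no import.

```python
XPATH = "xpath"  # same string value as selenium.webdriver.common.by.By.XPATH


def link_for_listing(listing):
    """Return the href of the nearest <a> that wraps `listing`.

    The leading "./" makes the search start at `listing` itself,
    so each listing resolves to its own anchor.
    """
    anchor = listing.find_element(XPATH, "./ancestor::a[1]")
    return anchor.get_attribute("href")


# Usage inside the question's loop (needs a live `driver`, not run here):
#     for r in driver.find_elements(By.CLASS_NAME, "sold-property-listing"):
#         print(link_for_listing(r))
```

The same rule applies to descendants: `.//span` searches below the current element, while `//span` searches the whole page.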
【Discussion】:
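When I try to execute this line, I get the same error as shown in the OP: all_hrefs = driver.find_elements(By.CSS_SELECTOR, "a.hcl-card") I guess it is not a syntax problem. Edit: never mind. It worked when I ran it in debug mode. Thanks a lot! :)
Can you share the error stack trace as text in the comments section? Also, as you can see in the answer above, I can get all the links, so ideally it should work for you too.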
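The comment above that the line worked in debug mode suggests a timing issue: stepping through the code gives the page time to render the anchors. That reading is an assumption, since no stack trace was posted. Selenium's own tool for this is `WebDriverWait` together with a condition such as `EC.presence_of_all_elements_located`. The sketch below is a hand-rolled stand-in that shows the same polling idea without needing a browser; `poll_until` is a hypothetical helper, not a Selenium function.

```python
import time


def poll_until(fetch, timeout=30.0, interval=0.5):
    """Call fetch() repeatedly until it returns a non-empty result.

    Raises TimeoutError if nothing turns up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result after {timeout} seconds")
        time.sleep(interval)


# Usage with a live `driver` (not run here):
#     anchors = poll_until(lambda: driver.find_elements(By.CSS_SELECTOR, "a.hcl-card"))
#     links = [a.get_attribute("href") for a in anchors]
```

In real code the built-in wait is usually the better choice, because it also handles exceptions raised while the page is still loading.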