Parse Airdna map hover over text using Selenium & Beautifulsoup


【Posted】: 2021-11-26 05:51:04 【Question】:

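I am trying to scrape data from the window that appears when hovering over a marker in the map view, specifically the "Days Available" value shown in that window.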

[Image: the text I am trying to scrape]

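I am struggling to hover over all the purple markers in the map view one by one using Python, webdriver and BeautifulSoup. I managed to write the following code, but the mapMarkers variable is always empty.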

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.airdna.co/vacation-rental-data/app/us/california/santa-monica/overview")

input("Press Enter to continue...")  # wait until the page loads and the tutorial is closed

# list of marker elements; this is the list that comes back empty
mapMarkers = driver.find_elements(By.CLASS_NAME, "Page__RightColumn-sc-291lxm-3")

target_list = []

for marker in mapMarkers:
    marker.click()  # click so that the hover-over window appears
    html = driver.page_source
    soup = BeautifulSoup(html, "lxml")

    days = soup.find_all("p", {"class": ["info-window__statistics-value"]})
    link = soup.find_all("a", {"class": ["info-window__property-link"]})
    # store one (days available, property link) tuple per marker
    target_list.append((
        days[0].text.replace("\n", "").replace(" ", ""),
        link[0].attrs["href"],
    ))

driver.quit()

This is the link to the website.

【Question comments】:

【Answer 1】:

Some websites use a private API to fetch their data, and this site is one of them. To get the API data, you need to inspect network activity.

Right-click the page and click Inspect to open DevTools. Go to the Network tab and search for the API request, then click Preview to view its contents.

Right-click the request and copy it as cURL, then use this site to translate the command into Python.
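As a rough illustration of what the conversion step does, the sketch below splits a request URL of the kind copied from DevTools into a base URL and a dictionary of query parameters, which is the shape that `requests.get(url, params=...)` accepts. The helper name `split_request_url` is hypothetical, the example URL is reassembled from the parameters used in this answer rather than copied from the browser, and `YOUR_TOKEN` is a placeholder.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def split_request_url(full_url):
    """Split a request URL into (base_url, params dict). Hypothetical helper."""
    parts = urlsplit(full_url)
    # Rebuild the URL without its query string and fragment.
    base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # parse_qsl yields (key, value) pairs; values stay strings.
    params = dict(parse_qsl(parts.query))
    return base_url, params


# Assumed example URL; YOUR_TOKEN stands in for a real access token.
copied = (
    "https://api.airdna.co/v1/market/property_list"
    "?access_token=YOUR_TOKEN&city_id=59053&start_month=10"
    "&start_year=2018&number_of_months=36&currency=native&show_regions=true"
)

base_url, params = split_request_url(copied)
print(base_url)
for key, value in params.items():
    print(f"{key} = {value}")
```

Keeping the parameters separate from the URL makes it easier to change a single value, such as `city_id`, without editing the query string by hand.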

Your code will look like this:

import requests

# Request headers copied from the browser via DevTools
headers = {
    'authority': 'api.airdna.co',
    'sec-ch-ua': '"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"',
    'sec-ch-ua-mobile': '?0',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36',
    'sec-ch-ua-platform': '"Windows"',
    'accept': '*/*',
    'origin': 'https://www.airdna.co',
    'sec-fetch-site': 'same-site',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.airdna.co/',
    'accept-language': 'en-US,en;q=0.9',
}

# Query parameters sent with the API request
params = (
    ('access_token', 'MjkxMTI|8b0178bf0e564cbf96fc75b8518a5375'),
    ('city_id', '59053'),
    ('start_month', '10'),
    ('start_year', '2018'),
    ('number_of_months', '36'),
    ('currency', 'native'),
    ('show_regions', 'true'),
)

response = requests.get('https://api.airdna.co/v1/market/property_list', headers=headers, params=params)

results = response.json()["properties"]

# Print the title and days available for the first 20 properties
for result in results[0:20]:
    title = result["title"]
    days_available = result["days_available"]
    print(f"{title} : {days_available}")

Output:

Panoramic Ocean View Studio Loft : 274
Private 1906 Bungalow : 364
Serene Garden Room by the Beach!!! : 188
Bright New Beachside Master Suite : 171
Bright New Beachside Bedroom : 164
Pvt bedroom-pvt bath & entryway. Ocean front Views : 155
Elegant Design Apartment with Courtyard Garden Dining Space : 224
Liz's Beachy Retreat in Santa Monica! : 55
Santa Monica One BedRoom Apt.(Ocean Breeze B) : 26
ROOM & BATH. 4 BLOCKS TO OCEAN. N OF WILSHIRE. : 114
Comfy Room - Amazing Location! : 178
Steps to Beach in Gorgeous Suite! : 84
Stunning Three Bedroom Santa Monica Beach Home : 224
Santa Monica with parking/Montana close to beach : 156
PRIVATE ROOM W/BR IN SANTA MONICA : 293
Private Room with Bathroom at Beach :)Just Perfect : 334
Newly Furnished! 1 Bed Beach Condo : 45
Santa Monica Beach House!Prime area : 264
Santa Monica Canyon Pied-a-Terre : 355
Santa Monica Beach Suite 5 : 276
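
If the values are needed for further processing rather than just printing, the parsing step can be separated from the network call. The following is a minimal sketch, assuming the response keeps the shape used above (a `properties` list whose items carry `title` and `days_available`). The function name `extract_days_available` and the sample payload are hypothetical; the sample is written by hand to mirror two rows of the output and is not real API data.

```python
from typing import Any, Dict, List, Tuple


def extract_days_available(payload: Dict[str, Any], limit: int = 20) -> List[Tuple[str, int]]:
    """Return (title, days_available) pairs from a property_list-style payload."""
    pairs = []
    for prop in payload.get("properties", [])[:limit]:
        title = prop.get("title")
        days = prop.get("days_available")
        if title is None or days is None:
            continue  # skip records that lack either field
        pairs.append((title, days))
    return pairs


# Hand-written sample payload (assumption), not an actual API response.
sample = {
    "properties": [
        {"title": "Private 1906 Bungalow", "days_available": 364},
        {"title": "Santa Monica Beach Suite 5", "days_available": 276},
        {"title": "Listing without availability"},
    ]
}

for title, days in extract_days_available(sample):
    print(f"{title} : {days}")
```

With a live response, the same function could be called as `extract_days_available(response.json())`.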

【Comments】:

Thank you so much! This is better than what I was hoping for.

You're welcome :)
