Cannot get all HTML when using requests.get or webdriver.get
【Posted】: 2022-01-04 06:52:27
【Question】: I am trying to scrape the value 100.0% from the style attribute:
<div class="w-full mt-1 bg-white rounded-lg shadow">
<div class="py-1 bg-purple-900 rounded-lg" style="width: 100.0%"></div>
</div>
The page source in the response does not go that deep into the page. I have tried:
import requests
from bs4 import BeautifulSoup

def ScrapePercents():
    URL = "https://citystrides.com/cities/26013/search_striders?page=1"
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, "html.parser")
    # results is None if this div is missing from the response
    results = soup.find("div", class_="flex flex-wrap space-y-4")
    percents = results.find_all("div", class_="py-1 bg-purple-900 rounded-lg")
    pl = []
    for percent in percents:
        # style looks like "width: 100.0%"; lstrip removes a set of characters, not a prefix
        cleantext = percent['style'].lstrip('width: ')
        percent_neat = cleantext.strip('%')
        percent_float = float(percent_neat)
        pl.append(percent_float)
        print("pl as it appends ", pl)
    return pl
and
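from bs4 import BeautifulSoup
from selenium import webdriver

def Selenium():
    print("shell for selenium")
    pl = "shell for selenium percents"  # placeholder, not used yet
    # Selenium 3 style call; newer Selenium releases take a Service object instead
    driver = webdriver.Chrome("chromedriver.exe")
    driver.get("https://citystrides.com/cities/26013/search_striders")
    content = driver.page_source
    soup = BeautifulSoup(content, "html.parser")  # name the parser explicitly
    results = soup.find("div", class_="flex flex-wrap space-y-4")
    return soup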
The second snippet is only there to check whether the content includes the nested div. I am struggling to work out how to get at the nested div.
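A side note on the parsing step in the first snippet: str.lstrip('width: ') treats its argument as a set of characters rather than a prefix. It happens to work for "width: 100.0%" because digits are not in that set, but it can eat into other values. A minimal sketch of a stricter parser follows; the function name width_percent is hypothetical and not part of the original code.

```python
def width_percent(style):
    """Parse a style such as 'width: 100.0%' into 100.0.

    Returns None when the style does not start with a percentage width.
    """
    text = style.strip()
    if not text.startswith("width:"):
        return None
    # Drop the literal prefix, then any trailing semicolon and whitespace.
    value = text[len("width:"):].strip().rstrip(";").strip()
    if not value.endswith("%"):
        return None
    return float(value[:-1])


# For comparison, the character-set behaviour of lstrip:
# "width: initial".lstrip("width: ") returns "nitial"
```

Inside the loop this would replace the lstrip/strip pair, for example pl.append(width_percent(percent['style'])), with a None check if some bars may lack a percentage width.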
【Comments】:
【Answer 1】: To extract the contents of the DIV with Selenium, you need to induce WebDriverWait for visibility_of_element_located(). You can use either of the following Locator Strategies:
Using CSS_SELECTOR:
driver.get("https://citystrides.com/cities/26013/search_striders")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".flex.flex-wrap.space-y-4"))).get_attribute("innerHTML"))
Using XPATH:
driver.get("https://citystrides.com/cities/26013/search_striders")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='flex flex-wrap space-y-4']"))).get_attribute("innerHTML"))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console output:
<div id="user_10277" class="w-full max-w-lg p-2 transition duration-150 rounded-lg shadow-md bg-gradient-to-br from-gray-50 to-gray-200">
<div class="flex-grow text-left">
<h2 class="overflow-hidden text-sm font-bold text-gray-900 truncate">Pedro Queiroz</h2>
<div class="text-xs uppercase">5233 streets</div>
</div>
</div>
<div class="flex justify-between mt-5 text-sm">
<div>
<a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/10277/map">
<svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z"></path></svg>
LifeMap
</a> </div>
<div title="Pedro Queiroz is a Supporter">?</div>
<div>
<a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/10277">
<svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M16 7a4 4 0 11-8 0 4 4 0 018 0zM12 14a7 7 0 00-7 7h14a7 7 0 00-7-7z"></path></svg>
Profile
</a> </div>
</div>
</div>
<div id="user_18066" class="w-full max-w-lg p-2 transition duration-150 rounded-lg shadow-md bg-gradient-to-br from-gray-50 to-gray-200">
<div class="flex">
<img loading="lazy" class="flex-none text-xs rounded-full h-16 w-16 mr-2" src="https://www.gravatar.com/avatar/9f78d924bbc876075b775da81282d6d7?d=https%3A%2F%2Fcitystrides.com%2Fassets%2Flogo_menu-4dc9d8eddd18724d2784165652c6ca07b47443b447fba88a84a4adc856d3605b.png">
<div class="flex-grow text-left">
<h2 class="overflow-hidden text-sm font-bold text-gray-900 truncate">Ali G</h2>
<div class="text-xs uppercase">2848 streets</div>
</div>
</div>
<div class="flex justify-between mt-5 text-sm">
<div>
<a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/18066/map">
<svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z"></path></svg>
LifeMap
</a> </div>
<div title="Ali G is a Supporter">?</div>
<div>
<a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/18066">
<svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M16 7a4 4 0 11-8 0 4 4 0 018 0zM12 14a7 7 0 00-7 7h14a7 7 0 00-7-7z"></path></svg>
Profile
</a> </div>
</div>
</div>
<div id="user_24203" class="w-full max-w-lg p-2 transition duration-150 rounded-lg shadow-md bg-gradient-to-br from-gray-50 to-gray-200">
<div class="flex">
<img loading="lazy" class="flex-none text-xs rounded-full h-16 w-16 mr-2" src="https://www.gravatar.com/avatar/1359108e2ffc9fefe8876939f32a969f?d=https%3A%2F%2Fcitystrides.com%2Fassets%2Flogo_menu-4dc9d8eddd18724d2784165652c6ca07b47443b447fba88a84a4adc856d3605b.png">
<div class="flex-grow text-left">
<h2 class="overflow-hidden text-sm font-bold text-gray-900 truncate">Brad Windon</h2>
<div class="text-xs uppercase">2784 streets</div>
</div>
</div>
<div class="flex justify-between mt-5 text-sm">
<div>
<a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/24203/map">
<svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z"></path></svg>
LifeMap
</a> </div>
<div title="Brad Windon is a Supporter">?</div>
<div>
<a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/24203">
<svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M16 7a4 4 0 11-8 0 4 4 0 018 0zM12 14a7 7 0 00-7 7h14a7 7 0 00-7-7z"></path></svg>
Profile
</a> </div>
</div>
</div>
<div id="user_37225" class="w-full max-w-lg p-2 transition duration-150 rounded-lg shadow-md bg-gradient-to-br from-gray-50 to-gray-200">
<div class="flex">
<img loading="lazy" class="flex-none text-xs rounded-full h-16 w-16 mr-2" src="https://www.gravatar.com/avatar/e4e29c4688468497e59a99a867d1d27e?d=https%3A%2F%2Fcitystrides.com%2Fassets%2Flogo_menu-4dc9d8eddd18724d2784165652c6ca07b47443b447fba88a84a4adc856d3605b.png">
<div class="flex-grow text-left">
<h2 class="overflow-hidden text-sm font-bold text-gray-900 truncate">Randy Adams</h2>
<div class="text-xs uppercase">3350 streets</div>
</div>
</div>
<div class="flex justify-between mt-5 text-sm">
<div>
<a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/37225/map">
<svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z"></path></svg>
LifeMap
</a> </div>
<div title="Randy Adams is a Supporter">?</div>
<div>
<a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/37225">
<svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M16 7a4 4 0 11-8 0 4 4 0 018 0zM12 14a7 7 0 00-7 7h14a7 7 0 00-7-7z"></path></svg>
Profile
</a> </div>
</div>
</div>
<div id="user_29373" class="w-full max-w-lg p-2 transition duration-150 rounded-lg shadow-md bg-gradient-to-br from-gray-50 to-gray-200">
<div class="flex">
<img loading="lazy" class="flex-none text-xs rounded-full h-16 w-16 mr-2" src="https://www.gravatar.com/avatar/f372f7d189b34ee38d4304d34d0e92d8?d=https%3A%2F%2Fcitystrides.com%2Fassets%2Flogo_menu-4dc9d8eddd18724d2784165652c6ca07b47443b447fba88a84a4adc856d3605b.png">
<div class="flex-grow text-left">
<h2 class="overflow-hidden text-sm font-bold text-gray-900 truncate">Nudibranch Whisperer</h2>
<div class="text-xs uppercase">2451 streets</div>
</div>
</div>
<div class="flex justify-between mt-5 text-sm">
<div>
<a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/29373/map">
<svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z"></path></svg>
LifeMap
</a> </div>
<div title="Nudibranch Whisperer is a Supporter">?</div>
<div>
<a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/29373">
<svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M16 7a4 4 0 11-8 0 4 4 0 018 0zM12 14a7 7 0 00-7 7h14a7 7 0 00-7-7z"></path></svg>
Profile
</a> </div>
</div>
</div>
<div id="user_39377" class="w-full max-w-lg p-2 transition duration-150 rounded-lg shadow-md bg-gradient-to-br from-gray-50 to-gray-200">
<div class="flex">
<img loading="lazy" class="flex-none text-xs rounded-full h-16 w-16 mr-2" src="https://www.gravatar.com/avatar/d22b9542672cb8c7fe062fa0d05663df?d=https%3A%2F%2Fcitystrides.com%2Fassets%2Flogo_menu-4dc9d8eddd18724d2784165652c6ca07b47443b447fba88a84a4adc856d3605b.png">
<div class="flex-grow text-left">
<h2 class="overflow-hidden text-sm font-bold text-gray-900 truncate">Nathan Moas</h2>
<div class="text-xs uppercase">2360 streets</div>
</div>
</div>
<div class="flex justify-between mt-5 text-sm">
<div>
<a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/39377/map">
<svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z"></path></svg>
LifeMap
</a> </div>
<div title="Nathan Moas is a Supporter">?</div>
<div>
<a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/39377">
<svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M16 7a4 4 0 11-8 0 4 4 0 018 0zM12 14a7 7 0 00-7 7h14a7 7 0 00-7-7z"></path></svg>
Profile
</a> </div>
</div>
</div>
<div id="user_39306" class="w-full max-w-lg p-2 transition duration-150 rounded-lg shadow-md bg-gradient-to-br from-gray-50 to-gray-200">
<div class="flex">
<img loading="lazy" class="flex-none text-xs rounded-full h-16 w-16 mr-2" src="https://www.gravatar.com/avatar/bf78feece98a5aa399ffbc8167e195b0?d=https%3A%2F%2Fcitystrides.com%2Fassets%2Flogo_menu-4dc9d8eddd18724d2784165652c6ca07b47443b447fba88a84a4adc856d3605b.png">
<div class="flex-grow text-left">
<h2 class="overflow-hidden text-sm font-bold text-gray-900 truncate">Conrad Bajkowski</h2>
<div class="text-xs uppercase">1574 streets</div>
</div>
</div>
<div class="flex justify-between mt-5 text-sm">
<div>
<a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/39306/map">
<svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z"></path></svg>
LifeMap
</a> </div>
<div title="Conrad Bajkowski is a Supporter">?</div>
<div>
<a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/39306">
<svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M16 7a4 4 0 11-8 0 4 4 0 018 0zM12 14a7 7 0 00-7 7h14a7 7 0 00-7-7z"></path></svg>
Profile
</a> </div>
</div>
</div>
<div class="w-full">
<div class="flow-root">
<div class="relative">
<div class="relative flex items-start space-x-3">
<div>
<div class="relative">
<div class="flex items-center justify-center w-8 h-8 bg-gray-100 rounded-full ring-8 ring-white">
<svg class="w-5 h-5 text-blue-500" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20" fill="currentColor" aria-hidden="true">
<path fill-rule="evenodd" d="M18 10a8 8 0 11-16 0 8 8 0 0116 0zm-6-3a2 2 0 11-4 0 2 2 0 014 0zm-2 4a5 5 0 00-4.546 2.916A5.986 5.986 0 0010 16a5.986 5.986 0 004.546-2.084A5 5 0 0010 11z" clip-rule="evenodd"></path>
</svg>
</div>
</div>
</div>
<div class="min-w-0 flex-1 py-1.5">
<div class="text-sm text-gray-500">
<span class="font-medium text-gray-900">A private Strider</span>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="w-full">
<div class="flow-root">
<div class="relative">
<div class="relative flex items-start space-x-3">
<div>
<div class="relative">
<div class="flex items-center justify-center w-8 h-8 bg-gray-100 rounded-full ring-8 ring-white">
<svg class="w-5 h-5 text-blue-500" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20" fill="currentColor" aria-hidden="true">
<path fill-rule="evenodd" d="M18 10a8 8 0 11-16 0 8 8 0 0116 0zm-6-3a2 2 0 11-4 0 2 2 0 014 0zm-2 4a5 5 0 00-4.546 2.916A5.986 5.986 0 0010 16a5.986 5.986 0 004.546-2.084A5 5 0 0010 11z" clip-rule="evenodd"></path>
</svg>
</div>
</div>
</div>
<div class="min-w-0 flex-1 py-1.5">
<div class="text-sm text-gray-500">
<span class="font-medium text-gray-900">A private Strider</span>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="w-full">
<div class="flow-root">
<div class="relative">
<div class="relative flex items-start space-x-3">
<div>
<div class="relative">
<div class="flex items-center justify-center w-8 h-8 bg-gray-100 rounded-full ring-8 ring-white">
<svg class="w-5 h-5 text-blue-500" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20" fill="currentColor" aria-hidden="true">
<path fill-rule="evenodd" d="M18 10a8 8 0 11-16 0 8 8 0 0116 0zm-6-3a2 2 0 11-4 0 2 2 0 014 0zm-2 4a5 5 0 00-4.546 2.916A5.986 5.986 0 0010 16a5.986 5.986 0 004.546-2.084A5 5 0 0010 11z" clip-rule="evenodd"></path>
</svg>
</div>
</div>
</div>
<div class="min-w-0 flex-1 py-1.5">
<div class="text-sm text-gray-500">
<span class="font-medium text-gray-900">A private Strider</span>
</div>
</div>
</div>
</div>
</div>
</div>
<div id="user_31601" class="w-full max-w-lg p-2 transition duration-150 rounded-lg shadow-md bg-gradient-to-br from-gray-50 to-gray-200">
<div class="flex">
<img loading="lazy" class="flex-none text-xs rounded-full h-16 w-16 mr-2" src="https://www.gravatar.com/avatar/17a534745e70297382cc25b22903e611?d=https%3A%2F%2Fcitystrides.com%2Fassets%2Flogo_menu-4dc9d8eddd18724d2784165652c6ca07b47443b447fba88a84a4adc856d3605b.png">
<div class="flex-grow text-left">
<h2 class="overflow-hidden text-sm font-bold text-gray-900 truncate">Elliot Nolan</h2>
<div class="text-xs uppercase">1611 streets</div>
</div>
</div>
<div class="flex justify-between mt-5 text-sm">
<div>
<a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/31601/map">
<svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z"></path></svg>
LifeMap
</a> </div>
<div>
<a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/31601">
<svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M16 7a4 4 0 11-8 0 4 4 0 018 0zM12 14a7 7 0 00-7 7h14a7 7 0 00-7-7z"></path></svg>
Profile
</a> </div>
</div>
</div>
<div class="w-full">
<div class="flow-root">
<div class="relative">
<div class="relative flex items-start space-x-3">
<div>
<div class="relative">
<div class="flex items-center justify-center w-8 h-8 bg-gray-100 rounded-full ring-8 ring-white">
<svg class="w-5 h-5 text-blue-500" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20" fill="currentColor" aria-hidden="true">
<path fill-rule="evenodd" d="M18 10a8 8 0 11-16 0 8 8 0 0116 0zm-6-3a2 2 0 11-4 0 2 2 0 014 0zm-2 4a5 5 0 00-4.546 2.916A5.986 5.986 0 0010 16a5.986 5.986 0 004.546-2.084A5 5 0 0010 11z" clip-rule="evenodd"></path>
</svg>
</div>
</div>
</div>
<div class="min-w-0 flex-1 py-1.5">
<div class="text-sm text-gray-500">
<span class="font-medium text-gray-900">A private Strider</span>
</div>
</div>
</div>
</div>
</div>
</div>
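The same explicit wait can be pointed at the percentage bars themselves instead of the outer container. The sketch below is untested against the live site and rests on assumptions: that the bars are eventually rendered in the browser DOM with the classes shown in the question, and that a Chrome driver is available to Selenium. The names BAR_SELECTOR, percents_from_elements and scrape_percents are hypothetical.

```python
import re

# Matches "width: 100.0%" but not "max-width: 10%".
WIDTH_PERCENT = re.compile(r"(?<![-\w])width\s*:\s*([0-9]+(?:\.[0-9]+)?)\s*%")
# Assumption: class names copied from the HTML in the question.
BAR_SELECTOR = "div.py-1.bg-purple-900.rounded-lg"


def percents_from_elements(elements):
    """Read each element's inline style and return the width percentages found."""
    percents = []
    for element in elements:
        match = WIDTH_PERCENT.search(element.get_attribute("style") or "")
        if match:
            percents.append(float(match.group(1)))
    return percents


def scrape_percents(url, timeout=20):
    # Selenium is imported here so percents_from_elements can be tried without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes a usable Chrome driver
    try:
        driver.get(url)
        # Wait until at least one bar is present in the DOM, then take them all.
        bars = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, BAR_SELECTOR))
        )
        return percents_from_elements(bars)
    finally:
        driver.quit()
```

If the wait times out, the bars were never added to the DOM within the timeout, which suggests they are loaded by some other request or only after user interaction.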
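【Comments】:
The output still does not include the page source at the depth needed to iterate with find_all and get the percentage numbers. I am trying to use: results = soup.find("div", class_="w-full mt-1 bg-white rounded-lg shadow")
- every user contains a percentage number. But the initial GET of the page source, whether with BS or Selenium, does not go that deep into the HTML.
If you are still facing any issues, feel free to discuss them in the Selenium room.
Unfortunately I don't have enough reputation yet. Thanks anyway.
【Answer 2】: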
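The elements you are looking for, the divs with the classes w-full and py-1, are not actually in the site's source code. I just tried inspecting the page and could not find them. Can you see them when you open the site in a browser?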
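One way to check this claim without a browser is to count how many elements in the downloaded HTML carry the class names in question. The sketch below uses only the standard library (html.parser rather than BeautifulSoup, a deliberate swap so it runs with no extra packages); the names ClassFinder and count_elements_with_classes are hypothetical.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Counts start tags whose class attribute contains all wanted class names."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted.split())
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.count += 1


def count_elements_with_classes(html, wanted):
    """Return how many elements in html have every class listed in wanted."""
    finder = ClassFinder(wanted)
    finder.feed(html)
    return finder.count
```

Feeding it requests.get(URL).text and "py-1 bg-purple-900 rounded-lg" should return 0 if the bars are really absent from the static response, in which case no amount of BeautifulSoup navigation on that response will find them.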
【Comments】:
Yes. Thanks for the correction. I also used citystrides.com/cities/26013, and even then I still cannot reach these elements in the request. I am not sure why the request does not pull the more deeply nested parts.
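One possible explanation, offered as a guess rather than a confirmed diagnosis: the console output above contains data-turbo-frame attributes, which suggests the site uses Hotwire Turbo. Turbo can load parts of a page lazily through turbo-frame elements that carry a src attribute, and that content never appears in the first response. The sketch below lists such frame URLs from a piece of HTML so they can be requested separately; whether this page actually contains such frames is an assumption, and the names FrameSrcFinder and lazy_frame_urls are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcFinder(HTMLParser):
    """Collects absolute URLs from <turbo-frame src="..."> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "turbo-frame":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.sources.append(urljoin(self.base_url, src))


def lazy_frame_urls(html, base_url):
    """Return the URLs that turbo-frame elements in html would load."""
    finder = FrameSrcFinder(base_url)
    finder.feed(html)
    return finder.sources
```

If this returns any URLs for the page in question, fetching them with requests.get and parsing those responses may expose the nested divs. An empty list means the missing content is produced some other way, for example by JavaScript after user interaction.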