Cannot get all HTML when using requests.get or webdriver.get

Posted: 2022-01-04 06:52:27

Question:

I'm trying to scrape the 100.0% value from the style attribute:

<div class="w-full mt-1 bg-white rounded-lg shadow">
  <div class="py-1 bg-purple-900 rounded-lg" style="width: 100.0%"></div>
</div> 
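A note on reading that value: str.lstrip('width: ') removes any leading characters from the set w, i, d, t, h, colon and space, not the literal prefix "width: ", so it only works here because digits are not in that set. A minimal sketch, assuming the attribute always has the form width: <number>%, that pulls the number out with a regular expression instead (parse_width_percent is a hypothetical helper name):

```python
import re
from typing import Optional

# "width" must not be preceded by "-" or a word character, so "min-width" is ignored.
_WIDTH_RE = re.compile(r"(?<![-\w])width\s*:\s*([0-9]*\.?[0-9]+)\s*%")


def parse_width_percent(style: str) -> Optional[float]:
    """Return the width percentage from an inline style such as "width: 100.0%", or None."""
    match = _WIDTH_RE.search(style)
    return float(match.group(1)) if match else None


print(parse_width_percent("width: 100.0%"))  # 100.0
```

Returning None instead of raising lets the caller skip bars whose style does not contain a percentage width.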

The page source in the response doesn't go deep enough into the page. I've tried:

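import requests
from bs4 import BeautifulSoup

def ScrapePercents():
    URL = "https://citystrides.com/cities/26013/search_striders?page=1"
    page = requests.get(URL, timeout=30)
    page.raise_for_status()

    soup = BeautifulSoup(page.content, "html.parser")
    results = soup.find("div", class_="flex flex-wrap space-y-4")
    # find() returns None when the container is missing from the static HTML
    if results is None:
        print("container not found in the requests response")
        return []
    percents = results.find_all("div", class_="py-1 bg-purple-900 rounded-lg")

    pl = []

    for percent in percents:
        # style is e.g. "width: 100.0%": keep the text after ":" and drop the "%"
        percent_neat = percent["style"].split(":", 1)[1].strip().rstrip("%")
        pl.append(float(percent_neat))
        print("pl as it appends ", pl)

    return pl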
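Before tuning the parsing code, it helps to confirm whether the target markup is in the response at all. If a class such as bg-purple-900 never appears in page.text, no BeautifulSoup call can find it, because that content is added after the initial HTML response. A minimal sketch; check_markers is a hypothetical helper, and a static string stands in for page.text so the example runs offline:

```python
from typing import Dict, Iterable


def check_markers(html: str, markers: Iterable[str]) -> Dict[str, bool]:
    """Report, for each marker string, whether it occurs anywhere in the raw HTML text."""
    return {marker: marker in html for marker in markers}


# Stand-in for page.text; the real check would use the requests response.
sample = '<div class="flex flex-wrap space-y-4"><div id="user_10277"></div></div>'
print(check_markers(sample, ["flex flex-wrap space-y-4", "bg-purple-900"]))
# {'flex flex-wrap space-y-4': True, 'bg-purple-900': False}
```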

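from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

def Selenium():
    print("shell for selenium")
    # Selenium 4 takes the driver path through a Service object
    driver = webdriver.Chrome(service=Service("chromedriver.exe"))
    try:
        driver.get("https://citystrides.com/cities/26013/search_striders")
        # snapshot of the DOM when this line runs; content loaded later is not included
        content = driver.page_source
    finally:
        driver.quit()

    soup = BeautifulSoup(content, "html.parser")
    results = soup.find("div", class_="flex flex-wrap space-y-4")
    print("container found:", results is not None)

    return soup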

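The second function is set up only to check whether the content contains the nested divs. I'm struggling to figure out how to get at the nested divs.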
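Nesting depth is not what stops the search: an HTML parser visits every start tag no matter how deeply it is nested, so if the progress-bar divs are in the fetched string they will be found. The sketch below uses only the standard library's html.parser (in place of BeautifulSoup) to collect the style of every div whose class list contains the given tokens; PercentBarParser is a hypothetical name, and the sample is the snippet from the question:

```python
from html.parser import HTMLParser
from typing import Iterable, List


class PercentBarParser(HTMLParser):
    """Collect the style attribute of every <div> whose class list contains all required tokens."""

    def __init__(self, required_classes: Iterable[str]):
        super().__init__()
        self.required = set(required_classes)
        self.styles: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attr_map = dict(attrs)
        # Compare class tokens, so their order and any extra classes do not matter
        classes = set((attr_map.get("class") or "").split())
        style = attr_map.get("style")
        if self.required <= classes and style:
            self.styles.append(style)


sample = """
<div class="w-full mt-1 bg-white rounded-lg shadow">
  <div class="py-1 bg-purple-900 rounded-lg" style="width: 100.0%"></div>
</div>
"""
parser = PercentBarParser({"py-1", "bg-purple-900"})
parser.feed(sample)
print(parser.styles)  # ['width: 100.0%']
```

If this finds nothing when fed the real response text, the bars are simply not in that text, which points at how the page is loaded rather than at the parsing code.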


Answer 1:

To extract the contents of the DIV with Selenium, you need to induce WebDriverWait for visibility_of_element_located(); you can use either of the following Locator Strategies:

Using CSS_SELECTOR:

driver.get("https://citystrides.com/cities/26013/search_striders")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".flex.flex-wrap.space-y-4"))).get_attribute("innerHTML"))

Using XPATH:

driver.get("https://citystrides.com/cities/26013/search_striders")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='flex flex-wrap space-y-4']"))).get_attribute("innerHTML"))

Note: you have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Console output:

<div id="user_10277" class="w-full max-w-lg p-2 transition duration-150 rounded-lg shadow-md bg-gradient-to-br from-gray-50 to-gray-200">
    <div class="flex-grow text-left">
      <h2 class="overflow-hidden text-sm font-bold text-gray-900 truncate">Pedro Queiroz</h2>

      <div class="text-xs uppercase">5233 streets</div>



    </div>
  </div>



  <div class="flex justify-between mt-5 text-sm">
    <div>
      <a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/10277/map">
    <svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z"></path></svg>
    LifeMap
</a>    </div>

    <div title="Pedro Queiroz is a Supporter">?</div>


    <div>
      <a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/10277">
    <svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M16 7a4 4 0 11-8 0 4 4 0 018 0zM12 14a7 7 0 00-7 7h14a7 7 0 00-7-7z"></path></svg>
    Profile
</a>    </div>
  </div>
</div>

<div id="user_18066" class="w-full max-w-lg p-2 transition duration-150 rounded-lg shadow-md bg-gradient-to-br from-gray-50 to-gray-200">
  <div class="flex">
    <img loading="lazy" class="flex-none text-xs rounded-full h-16 w-16 mr-2"  src="https://www.gravatar.com/avatar/9f78d924bbc876075b775da81282d6d7?d=https%3A%2F%2Fcitystrides.com%2Fassets%2Flogo_menu-4dc9d8eddd18724d2784165652c6ca07b47443b447fba88a84a4adc856d3605b.png">

    <div class="flex-grow text-left">
      <h2 class="overflow-hidden text-sm font-bold text-gray-900 truncate">Ali G</h2>

      <div class="text-xs uppercase">2848 streets</div>



    </div>
  </div>



  <div class="flex justify-between mt-5 text-sm">
    <div>
      <a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/18066/map">
    <svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z"></path></svg>
    LifeMap
</a>    </div>

    <div title="Ali G is a Supporter">?</div>


    <div>
      <a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/18066">
    <svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M16 7a4 4 0 11-8 0 4 4 0 018 0zM12 14a7 7 0 00-7 7h14a7 7 0 00-7-7z"></path></svg>
    Profile
</a>    </div>
  </div>
</div>

<div id="user_24203" class="w-full max-w-lg p-2 transition duration-150 rounded-lg shadow-md bg-gradient-to-br from-gray-50 to-gray-200">
  <div class="flex">
    <img loading="lazy" class="flex-none text-xs rounded-full h-16 w-16 mr-2"  src="https://www.gravatar.com/avatar/1359108e2ffc9fefe8876939f32a969f?d=https%3A%2F%2Fcitystrides.com%2Fassets%2Flogo_menu-4dc9d8eddd18724d2784165652c6ca07b47443b447fba88a84a4adc856d3605b.png">

    <div class="flex-grow text-left">
      <h2 class="overflow-hidden text-sm font-bold text-gray-900 truncate">Brad Windon</h2>

      <div class="text-xs uppercase">2784 streets</div>



    </div>
  </div>



  <div class="flex justify-between mt-5 text-sm">
    <div>
      <a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/24203/map">
    <svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z"></path></svg>
    LifeMap
</a>    </div>

    <div title="Brad Windon is a Supporter">?</div>


    <div>
      <a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/24203">
    <svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M16 7a4 4 0 11-8 0 4 4 0 018 0zM12 14a7 7 0 00-7 7h14a7 7 0 00-7-7z"></path></svg>
    Profile
</a>    </div>
  </div>
</div>

<div id="user_37225" class="w-full max-w-lg p-2 transition duration-150 rounded-lg shadow-md bg-gradient-to-br from-gray-50 to-gray-200">
  <div class="flex">
    <img loading="lazy" class="flex-none text-xs rounded-full h-16 w-16 mr-2"  src="https://www.gravatar.com/avatar/e4e29c4688468497e59a99a867d1d27e?d=https%3A%2F%2Fcitystrides.com%2Fassets%2Flogo_menu-4dc9d8eddd18724d2784165652c6ca07b47443b447fba88a84a4adc856d3605b.png">

    <div class="flex-grow text-left">
      <h2 class="overflow-hidden text-sm font-bold text-gray-900 truncate">Randy Adams</h2>

      <div class="text-xs uppercase">3350 streets</div>



    </div>
  </div>



  <div class="flex justify-between mt-5 text-sm">
    <div>
      <a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/37225/map">
    <svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z"></path></svg>
    LifeMap
</a>    </div>

    <div title="Randy Adams is a Supporter">?</div>


    <div>
      <a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/37225">
    <svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M16 7a4 4 0 11-8 0 4 4 0 018 0zM12 14a7 7 0 00-7 7h14a7 7 0 00-7-7z"></path></svg>
    Profile
</a>    </div>
  </div>
</div>

<div id="user_29373" class="w-full max-w-lg p-2 transition duration-150 rounded-lg shadow-md bg-gradient-to-br from-gray-50 to-gray-200">
  <div class="flex">
    <img loading="lazy" class="flex-none text-xs rounded-full h-16 w-16 mr-2"  src="https://www.gravatar.com/avatar/f372f7d189b34ee38d4304d34d0e92d8?d=https%3A%2F%2Fcitystrides.com%2Fassets%2Flogo_menu-4dc9d8eddd18724d2784165652c6ca07b47443b447fba88a84a4adc856d3605b.png">

    <div class="flex-grow text-left">
      <h2 class="overflow-hidden text-sm font-bold text-gray-900 truncate">Nudibranch Whisperer</h2>

      <div class="text-xs uppercase">2451 streets</div>



    </div>
  </div>



  <div class="flex justify-between mt-5 text-sm">
    <div>
      <a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/29373/map">
    <svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z"></path></svg>
    LifeMap
</a>    </div>

    <div title="Nudibranch Whisperer is a Supporter">?</div>


    <div>
      <a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/29373">
    <svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M16 7a4 4 0 11-8 0 4 4 0 018 0zM12 14a7 7 0 00-7 7h14a7 7 0 00-7-7z"></path></svg>
    Profile
</a>    </div>
  </div>
</div>

<div id="user_39377" class="w-full max-w-lg p-2 transition duration-150 rounded-lg shadow-md bg-gradient-to-br from-gray-50 to-gray-200">
  <div class="flex">
    <img loading="lazy" class="flex-none text-xs rounded-full h-16 w-16 mr-2"  src="https://www.gravatar.com/avatar/d22b9542672cb8c7fe062fa0d05663df?d=https%3A%2F%2Fcitystrides.com%2Fassets%2Flogo_menu-4dc9d8eddd18724d2784165652c6ca07b47443b447fba88a84a4adc856d3605b.png">

    <div class="flex-grow text-left">
      <h2 class="overflow-hidden text-sm font-bold text-gray-900 truncate">Nathan Moas</h2>

      <div class="text-xs uppercase">2360 streets</div>



    </div>
  </div>



  <div class="flex justify-between mt-5 text-sm">
    <div>
      <a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/39377/map">
    <svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z"></path></svg>
    LifeMap
</a>    </div>

    <div title="Nathan Moas is a Supporter">?</div>


    <div>
      <a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/39377">
    <svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M16 7a4 4 0 11-8 0 4 4 0 018 0zM12 14a7 7 0 00-7 7h14a7 7 0 00-7-7z"></path></svg>
    Profile
</a>    </div>
  </div>
</div>

<div id="user_39306" class="w-full max-w-lg p-2 transition duration-150 rounded-lg shadow-md bg-gradient-to-br from-gray-50 to-gray-200">
  <div class="flex">
    <img loading="lazy" class="flex-none text-xs rounded-full h-16 w-16 mr-2"  src="https://www.gravatar.com/avatar/bf78feece98a5aa399ffbc8167e195b0?d=https%3A%2F%2Fcitystrides.com%2Fassets%2Flogo_menu-4dc9d8eddd18724d2784165652c6ca07b47443b447fba88a84a4adc856d3605b.png">

    <div class="flex-grow text-left">
      <h2 class="overflow-hidden text-sm font-bold text-gray-900 truncate">Conrad Bajkowski</h2>

      <div class="text-xs uppercase">1574 streets</div>



    </div>
  </div>



  <div class="flex justify-between mt-5 text-sm">
    <div>
      <a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/39306/map">
    <svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z"></path></svg>
    LifeMap
</a>    </div>

    <div title="Conrad Bajkowski is a Supporter">?</div>


    <div>
      <a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/39306">
    <svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M16 7a4 4 0 11-8 0 4 4 0 018 0zM12 14a7 7 0 00-7 7h14a7 7 0 00-7-7z"></path></svg>
    Profile
</a>    </div>
  </div>
</div>

<div class="w-full">
  <div class="flow-root">
    <div class="relative">
      <div class="relative flex items-start space-x-3">
    <div>
      <div class="relative">
        <div class="flex items-center justify-center w-8 h-8 bg-gray-100 rounded-full ring-8 ring-white">
          <svg class="w-5 h-5 text-blue-500" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20" fill="currentColor" aria-hidden="true">
        <path fill-rule="evenodd" d="M18 10a8 8 0 11-16 0 8 8 0 0116 0zm-6-3a2 2 0 11-4 0 2 2 0 014 0zm-2 4a5 5 0 00-4.546 2.916A5.986 5.986 0 0010 16a5.986 5.986 0 004.546-2.084A5 5 0 0010 11z" clip-rule="evenodd"></path>
          </svg>
        </div>
      </div>
    </div>

    <div class="min-w-0 flex-1 py-1.5">
      <div class="text-sm text-gray-500">
        <span class="font-medium text-gray-900">A private Strider</span>



      </div>
    </div>
      </div>
    </div>
  </div>
</div>

<div class="w-full">
  <div class="flow-root">
    <div class="relative">
      <div class="relative flex items-start space-x-3">
    <div>
      <div class="relative">
        <div class="flex items-center justify-center w-8 h-8 bg-gray-100 rounded-full ring-8 ring-white">
          <svg class="w-5 h-5 text-blue-500" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20" fill="currentColor" aria-hidden="true">
        <path fill-rule="evenodd" d="M18 10a8 8 0 11-16 0 8 8 0 0116 0zm-6-3a2 2 0 11-4 0 2 2 0 014 0zm-2 4a5 5 0 00-4.546 2.916A5.986 5.986 0 0010 16a5.986 5.986 0 004.546-2.084A5 5 0 0010 11z" clip-rule="evenodd"></path>
          </svg>
        </div>
      </div>
    </div>

    <div class="min-w-0 flex-1 py-1.5">
      <div class="text-sm text-gray-500">
        <span class="font-medium text-gray-900">A private Strider</span>



      </div>
    </div>
      </div>
    </div>
  </div>
</div>

<div class="w-full">
  <div class="flow-root">
    <div class="relative">
      <div class="relative flex items-start space-x-3">
    <div>
      <div class="relative">
        <div class="flex items-center justify-center w-8 h-8 bg-gray-100 rounded-full ring-8 ring-white">
          <svg class="w-5 h-5 text-blue-500" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20" fill="currentColor" aria-hidden="true">
        <path fill-rule="evenodd" d="M18 10a8 8 0 11-16 0 8 8 0 0116 0zm-6-3a2 2 0 11-4 0 2 2 0 014 0zm-2 4a5 5 0 00-4.546 2.916A5.986 5.986 0 0010 16a5.986 5.986 0 004.546-2.084A5 5 0 0010 11z" clip-rule="evenodd"></path>
          </svg>
        </div>
      </div>
    </div>

    <div class="min-w-0 flex-1 py-1.5">
      <div class="text-sm text-gray-500">
        <span class="font-medium text-gray-900">A private Strider</span>



      </div>
    </div>
      </div>
    </div>
  </div>
</div>

<div id="user_31601" class="w-full max-w-lg p-2 transition duration-150 rounded-lg shadow-md bg-gradient-to-br from-gray-50 to-gray-200">
  <div class="flex">
    <img loading="lazy" class="flex-none text-xs rounded-full h-16 w-16 mr-2"  src="https://www.gravatar.com/avatar/17a534745e70297382cc25b22903e611?d=https%3A%2F%2Fcitystrides.com%2Fassets%2Flogo_menu-4dc9d8eddd18724d2784165652c6ca07b47443b447fba88a84a4adc856d3605b.png">

    <div class="flex-grow text-left">
      <h2 class="overflow-hidden text-sm font-bold text-gray-900 truncate">Elliot Nolan</h2>

      <div class="text-xs uppercase">1611 streets</div>



    </div>
  </div>



  <div class="flex justify-between mt-5 text-sm">
    <div>
      <a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/31601/map">
    <svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z"></path></svg>
    LifeMap
</a>    </div>




    <div>
      <a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/31601">
    <svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke- viewBox="0 0 24 24" stroke="currentColor"><path d="M16 7a4 4 0 11-8 0 4 4 0 018 0zM12 14a7 7 0 00-7 7h14a7 7 0 00-7-7z"></path></svg>
    Profile
</a>    </div>
  </div>
</div>

<div class="w-full">
  <div class="flow-root">
    <div class="relative">
      <div class="relative flex items-start space-x-3">
    <div>
      <div class="relative">
        <div class="flex items-center justify-center w-8 h-8 bg-gray-100 rounded-full ring-8 ring-white">
          <svg class="w-5 h-5 text-blue-500" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20" fill="currentColor" aria-hidden="true">
        <path fill-rule="evenodd" d="M18 10a8 8 0 11-16 0 8 8 0 0116 0zm-6-3a2 2 0 11-4 0 2 2 0 014 0zm-2 4a5 5 0 00-4.546 2.916A5.986 5.986 0 0010 16a5.986 5.986 0 004.546-2.084A5 5 0 0010 11z" clip-rule="evenodd"></path>
          </svg>
        </div>
      </div>
    </div>

    <div class="min-w-0 flex-1 py-1.5">
      <div class="text-sm text-gray-500">
        <span class="font-medium text-gray-900">A private Strider</span>



      </div>
    </div>
      </div>
    </div>
  </div>
</div>
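WebDriverWait is an explicit wait: it re-evaluates a condition until it returns something truthy or the timeout expires. The answer above waits for the container, which may be present before the per-user progress bars are; waiting for the bars themselves would look like WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div.py-1.bg-purple-900"))), with the selector assumed from the question's HTML. What follows is only a conceptual, dependency-free sketch of the polling idea (wait_until is hypothetical and is not Selenium's implementation):

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 20.0, poll: float = 0.5) -> T:
    """Re-run condition() until it returns a truthy value; raise TimeoutError after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page: the progress bars "appear" on the third check.
calls = {"n": 0}


def bars_loaded():
    calls["n"] += 1
    return ["width: 100.0%"] if calls["n"] >= 3 else []


print(wait_until(bars_loaded, timeout=5, poll=0.01))  # ['width: 100.0%']
```

The key design point is that the condition must target the element you actually need; a condition satisfied by an outer container can return before the inner content exists.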

Comments:

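The output still doesn't include the page source deep enough to iterate with find_all and get the percentage numbers. I'm trying to use results = soup.find("div", class_="w-full mt-1 bg-white rounded-lg shadow"); each user contains one percentage number. But the initial fetch of the page source, whether with BS or Selenium, doesn't go that deep into the HTML.

If you still face any issues, feel free to discuss them in the Selenium room.

Unfortunately I don't have enough reputation for that yet. Thanks anyway.

Answer 2: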

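The elements/divs with the classes w-full and py-1 that you're looking for aren't actually in the website's source code; I just tried inspecting it and couldn't find them. Can you see them when you open the site in a browser?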

Comments:

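Yes. Thanks for the correction. I also tried citystrides.com/cities/26013, and even then I still can't reach these elements in the request. I'm not sure why the request isn't pulling the more deeply nested parts.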
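One possible explanation, not verified against the site: the data-turbo-frame attributes in the console output of Answer 1 indicate the page uses Hotwire Turbo, and Turbo can fill a turbo-frame element later from the URL in its src attribute. Content loaded that way would be missing from requests.get and from a page_source read taken too early. A minimal sketch, assuming such frames exist in the raw HTML, that lists their source URLs so they can be requested directly; TurboFrameFinder and the sample markup are hypothetical:

```python
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.parse import urljoin


class TurboFrameFinder(HTMLParser):
    """Collect (id, absolute src URL) for every <turbo-frame> that has a src attribute."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.frames: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so the custom element compares as "turbo-frame"
        if tag != "turbo-frame":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src")
        if src:
            # Resolve relative src values against the page URL
            self.frames.append((attr_map.get("id") or "", urljoin(self.base_url, src)))


# Hypothetical markup; in practice feed requests.get(page_url).text
sample = '<turbo-frame id="striders" src="/cities/26013/search_striders?page=1" loading="lazy"></turbo-frame>'
finder = TurboFrameFinder("https://citystrides.com/cities/26013")
finder.feed(sample)
print(finder.frames)
# [('striders', 'https://citystrides.com/cities/26013/search_striders?page=1')]
```

If the real page yields frame URLs, fetching each one with requests.get and searching its text for bg-purple-900 is a quick way to see where the progress bars actually come from.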
