requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied while trying to find broken links through Selenium and Python

【Posted】: 2019-06-16 23:05:30 【Question】:

I want to find broken links on my web page using Selenium + Python. I tried the code below, but it shows the following error:

requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?

Code trial:

for link in links:

    r = requests.head(link.get_attribute('href'))
    print(link.get_attribute('href'), r.status_code)

Full code:

def test_lsearch(self):
    driver=self.driver
    driver.get("http://www.google.com")
    driver.set_page_load_timeout(10)
    driver.find_element_by_name("q").send_keys("selenium")

    driver.set_page_load_timeout(10)
    el=driver.find_element_by_name("btnK")
    el.click()
    time.sleep(5)

    links=driver.find_elements_by_css_selector("a")
    for link in links:
        r=requests.head(link.get_attribute('href'))
        print(link.get_attribute('href'),r.status_code)

【Comments】:

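How do you get the links list? Show your complete code.
Nobody is going to retype code from an image to reproduce your problem; please add the code to the question as text.

【Answer 1】: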

This error message...

    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?

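...implies that the URL handed to requests has no scheme. The value 'None' in the message shows that at least one collected element has no href attribute, so link.get_attribute('href') returned None, and the check that follows the "support for unicode domain names and paths" step rejected it.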

The error is raised in models.py as follows:

    # Support for unicode domain names and paths.
    scheme, auth, host, port, path, query, fragment = parse_url(url)
    if not scheme:
        raise MissingSchema("Invalid URL {0!r}: No schema supplied. "
                            "Perhaps you meant http://{0}?".format(url))

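The scheme check can be reproduced without any network access. The sketch below is a standard-library stand-in, not the requests implementation: has_scheme is a hypothetical helper, and it assumes the value is converted with str() before parsing, which is why None shows up as 'None' in the message.

```python
from urllib.parse import urlsplit


def has_scheme(url):
    """Return True when str(url) starts with a scheme such as http or https."""
    # Assumption: the value is stringified first, so None becomes 'None'.
    return bool(urlsplit(str(url)).scheme)


print(has_scheme(None))                           # 'None' has no scheme
print(has_scheme("https://www.seleniumhq.org/"))  # absolute URL
print(has_scheme("stage/auth/token/obtain/"))     # relative path, no scheme
```

Only the absolute URL passes. Both None and a relative path would end in MissingSchema if passed to requests.head().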
Solution

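You are presumably trying to find broken links once the search results for the keyword selenium, entered in the Google Home Page Search Box, are available. To do that, you can use the following solution: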

Code Block:

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys 

options = webdriver.ChromeOptions() 
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver=webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get('https://google.co.in/')
search = driver.find_element_by_name('q')
search.send_keys("selenium")
search.send_keys(Keys.RETURN)
links = WebDriverWait(driver, 10).until(EC.visibility_of_any_elements_located((By.XPATH, "//div[@class='rc']//h3//ancestor::a[1]")))
print("Number of links : %s" %len(links))
for link in links:
    r = requests.head(link.get_attribute('href'))
    print(link.get_attribute('href'), r.status_code)

Console Output:

Number of links : 9
https://www.seleniumhq.org/ 200
https://www.seleniumhq.org/download/ 200
https://www.seleniumhq.org/docs/01_introducing_selenium.jsp 200
https://www.guru99.com/selenium-tutorial.html 200
https://en.wikipedia.org/wiki/Selenium_(software) 200
https://github.com/SeleniumHQ 200
https://www.edureka.co/blog/what-is-selenium/ 200
https://seleniumhq.github.io/selenium/docs/api/py/ 200
https://seleniumhq.github.io/docs/ 200

Update

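As per your counter question, it is a bit difficult to give a canonical answer, from Selenium's perspective, as to why the XPath works but the tag name does not. One likely factor is that a search by tag name returns every a element on the page, including anchors that have no href, whereas the XPath targets only the result links. You may want to dig deeper into these discussions: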

Bug 1323614 - Cannot authenticate: requests.exceptions.MissingSchema: Invalid URL 'stage/auth/token/obtain/': No schema supplied.
Invalid URL 'None': No schema supplied. Perhaps you meant http://None?
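Whichever locator is used, the loop can be made robust by dropping unusable href values before they reach requests. This is a minimal sketch under stated assumptions: checkable_hrefs is a hypothetical helper, the sample list is made up, and only absolute http(s) URLs are treated as worth checking.

```python
from urllib.parse import urlsplit


def checkable_hrefs(hrefs):
    """Return only the values that look like absolute http(s) URLs."""
    result = []
    for href in hrefs:
        if not href:
            continue  # None or "": the anchor has no usable href
        if urlsplit(href).scheme not in ("http", "https"):
            continue  # e.g. javascript: or mailto: links
        result.append(href)
    return result


# Made-up values of the kind get_attribute('href') can return
sample = [None, "", "javascript:void(0)", "mailto:a@example.com",
          "https://www.seleniumhq.org/"]
print(checkable_hrefs(sample))
```

In the question's loop this would read `for href in checkable_hrefs(link.get_attribute('href') for link in links):`, followed by `requests.head(href)` on each remaining value.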

【Comments】:

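When I use your code but locate the elements by TAG_NAME, it shows me the same error, yet with XPATH it works. Why is that?
@Talib Check out my answer update and let me know if you have any questions.
Can you help me with this question: ***.com/questions/54347439/…

【Answer 2】: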

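Try this. I'm pretty sure there is a better way to do it, and it may or may not solve your problem, but in the short time I had this is what I came up with, and it seems to work for me.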

import itertools
import requests
from selenium.webdriver import Chrome
from selenium.webdriver.common.keys import Keys

driver = Chrome()
driver.get('https://www.google.com/')

# Search 'selenium'
search = driver.find_element_by_css_selector('input[aria-label="Search"]')
search.send_keys('selenium')
search.send_keys(Keys.ENTER)

# Results div
container = driver.find_element_by_id('rso')
results = container.find_elements_by_css_selector('.bkWMgd')
del results[1]

# links
_links = []
for result in results:
    _links.append([r.get_attribute('href') for r in result.find_elements_by_css_selector('.r>a')])

driver.quit()
links = list(itertools.chain.from_iterable(_links))

for link in links:
    r = requests.get(link)
    print(link, r.status_code)

Output

https://www.seleniumhq.org/ 200
https://www.seleniumhq.org/projects/webdriver/ 200
https://www.webmd.com/a-to-z-guides/supplement-guide-selenium 200
https://www.healthline.com/nutrition/selenium-benefits 200
https://github.com/SeleniumHQ/selenium 200
https://en.wikipedia.org/wiki/Selenium_(software) 200
https://www.medicalnewstoday.com/articles/287842.php 200
https://ods.od.nih.gov/factsheets/Selenium-Consumer/ 200
https://selenium-python.readthedocs.io/ 200
https://selenium-python.readthedocs.io/installation.html 200
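Every link above returns 200, so neither run shows what a broken link looks like. The sketch below is one possible way to turn a status code into a verdict; classify and its thresholds are illustrative assumptions, not something either answer defines. The 405 branch reflects that some servers reject HEAD requests even when the page itself is fine, in which case a GET is needed to decide.

```python
def classify(status_code):
    """Map an HTTP status code to a coarse link verdict (illustrative thresholds)."""
    if status_code == 405:
        return "retry with GET"  # HEAD not allowed; says nothing about the page
    if status_code >= 400:
        return "broken"          # client or server error
    return "ok"                  # anything below 400, e.g. success or redirect


# Hypothetical (url, status) pairs standing in for real responses
for url, code in [("https://www.seleniumhq.org/", 200),
                  ("https://example.com/missing", 404),
                  ("https://example.com/head-blocked", 405)]:
    print(url, code, classify(code))
```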
