Python web 1 (parsing a URL)

Posted 2021-01-15 junkdog

Environment: PyCharm

Try slicing the address to strip the leading http or https

a. Ran into some problems

url = 'https://www.cnblogs.com/derezzed/articles/8119592.html'
# Check the protocol
protocol = "http"
if url[:7] == "http://":
    u = url.split('://')[1]
elif url[:8] == "https://":
    protocol = "https"
    u = url.split("://")[1]
else:
    u = url
    # This print belongs to the else branch only
    print(u)

Running this produced no output at all.

url = 'https://www.cnblogs.com/derezzed/articles/8119592.html'
# Check the protocol
protocol = "http"
if url[:7] == "http://":
    u = url.split('://')[1]
    print(u)
elif url[:8] == "https://":
    protocol = "https"
    # Take element [1] so u is the string after "://", not the whole list
    u = url.split("://")[1]
    print(u)
else:
    u = url
    print(u)

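After this change the result was printed, although at the time I did not know why. The likely cause is that in the first version print(u) sat inside the else branch, so it never ran for a URL that matched the https branch.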
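
One way to avoid repeating print in every branch is to assign the result in each branch and print once after the whole if/elif/else. The sketch below assumes that goal. The helper name strip_scheme is hypothetical (it is not from the tutorial), and it uses str.startswith in place of fixed-length slices:

```python
def strip_scheme(url):
    """Return (protocol, rest) with a leading http:// or https:// removed.

    strip_scheme is a hypothetical helper name used only for this sketch.
    """
    protocol = "http"  # default when the URL carries no scheme
    if url.startswith("https://"):
        protocol = "https"
        rest = url[len("https://"):]
    elif url.startswith("http://"):
        rest = url[len("http://"):]
    else:
        rest = url
    return protocol, rest


protocol, u = strip_scheme("https://www.cnblogs.com/derezzed/articles/8119592.html")
# This print sits after the if/elif/else, so it runs whichever branch matched.
print(protocol, u)
```

Because the print is no longer tied to a single branch, the https URL prints just like any other input.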

b. A URL-parsing program written while following a tutorial and working through it (this program has problems)

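# url = 'http://movie.douban.com/top250'
# Parse a URL and return a tuple of (protocol, host, port, path)

def parsed_url(url):
    # Check the protocol
    protocol = 'http'
    if url[:7] == 'http://':
        a = url.split('://')[1]
    elif url[:8] == 'https://':
        a = url.split('https://')[1]
        protocol = 'https'
    # Known problem: a is never assigned when url has neither prefix

    # Check for the default path
    i = a.find('/')
    if i == -1:
        path = '/'
        host = a
    else:
        # Known problem: fixed indices do not generalise. a[:16] only fits
        # the sample host and a[6:] does not yield the path; slicing at i
        # (a[:i] and a[i:]) would work for any host
        host = a[:16]
        path = a[6:]

    # Check the port
    port_dict = {
        'http': 80,
        'https': 443,
    }
    # Default port
    port = port_dict[protocol]
    if ':' in host:
        h = host.split(':')
        host = h[0]
        port = int(h[1])

    return protocol, host, port, path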
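
A possible corrected version is sketched below. It slices at the index returned by find instead of at fixed positions, and falls back to the whole string when the URL has no scheme prefix. The name parsed_url_fixed and the example.com URL are assumptions for illustration. The sketch also assumes any port is numeric and does not handle query strings, fragments, or IPv6 hosts.

```python
def parsed_url_fixed(url):
    """Return (protocol, host, port, path) for an http or https URL.

    parsed_url_fixed is a hypothetical name used only for this sketch.
    """
    protocol = "http"
    rest = url  # fallback when the URL has no scheme prefix
    if url.startswith("https://"):
        protocol = "https"
        rest = url[len("https://"):]
    elif url.startswith("http://"):
        rest = url[len("http://"):]

    # Split host and path at the first "/" (default path is "/")
    i = rest.find("/")
    if i == -1:
        host, path = rest, "/"
    else:
        host, path = rest[:i], rest[i:]

    # Default port by protocol, overridden by an explicit host:port
    port = {"http": 80, "https": 443}[protocol]
    if ":" in host:
        host, port_text = host.split(":", 1)
        port = int(port_text)
    return protocol, host, port, path


print(parsed_url_fixed("http://movie.douban.com/top250"))
# ('http', 'movie.douban.com', 80, '/top250')
print(parsed_url_fixed("https://example.com:8443"))
# ('https', 'example.com', 8443, '/')
```

For anything beyond practice, the standard library's urllib.parse.urlsplit already does this kind of splitting and exposes hostname and port attributes on its result. Note that port is None there when the URL does not specify one.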