Python Web Scraping: Basic Usage of the urllib Library

Posted 2021-03-14 wbytts



Requesting a URL and fetching the page source

import urllib.request

url = "http://www.baidu.com"
response = urllib.request.urlopen(url)
# read() returns the response body as raw bytes
data = response.read()
# print(data)
# Decode the fetched bytes into a string
str_data = data.decode("utf-8")
print(str_data)
# Save the result to a file
with open("baidu.html", "w", encoding="utf-8") as f:
    f.write(str_data)
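
The decode step above assumes the page is UTF-8. As a hedged sketch, the helper below (decode_body is a hypothetical name, not part of urllib or of the original post) decodes the bytes with a given charset and falls back to UTF-8 when none is known. It runs offline on sample bytes.

```python
from typing import Optional


def decode_body(data: bytes, charset: Optional[str] = None) -> str:
    """Decode response bytes; decode_body is a hypothetical helper name."""
    # Fall back to UTF-8 when no charset is given, and replace
    # undecodable bytes instead of raising UnicodeDecodeError.
    return data.decode(charset or "utf-8", errors="replace")


# Sample bytes stand in for response.read() so no network is needed
utf8_text = decode_body("百度".encode("utf-8"))
gbk_text = decode_body("百度".encode("gbk"), "gbk")
print(utf8_text, gbk_text)
```

With a real response, the charset could be taken from the headers, for example decode_body(response.read(), response.headers.get_content_charset()).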

GET requests with parameters

import urllib.request

def get_method_params(wd):
    url = "http://www.baidu.com/s?wd="
    # Concatenate the base URL and the search term
    final_url = url + wd
    # Send the request
    response = urllib.request.urlopen(final_url)
    print(response.read().decode("utf-8"))

get_method_params("美女")

Written directly like this, the request fails with a UnicodeEncodeError.

The cause is that the URL contains Chinese characters, which ASCII cannot represent, so the URL has to be percent-encoded first:

import urllib.request
import urllib.parse
import string

def get_method_params(wd):
    url = "http://www.baidu.com/s?wd="
    # Concatenate the base URL and the search term
    final_url = url + wd
    # Percent-encode the non-ASCII characters; every printable ASCII
    # character (including ":", "/", "?" and "=") is left untouched
    encode_new_url = urllib.parse.quote(final_url, safe=string.printable)
    # Send the request
    response = urllib.request.urlopen(encode_new_url)
    print(response.read().decode("utf-8"))

get_method_params("美女")
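
To see what the escaping does without sending a request, an alternative is to percent-encode only the search term rather than the whole URL. This is a minimal sketch: build_search_url and BASE_URL are hypothetical names introduced for illustration, and the code only builds the URL string.

```python
import urllib.parse

BASE_URL = "http://www.baidu.com/s?wd="


def build_search_url(wd):
    """Return the search URL with only the query value percent-encoded."""
    # quote() encodes the text as UTF-8 and escapes each non-ASCII byte as %XX
    return BASE_URL + urllib.parse.quote(wd)


search_url = build_search_url("美女")
print(search_url)
```

The resulting string contains only ASCII characters, so it can be passed to urllib.request.urlopen without the encoding error.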

Building the query string from a dictionary

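import urllib.request
import urllib.parse
import string

def get_params():
    # The base URL ends at "?" so the encoded parameters can be appended directly
    url = "http://www.baidu.com/s?"

    params = {
        "wd": "美女",
        "key": "zhang",
        "value": "san"
    }

    # urlencode() turns the dict into "key=value&..." and percent-encodes the values
    str_params = urllib.parse.urlencode(params)
    print(str_params)

    final_url = url + str_params
    # urlencode() has already escaped the Chinese text, so this quote()
    # call leaves the URL unchanged; it is kept only as a safeguard
    encode_url = urllib.parse.quote(final_url, safe=string.printable)

    response = urllib.request.urlopen(encode_url)
    data = response.read().decode("utf-8")
    print(data)

get_params()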
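
The output of urlencode can be inspected offline by building the query string and parsing it back. This is a small sketch using the same parameters; the variable names query and decoded are illustrative only.

```python
import urllib.parse

params = {"wd": "美女", "key": "zhang", "value": "san"}

# urlencode() percent-encodes each value and joins the pairs with "&"
query = urllib.parse.urlencode(params)
print(query)

# parse_qs() reverses the encoding; each value comes back as a list
decoded = urllib.parse.parse_qs(query)
print(decoded)
```

Because the encoded query is already pure ASCII, it can be appended after the "?" of the base URL as-is.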
