Python Crawler 10.17

Posted 2020-10-11

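import urllib.request
import chardet  # third-party: pip install chardet

def encode_detect(url):
    # Download the raw bytes of the page
    with urllib.request.urlopen(url) as html:
        content = html.read()
    # chardet guesses the character encoding from the byte content
    result = chardet.detect(content)
    return result['encoding']

url = 'http://www.iplaypython.com'
print(encode_detect(url))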
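chardet.detect() returns a dict whose 'encoding' value can be None when chardet cannot make a guess, and the guessed name only pays off once the page bytes are decoded with it. The sketch below shows that decoding step. It is a minimal illustration under stated assumptions: the helper name decode_page and the UTF-8 fallback are hypothetical choices, not part of the code above, and it uses only the standard library so it runs without network access.

```python
import codecs
from typing import Optional


def decode_page(content: bytes, encoding: Optional[str], fallback: str = "utf-8") -> str:
    """Decode page bytes with a detected encoding name (hypothetical helper).

    `encoding` is what e.g. chardet.detect(content)['encoding'] returned;
    it may be None or a name Python does not recognise.
    """
    if encoding:
        try:
            codecs.lookup(encoding)  # raises LookupError for unknown codec names
        except LookupError:
            encoding = None
    if not encoding:
        encoding = fallback  # assumption: UTF-8 is a reasonable default for web pages
    # errors="replace" substitutes U+FFFD for bytes that do not fit the encoding
    return content.decode(encoding, errors="replace")


# Usage: GBK-encoded bytes decoded with the name "GB2312", then a missing guess
print(decode_page("你好".encode("gbk"), "GB2312"))
print(decode_page(b"hello", None))
```

Checking the name with codecs.lookup() before decoding keeps an unexpected label from raising inside the crawler; the trade-off is that a wrong but valid guess still decodes, just with replacement characters.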

import urllib.request
import random

url = 'http://blog.csdn.net/liaodehong'
# Pool of browser User-Agent strings. The comma between entries matters:
# without it Python concatenates adjacent string literals into one string.
my_headers = [
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063',
]

def get_content(url, headers):
    # Pick a random User-Agent for each request
    random_header = random.choice(headers)
    req = urllib.request.Request(url)
    req.add_header('User-Agent', random_header)
    req.add_header('Host', 'blog.csdn.net')
    req.add_header('Referer', 'http://blog.csdn.net/')
    # No 'GET' header: GET is the request method (urllib's default when
    # no data is sent), not a header
    with urllib.request.urlopen(req) as resp:
        content = resp.read()
    return content

print(get_content(url, my_headers))
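get_content() builds the request and sends it in one step, so the header logic can only be checked by hitting the network. A minimal sketch, assuming it is acceptable to split the two, moves request construction into a hypothetical build_request() helper; the injectable random.Random is also an addition so the User-Agent choice can be made reproducible. Because no body is passed, urllib treats the request as a GET, which is why no 'GET' header appears.

```python
import random
import urllib.request
from typing import Optional, Sequence


def build_request(url: str, user_agents: Sequence[str],
                  referer: Optional[str] = None,
                  rng: Optional[random.Random] = None) -> urllib.request.Request:
    """Build (but do not send) a request with a random User-Agent (hypothetical helper)."""
    rng = rng or random.Random()
    headers = {"User-Agent": rng.choice(user_agents)}
    if referer:
        headers["Referer"] = referer
    # With no data and no explicit method, urllib sends this as a GET request.
    # Host is filled in from the URL when the request is sent, so it is not set here.
    return urllib.request.Request(url, headers=headers)


user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063",
]
req = build_request("http://blog.csdn.net/liaodehong", user_agents,
                    referer="http://blog.csdn.net/", rng=random.Random(0))
# urllib stores header names with str.capitalize(), hence "User-agent"
print(req.get_method(), req.host, req.get_header("User-agent"))
# Sending it works the same way as before: urllib.request.urlopen(req).read()
```

Keeping construction separate lets the rotation and headers be inspected offline, while the actual fetch stays a single urlopen() call.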
