python---Web Crawler
Posted 帅到要去报警
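I wrote a simple web crawler. It scrapes the text forecast page on weather.com.cn and prints the province, city, and minimum temperature for each row of its tables: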
```python
# coding=utf-8
from bs4 import BeautifulSoup
import requests

url = "http://www.weather.com.cn/textFC/hb.shtml"


def get_temperature(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
        'Upgrade-Insecure-Requests': '1',
        'Referer': 'http://www.weather.com.cn/weather1d/10129160502A.shtml',
        'Host': 'www.weather.com.cn',
    }
    res = requests.get(url, headers=headers, timeout=10)
    res.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    # res.content is the raw response body as bytes; decode it to a str as UTF-8
    content = res.content.decode('utf-8')
    soup = BeautifulSoup(content, 'lxml')
    con_midtab = soup.find('div', class_='conMidtab')
    # Each conMidtab2 div holds the table for one province
    for table in con_midtab.find_all('div', class_='conMidtab2'):
        tr_list = table.find_all('tr')[2:]  # skip the two header rows
        province = ''
        for index, tr in enumerate(tr_list):
            td_list = tr.find_all('td')
            if index == 0:
                # The first data row also carries the province cell,
                # so city and minimum temperature sit one column further right
                province = td_list[0].get_text(strip=True)
                city = td_list[1].get_text(strip=True)
                min_temp = td_list[7].get_text(strip=True)
            else:
                city = td_list[0].get_text(strip=True)
                min_temp = td_list[6].get_text(strip=True)
            print(province, city, min_temp)


if __name__ == '__main__':
    get_temperature(url)
```
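The trickiest part of the crawler is the column offset: the first data row of each province table has one extra cell (the province name, presumably spanning the rows below it), so the city and minimum temperature shift by one column. A minimal sketch of that logic, pulled out into a pure function so it can be checked without hitting the network, might look like this. The function name `rows_to_records` and the sample cell values are hypothetical, made up for illustration; they are not taken from the real page.

```python
from typing import List, Tuple


def rows_to_records(rows: List[List[str]]) -> List[Tuple[str, str, str]]:
    """Turn the cell texts of one province table into (province, city, min_temp) tuples.

    Assumes the layout the crawler relies on: the first row starts with the
    province cell, so city is column 1 and minimum temperature column 7;
    every later row lacks that cell, shifting them to columns 0 and 6.
    """
    records = []
    province = ""
    for index, cells in enumerate(rows):
        if index == 0:
            province = cells[0]
            city, min_temp = cells[1], cells[7]
        else:
            city, min_temp = cells[0], cells[6]
        records.append((province, city, min_temp))
    return records


# Hypothetical cell texts for two data rows (illustrative values only)
sample = [
    ["Beijing", "Beijing", "Sunny", "NW 3", "10", "Clear", "N 2", "-2", "Details"],
    ["Haidian", "Sunny", "NW 3", "9", "Clear", "N 2", "-3", "Details"],
]

if __name__ == "__main__":
    for record in rows_to_records(sample):
        print(*record)
```

Inside the crawler, the `rows` argument could be built from one table's `tr_list` with `[[td.get_text(strip=True) for td in tr.find_all('td')] for tr in tr_list]`, which keeps the HTML handling and the column logic separate.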