XPath examples: scraping 58 rental listings, parsing and downloading image data, and fixing garbled text

Posted by jnhnsnow


Parsing listing titles from 58 rental listings

from lxml import etree
import requests

url = "https://haikou.58.com/chuzu/j2/"
headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Mobile Safari/537.36"
}
parser = etree.HTMLParser(encoding="utf-8")
# Pass the headers so the request carries the browser User-Agent
page_text = requests.get(url=url, headers=headers).text
tree = etree.HTML(page_text, parser=parser)
# One <li> per listing
lis = tree.xpath('//ul[@class="house-list"]/li')
for li_item in lis:
    # Note the leading "./": the path is relative to the current <li>
    res = li_item.xpath('.//h2/a/text()')
    print(res[0].strip())
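
The note about "./" in the loop above is easy to miss, so here is a minimal offline sketch of the difference. The HTML fragment is a made-up stand-in for the listing page (the real page structure may differ); only the lxml calls are the same as in the script.

```python
from lxml import etree

# Hypothetical HTML fragment standing in for the listing page.
html = """
<ul class="house-list">
  <li><h2><a> Room A </a></h2></li>
  <li><h2><a> Room B </a></h2></li>
</ul>
"""

tree = etree.HTML(html)
items = tree.xpath('//ul[@class="house-list"]/li')

# ".//" is relative to the current <li>, so each item yields its own title.
relative = [li.xpath('.//h2/a/text()')[0].strip() for li in items]

# "//" restarts from the document root, so every <li> sees all titles.
absolute = [len(li.xpath('//h2/a/text()')) for li in items]

print(relative)
print(absolute)
```

With the relative path each <li> returns exactly one title; with the absolute path every <li> returns the titles of the whole page, so res[0] would always be the first listing.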


Scraping images from pic.netbian.com (彼岸图网)


from lxml import etree
import requests

url = "http://pic.netbian.com/4kfengjing"
headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Mobile Safari/537.36"
}
parser = etree.HTMLParser(encoding="utf-8")
page_text = requests.get(url=url, headers=headers).text
tree = etree.HTML(page_text, parser=parser)
# Collect the src attribute of every thumbnail in the list
img_srcs = tree.xpath('//div[@class="slist"]//li/a/img/@src')
count = 0
for url_item in img_srcs:
    full_url = "%s%s" % ("http://pic.netbian.com/", url_item)
    # .content returns the raw bytes of the image
    img_data = requests.get(url=full_url, headers=headers).content
    with open("image%s.jpg" % count, "wb") as f:
        f.write(img_data)
    count += 1
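
The script above builds the full image URL by string formatting. As an alternative sketch (not what the original post does), the standard library's urljoin handles a src with or without a leading slash. The src values below are made up for illustration.

```python
from urllib.parse import urljoin

base_url = "http://pic.netbian.com/"

# Hypothetical src values, written the way an <img> tag might give them.
srcs = ["/uploads/a.jpg", "uploads/b.jpg", "http://example.com/c.jpg"]

# urljoin resolves each src against the base, as a browser would.
full_urls = [urljoin(base_url, src) for src in srcs]
for u in full_urls:
    print(u)
```

A src that is already absolute is returned unchanged, so the same line works for all three cases.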

Garbled text (mojibake):

  1. For the whole response

    - response = requests.get(url=xxx, headers=xxx)

    - response.encoding = 'utf-8'

  2. For a single string

      - xxx.encode('iso-8859-1').decode('gbk')    (a general-purpose fix for garbled Chinese text)
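
To see why the second fix works, here is a small offline sketch. It assumes the page bytes are GBK but were decoded as ISO-8859-1, which is what requests falls back to for text responses that declare no charset. The sample string is made up.

```python
# Assumed scenario: the server sends GBK bytes,
# but they were decoded as ISO-8859-1 by mistake.
original = "风景"
raw = original.encode("gbk")           # bytes as sent by the server
garbled = raw.decode("iso-8859-1")     # what the wrong decoding produces

# Re-encoding with the wrong codec recovers the raw bytes unchanged,
# which can then be decoded with the right codec.
fixed = garbled.encode("iso-8859-1").decode("gbk")

print(repr(garbled))
print(fixed)
```

The round trip is lossless because ISO-8859-1 maps every byte value to exactly one character. If the page is actually UTF-8 rather than GBK, the last step would need decode("utf-8") instead.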

 
