Notes: Writing a Simple Web Crawler

Posted by fuzzier



Goal: scrape the upcoming-show listings from damai.cn.

#!/usr/bin/python
# -*- coding: utf-8 -*-

import requests
from bs4 import BeautifulSoup

DOWNLOAD_URL = "http://www.damai.cn/bj/"

def download_page(url):
    """Fetch the raw page content, sending a browser-like User-Agent."""
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                             '(KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36'}
    return requests.get(url, headers=headers).content

def x_page(html):
    """Parse the HTML and extract title, price, time, and place for each show."""
    soup = BeautifulSoup(html, 'html.parser')
    # The <li> items live in the second sibling after div.index-con
    # (the first sibling is a whitespace text node).
    li_lst = soup.find('div', class_='index-con').next_sibling.next_sibling.find_all('li')
    titles = [i.find('dt').find('a').string for i in li_lst]
    # Some shows have no <strong> price tag; fall back to 'none'.
    prices = [i.find('p', class_='price').find('strong').text
              if i.find('p', class_='price').find('strong') is not None else 'none'
              for i in li_lst]
    times = [i.find('p', class_='time').string for i in li_lst]
    places = [i.find('p', class_='place').string for i in li_lst]
    return titles, prices, times, places

if __name__ == '__main__':
    html = download_page(DOWNLOAD_URL)
    titles, prices, times, places = x_page(html)
    info_lst = zip(titles, prices, times, places)
    # Write one show per line, separated by a blank line.
    with open('damai.txt', 'w', encoding='utf-8') as f:
        for info in info_lst:
            f.write(' '.join(info) + '\n\n')
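Since the live damai.cn markup changes over time, the parsing logic can be exercised offline against a static snippet. The HTML below is a hypothetical stand-in that mirrors the structure the code above expects: a `div.index-con` whose second following sibling holds the `<li>` items, and one item missing its `<strong>` price to exercise the fallback.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the damai.cn listing structure.
HTML = """
<div class="index-con"></div>
<div>
  <ul>
    <li><dl><dt><a>Show A</a></dt></dl>
        <p class="price"><strong>280</strong></p>
        <p class="time">2017-03-01</p>
        <p class="place">Beijing</p></li>
    <li><dl><dt><a>Show B</a></dt></dl>
        <p class="price"></p>
        <p class="time">2017-03-05</p>
        <p class="place">Shanghai</p></li>
  </ul>
</div>
"""

def parse_shows(html):
    """Same extraction strategy as x_page, limited to titles and prices."""
    soup = BeautifulSoup(html, "html.parser")
    # Skip the whitespace text node between the two sibling divs.
    li_lst = soup.find("div", class_="index-con").next_sibling.next_sibling.find_all("li")
    titles = [li.find("dt").find("a").string for li in li_lst]
    prices = [li.find("p", class_="price").find("strong").text
              if li.find("p", class_="price").find("strong") is not None else "none"
              for li in li_lst]
    return titles, prices
```

Running `parse_shows(HTML)` should yield `(["Show A", "Show B"], ["280", "none"])`, confirming both the sibling navigation and the missing-price fallback.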

 
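Joining the fields with spaces (as damai.txt does above) is ambiguous once a title itself contains spaces. A minimal sketch using the stdlib csv module instead; the filename `damai.csv` and the column names are my own choices, not from the original post:

```python
import csv

def write_shows(path, rows):
    # rows: iterable of (title, price, time, place) tuples.
    # csv handles quoting, so titles with spaces or commas stay intact.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "price", "time", "place"])  # header row
        writer.writerows(rows)

write_shows("damai.csv", [("Show A", "280", "2017-03-01", "Beijing")])
```

The resulting file can be opened directly in a spreadsheet, which the plain space-joined text cannot.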
