Python: scrape all items of selected HTML tags from a website

# third-party packages: pip install requests beautifulsoup4 lxml
import requests
from bs4 import BeautifulSoup


def parse_text(stags, soup):
    """Return the text content of every element whose tag name is stags."""
    lfound = []

    # search the parsed HTML for all matching tags
    g_data = soup.find_all(stags)

    # collect the text of each element found
    for item in g_data:
        try:
            lfound.append(item.text)
        except Exception as e:
            print("WARNING: %s" % e)

    return lfound


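def parse_att(stags, sattribute, soup):
    """Return the value of attribute sattribute for every stags element."""
    lfound = []

    # search the parsed HTML for all matching tags
    g_data = soup.find_all(stags)

    # collect the attribute of each element found;
    # item.get() returns None when the attribute is missing
    for item in g_data:
        try:
            lfound.append(item.get(sattribute))
        except Exception as e:
            print("WARNING: %s" % e)

    return lfound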




if __name__ == "__main__":
    # url to parse
    url = "http://www.nytimes.com/"
    print(url)

    # download the page (fail fast on network or HTTP errors)
    r = requests.get(url, timeout=30)
    r.raise_for_status()

    # build the BeautifulSoup object ("lxml" requires the lxml package)
    soup = BeautifulSoup(r.content, "lxml")

    # parse text of "p" html tags
    print(parse_text("p", soup))

    # parse "src" attribute of "img" html tags
    print(parse_att("img", "src", soup))
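As a rough cross-check that needs no third-party packages, the sketch below performs the same two extractions (text of every `<p>`, `src` of every `<img>`) with Python's standard `html.parser` module instead of BeautifulSoup. This is a swapped-in technique, not the script's method: the class `TagCollector`, the helper `scrape()` and the sample markup are hypothetical, and this parser is much less forgiving of broken or nested markup than BeautifulSoup with lxml.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: collect the text of every <p> element and the
    src attribute of every <img> element (stdlib-only sketch)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # text of each <p> element
        self.img_srcs = []    # src of each <img>, None if the attribute is absent
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []
        elif tag == "img":
            # attrs is a list of (name, value) pairs
            self.img_srcs.append(dict(attrs).get("src"))

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.paragraphs.append("".join(self._buf))
            self._in_p = False

    def handle_data(self, data):
        # text inside nested tags such as <b> is kept as part of the paragraph
        if self._in_p:
            self._buf.append(data)


def scrape(html):
    """Return (paragraph texts, img src values) for an HTML string."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.paragraphs, collector.img_srcs


if __name__ == "__main__":
    sample = (
        "<html><body>"
        "<p>First <b>story</b></p>"
        '<img src="/a.jpg">'
        "<p>Second story</p>"
        '<img alt="no source">'
        "</body></html>"
    )
    texts, srcs = scrape(sample)
    print(texts)  # ['First story', 'Second story']
    print(srcs)   # ['/a.jpg', None]
```

Running it on a small inline string also shows that an `<img>` without `src` yields `None`, which matches what `item.get("src")` returns in `parse_att`.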
