Python: scrape all items of a selected HTML tag from a website


import csv
import requests 
from bs4 import BeautifulSoup


def parse_text(stags, soup):
    # list of text strings found in the matching tags
    lfound = []

    # find every matching tag in the parsed HTML
    g_data = soup.find_all(stags)

    # collect the text content of each match
    for item in g_data:
        try:
            lfound.append(item.text)
        except Exception as e:
            print("WARNING: %s" % e)

    return lfound


def parse_att(stags, sattribute, soup):
    # list of attribute values found in the matching tags
    lfound = []

    # find every matching tag in the parsed HTML
    g_data = soup.find_all(stags)

    # collect the requested attribute of each match
    # (item.get returns None when the tag lacks the attribute)
    for item in g_data:
        try:
            lfound.append(item.get(sattribute))
        except Exception as e:
            print("WARNING: %s" % e)

    return lfound




if __name__ == "__main__":
    # URL to parse
    url = "http://www.nytimes.com/"
    print(url)

    # build the BeautifulSoup object (the "lxml" parser requires the lxml package)
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "lxml")

    # text of every "p" tag
    print(parse_text("p", soup))

    # "src" attribute of every "img" tag
    print(parse_att("img", "src", soup))
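The script imports csv but never uses it. One plausible next step is to save the scraped values to a CSV file, though the original does not show this. Below is a minimal sketch under that assumption: the helper name rows_to_csv, the single-column layout, and the sample values are hypothetical, and an in-memory buffer stands in for a real file so the example runs without network access.

```python
import csv
import io


def rows_to_csv(values, header, out):
    """Write one value per row under a single header column (hypothetical helper)."""
    writer = csv.writer(out)
    writer.writerow([header])
    for value in values:
        # skip None, which parse_att yields for tags lacking the attribute
        if value is not None:
            writer.writerow([value])


# Sample data standing in for the result of parse_att("img", "src", soup)
sample = ["a.png", None, "b.jpg"]

buffer = io.StringIO()
rows_to_csv(sample, "src", buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with open("images.csv", "w", newline="") in place of the buffer; the csv module documentation recommends newline="" so that row endings are not translated twice.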
