Python: using XPath to fetch HTML and filter it


#!/usr/bin/python
# -*- coding: utf-8 -*-
from lxml import html
import requests
import re

# CEX HTML REQUEST

# Watch list: [title, price threshold, product id passed to the site]
gameList = [
	["Deadlight: Director's Cut", "12", "5035228121522"],
	["Devil May Cry Definitive Edition", "12", "5055060930755"],
	["Just Cause 3", "18", "5021290069770"],
	["République", "20", "813633016542"],
	["Until Dawn", "20", "711719874836"],
	["Dying Light", "28", "5051892165280"],
	["Uncharted 4", "28", "0711719454410"],
	["Little Nightmares + Figure", "30", "3391891992473"],
	["WipeOut Omega Collection", "30", "711719854463"],
	["Rise of Tomb Raider", "32", "5021290074767"],
	["Yakuza 0", "35", "5055277027996"],
	["Hitman", "35", "5021290075863"],
	["Nioh", "38", "711719819066"],
	["NieR: Automata", "45", "5021290074484"],
	["Bioshock: The Collection", "48", "5026555421898"],
]

newPricesHTML = ""
newPricesTXT = ""

for i in range(len(gameList)):

	# Fetch the product page for this id and parse it into an lxml tree
	page = requests.get("https://pt.webuy.com/product-detail?id=" + gameList[i][2], timeout=10)
	source = html.fromstring(page.content)

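	# text() returns a list of strings; join them and read the digits after the
	# euro sign with a regex instead of slicing the repr of the list, which only
	# contained the escape "20ac" under Python 2
	priceFullText = source.xpath('//td[@id="Asellprice"]/text()')
	priceMatch = re.search("\u20ac" r"\s*(\d+)", "".join(priceFullText))
	if priceMatch is None:
		print(gameList[i][0] + " : price not found")
		continue
	newPrice = int(priceMatch.group(1))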

	if newPrice < int(gameList[i][1]):

		newPricesHTML += gameList[i][0] + ": "
		newPricesHTML += str(newPrice) + ": "
		newPricesHTML += "https://pt.webuy.com/product.php?sku=" + gameList[i][2]
		newPricesHTML += "<br>"

		newPricesTXT += gameList[i][0] + ": "
		newPricesTXT += str(newPrice) + ": "
		newPricesTXT += "https://pt.webuy.com/product.php?sku=" + gameList[i][2]
		newPricesTXT += "\n"

		print(gameList[i][0] + " : YES! from " + gameList[i][1] + " to " + str(newPrice))
	else:
		print(gameList[i][0] + " : NOPE!")

# Write the report only when at least one price dropped
if newPricesHTML:
	with open("/Users/diogoqueiros/Downloads/CEX-Prices.txt", "w") as text_file:
		text_file.write("CEX-PricesDown:\n%s" % newPricesTXT)
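
The filtering step of the script above (take the text nodes returned by the XPath query, read the price, compare it with the threshold) can be tried without any network access. The sketch below is only an illustration: `parse_euro_price` and `price_dropped` are hypothetical helper names that do not appear in the original script, and the sample strings assume the page shows prices in a form like "€12.00".

```python
import re

# Hypothetical helpers, not part of the original script.
# Assumption: the price appears as a euro sign followed by digits,
# optionally with a "." or "," and one or two decimal digits.
PRICE_RE = re.compile("\u20ac" r"\s*(\d+)(?:[.,](\d{1,2}))?")


def parse_euro_price(text_nodes):
    """Return the first euro amount found in a list of text nodes, or None."""
    match = PRICE_RE.search("".join(text_nodes))
    if match is None:
        return None
    whole = match.group(1)
    cents = match.group(2) or "0"
    return float(whole + "." + cents)


def price_dropped(text_nodes, threshold):
    """True when a price was found and it is strictly below the threshold."""
    price = parse_euro_price(text_nodes)
    return price is not None and price < threshold


if __name__ == "__main__":
    # Sample input shaped like the list an xpath('.../text()') call returns
    sample = ["\u20ac12.00"]
    print(parse_euro_price(sample), price_dropped(sample, 18))
```

Returning None when no price is found lets the caller skip a product instead of crashing on `int()` when the page layout changes.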
