Web Crawler Techniques: A Taobao Data Collection Example

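1. Experiment: use a web crawler to collect information about any product the user asks for (for example, books about Python), then download and store the results in bulk so they can serve as the basis for later data cleaning, analysis, and visualization.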
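
The listing in section 2 only prints what it collects, so the storage step mentioned above is left open. A minimal sketch of one way to handle it, assuming each collected row is a [price, title] pair as in that listing. The function name save_goods and the CSV column names are hypothetical and do not come from the original code.

```python
import csv


def save_goods(goods_list, path):
    """Write [price, title] rows to a CSV file and return the row count."""
    # utf-8-sig adds a byte order mark, which helps spreadsheet tools
    # detect the encoding when titles contain Chinese characters.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["index", "price", "title"])  # assumed header names
        for i, (price, title) in enumerate(goods_list, start=1):
            writer.writerow([i, price, title])
    return len(goods_list)
```

Calling save_goods(goodsList, "goods.csv") after the crawl would leave a file that a later cleaning or analysis step can read back with csv.reader.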

2. Source code

# -*- coding: utf-8 -*-
'''
Created on 2018-01-10

@author: lin
'''
import re

import requests


def getHTMLText(url):
    """Fetch a page and return its text, or an empty string on failure."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()  # must be called; raises on 4xx/5xx responses
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""


def parserPage(goodsList, html):
    """Extract prices and titles from the page text and append them to goodsList."""
    plt = re.findall(r'"view_price":"([\d.]*)"', html)
    tlt = re.findall(r'"raw_title":"(.*?)"', html)  # the ? makes the match non-greedy
    # The capture groups return the quoted values directly, so eval() is not
    # needed; eval() on text taken from a web page would execute whatever
    # expression that text happens to contain.
    for price, title in zip(plt, tlt):
        goodsList.append([price, title])


def printPage(goodsList):
    """Print the collected goods as a tab-separated table."""
    tplt = "{:6}\t{:8}\t{:16}"
    print(tplt.format("No.", "Price", "Title"))
    for i, goods in enumerate(goodsList):
        print(tplt.format(i + 1, goods[0], goods[1]))


def main():
    # goods = "书包"
    goods = input("Enter a product name: ")
    depth = 2  # number of result pages to fetch
    url = "https://s.taobao.com/search?q="
    goodsList = []
    for i in range(depth):
        # s is the offset of the first item; it advances by 44 per page
        html = getHTMLText(url + goods + "&s=" + str(i * 44))
        parserPage(goodsList, html)
    printPage(goodsList)


if __name__ == "__main__":
    main()
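
The extraction and paging logic above can be tried without any network access. The sketch below runs the same two patterns against a small hand-written fragment and builds the page URLs the loop would request. The field names "raw_title" and "view_price" and the step of 44 come from the listing. SAMPLE_HTML, extract_goods, and page_urls are hypothetical names introduced for illustration, and the real page content may be structured differently.

```python
import re

# Hypothetical fragment imitating the data embedded in a search result page.
SAMPLE_HTML = (
    '{"raw_title":"Python编程 从入门到实践","view_price":"59.00"},'
    '{"raw_title":"流畅的Python","view_price":"89.50"}'
)


def extract_goods(html):
    """Return [price, title] pairs found in the page text."""
    prices = re.findall(r'"view_price":"([\d.]*)"', html)
    titles = re.findall(r'"raw_title":"(.*?)"', html)
    # zip stops at the shorter list, so a missing field cannot raise IndexError.
    return [[price, title] for price, title in zip(prices, titles)]


def page_urls(keyword, depth, page_size=44):
    """Build one search URL per result page, offset by page_size items each."""
    base = "https://s.taobao.com/search?q="
    return [base + keyword + "&s=" + str(i * page_size) for i in range(depth)]


if __name__ == "__main__":
    for row in extract_goods(SAMPLE_HTML):
        print(row)
    for url in page_urls("python", 2):
        print(url)
```

Pairing the two lists with zip instead of indexing by position keeps the loop from failing when a page yields more prices than titles, although the pairs can still be misaligned in that case.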

3. Results
