Scraping Taobao model information with Python
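In this post I walk through how I approach scraping a specific set of information, and share the code along the way.
1. First, work out how the URLs of the target site change from page to page
The site's URL structure is simple: the only thing that changes is the value of page in https://mm.taobao.com/json/request_top_list.htm?page=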
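Since only the page value changes, the list of URLs to crawl can be generated up front. A minimal sketch of that idea; the helper name `page_url` is made up for illustration and is not part of the original script:

```python
BASE_URL = "https://mm.taobao.com/json/request_top_list.htm?page={page}"


def page_url(page):
    """Return the listing URL for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return BASE_URL.format(page=page)


# Build the URLs of the first three listing pages.
urls = [page_url(n) for n in range(1, 4)]
for url in urls:
    print(url)
```

Keeping the pattern in one place means the crawl loop only has to deal with page numbers.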
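2. Analyze the DOM tree of the page so that we can pick out the information we want
I wrote a simple page-analysis script, analyze.py, which prints the DOM tree so that I can work out the selectors afterwards.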
    # -*- coding: utf-8 -*-
    # Module imports
    import requests
    from bs4 import BeautifulSoup

    # The page to analyze
    url = "http://mm.taobao.com/json/request_top_list.htm?page=1"
    response = requests.get(url)
    response.encoding = 'gb2312'
    html = response.text

    # Parse with the lxml parser
    soup = BeautifulSoup(html, 'lxml')

    # Print the DOM tree
    print(soup.prettify())

Output:
<html> <body> <div class="list-item"> <div class="personal-info"> <div class="pic-word"> <div class="pic s60"> <a class="lady-avatar" href="//mm.taobao.com/687471686.htm" target="_blank"> <img height="60" src="//gtd.alicdn.com/sns_logo/i2/TB1XZ1PQVXXXXaJXpXXSutbFXXX.jpg_60x60.jpg" width="60"/> </a> </div> <p class="top"> <a class="lady-name" href="//mm.taobao.com/self/model_card.htm?user_id=687471686" target="_blank"> 田媛媛 </a> <em> <strong> 27 </strong> 岁 </em> <span> 广州市 </span> <span class="friend-follow J_FriendFollow" data-custom="type=14&app_id=12052609" data-group="" data-userid="687471686"> 加关注 </span> </p> <p> <em> 平面模特 设计师 T台、展模特 </em> <em> <strong> 164433 </strong> 粉丝 </em> </p> </div> <div class="pic w610"> <a href="//mm.taobao.com/photo-687471686-10000854046.htm?pic_id=10003369435" target="_blank"> <img data-ks-lazyload="//img.alicdn.com/imgextra/i4/687471686/TB1TORaKFXXXXc0aXXXXXXXXXXX_!!2-tstar.png" src="//assets.alicdn.com/kissy/1.0.0/build/imglazyload/spaceball.gif"/> </a> </div> </div> <div class="list-info"> <div class="popularity"> <dl> <dt> 1 </dt> <dd> <span> 总积分: </span> 60742 </dd> </dl> </div> <ul class="info-detail"> <li> 新增积分: <strong> 529 </strong> </li> <li> 好评率: <strong> 90.0 </strong> % </li> <li> 导购照片: <strong> 888 </strong> 张 </li> <li> 签约数量: <strong> 406 </strong> 次 </li> </ul> <p class="description"> 你还在为上下衣物搭配而苦恼么..你还在为出门不知道穿什么而烦躁么 ..vvip女神教你一键(_)美美哒 ! 不需要过多的搭配.不需要为不协调而苦恼 ..我们为你选好让你出门美美哒!! 
</p> <div class="J_LikeIt" photo-favor-count="0" photo-id="687471686_10003369435"> <div class="mm-photolike"> <a class="mm-photolike-btn" data-count="0" data-targetid="687471686_10003369435" href="javascript:void(0)"> 喜欢 </a> <var class="mm-photolike-count radius-3"> 0 </var> </div> </div> </div> </div> <div class="list-item"> <div class="personal-info"> <div class="pic-word"> <div class="pic s60"> <a class="lady-avatar" href="//mm.taobao.com/405095521.htm" target="_blank"> <img height="60" src="//gtd.alicdn.com/sns_logo/i5/TB1nK_zJXXXXXaaXVXXSutbFXXX.jpg_60x60.jpg" width="60"/> </a> </div> <p class="top"> <a class="lady-name" href="//mm.taobao.com/self/model_card.htm?user_id=405095521" target="_blank"> v悦悦 </a> <em> <strong> 27 </strong> 岁 </em> <span> 杭州市 </span> <span class="friend-follow J_FriendFollow" data-custom="type=14&app_id=12052609" data-group="" data-userid="405095521"> 加关注 </span> </p> <p> <em> 平面模特 演员 </em> <em> <strong> 148672 </strong> 粉丝 </em> </p> </div> <div class="pic w610"> <a href="//mm.taobao.com/photo-405095521-300257059.htm?pic_id=10000627358" target="_blank"> <img data-ks-lazyload="//img.alicdn.com/imgextra/i2/405095521/TB1vD47FVXXXXcwXpXXXXXXXXXX_!!405095521-2-tstar.png" src="//assets.alicdn.com/kissy/1.0.0/build/imglazyload/spaceball.gif"/> </a> </div> </div> <div class="list-info"> <div class="popularity"> <dl> <dt> 2 </dt> <dd> <span> 总积分: </span> 59978 </dd> </dl> </div> <ul class="info-detail"> <li> 新增积分: <strong> 586 </strong> </li> <li> 好评率: <strong> 100.0 </strong> % </li> <li> 导购照片: <strong> 3034 </strong> 张 </li> <li> 签约数量: <strong> 144 </strong> 次 </li> </ul> <p class="description"> 冬季通勤加厚保暖中长款羊毛外套 </p> <div class="J_LikeIt" photo-favor-count="0" photo-id="405095521_10000627358"> <div class="mm-photolike"> <a class="mm-photolike-btn" data-count="0" data-targetid="405095521_10000627358" href="javascript:void(0)"> 喜欢 </a> <var class="mm-photolike-count radius-3"> 0 </var> </div> </div> </div> </div> <div class="list-item"> <div 
class="personal-info"> <div class="pic-word"> <div class="pic s60"> <a class="lady-avatar" href="//mm.taobao.com/631300490.htm" target="_blank"> <img height="60" src="//gtd.alicdn.com/sns_logo/i8/TB1JRHEQpXXXXXwXVXXSutbFXXX.jpg_60x60.jpg" width="60"/> </a> </div> <p class="top"> <a class="lady-name" href="//mm.taobao.com/self/model_card.htm?user_id=631300490" target="_blank"> 崔辰辰 </a> <em> <strong> 28 </strong> 岁 </em> <span> 杭州市 </span> <span class="friend-follow J_FriendFollow" data-custom="type=14&app_id=12052609" data-group="" data-userid="631300490"> 加关注 </span> </p> <p> <em> 平面模特 </em> <em> <strong> 148579 </strong> 粉丝 </em> </p> </div> <div class="pic w610"> <a href="//mm.taobao.com/photo-631300490-10000816213.htm?pic_id=10003560889" target="_blank"> <img data-ks-lazyload="//img.alicdn.com/imgextra/i1/631300490/TB1e6VNLXXXXXauXVXXXXXXXXXX_!!0-tstar.jpg" src="//assets.alicdn.com/kissy/1.0.0/build/imglazyload/spaceball.gif"/> </a> </div> </div> <div class="list-info"> <div class="popularity"> <dl> <dt> 3 </dt> <dd> <span> 总积分: </span> 58698 </dd> </dl> </div> <ul class="info-detail"> <li> 新增积分: <strong> 638 </strong> </li> <li> 好评率: <strong> 93.54 </strong> % </li> <li> 导购照片: <strong> 3026 </strong> 张 </li> <li> 签约数量: <strong> 157 </strong> 次 </li> </ul> <p class="description"> 辰子家★韩版夏季女士学生休闲百搭笑脸宽松显瘦刺绣短袖圆领T恤 </p> <div class="J_LikeIt" photo-favor-count="0" photo-id="631300490_10003560889"> <div class="mm-photolike"> <a class="mm-photolike-btn" data-count="0" data-targetid="631300490_10003560889" href="javascript:void(0)"> 喜欢 </a> <var class="mm-photolike-count radius-3"> 0 </var> </div> </div> </div> </div> <div class="list-item"> <div class="personal-info"> <div class="pic-word"> <div class="pic s60"> <a class="lady-avatar" href="//mm.taobao.com/414457129.htm" target="_blank"> <img height="60" src="//gtd.alicdn.com/sns_logo/i6/TB1uGkBKVXXXXXiXFXXSutbFXXX.jpg_60x60.jpg" width="60"/> </a> </div> <p class="top"> <a class="lady-name" 
href="//mm.taobao.com/self/model_card.htm?user_id=414457129" target="_blank"> 大猫儿 </a> <em> <strong> 31 </strong> 岁 </em> <span> 广州市 </span> <span class="friend-follow J_FriendFollow" data-custom="type=14&app_id=12052609" data-group="" data-userid="414457129"> 加关注 </span> </p> <p> <em> 平面模特 </em> <em> <strong> 149244 </strong> 粉丝 </em> </p> </div> <div class="pic w610"> <a href="//mm.taobao.com/photo-414457129-10000678249.htm?pic_id=10002454721" target="_blank"> <img data-ks-lazyload="//img.alicdn.com/imgextra/i3/414457129/TB1M5MbKpXXXXcCXXXXXXXXXXXX_!!414457129-0-tstar.jpg" src="//assets.alicdn.com/kissy/1.0.0/build/imglazyload/spaceball.gif"/> </a> </div> </div> <div class="list-info"> <div class="popularity"> <dl> <dt> 4 </dt> <dd> <span> 总积分: </span> 56055 </dd> </dl> </div> <ul class="info-detail"> <li> 新增积分: <strong> 281 </strong> </li> <li> 好评率: <strong> 100.0 </strong> % </li> <li> 导购照片: <strong> 976 </strong> 张 </li> <li> 签约数量: <strong> 160 </strong> 次 </li> </ul> <p class="description"> 双肩背包一直是搭配中的重头戏 大容量的实用 与帅气铆钉的结合 时尚感十足 </p> <div class="J_LikeIt" photo-favor-count="0" photo-id="414457129_10002454721"> <div class="mm-photolike"> <a class="mm-photolike-btn" data-count="0" data-targetid="414457129_10002454721" href="javascript:void(0)"> 喜欢 </a> <var class="mm-photolike-count radius-3"> 0 </var> </div> </div> </div> </div> <div class="list-item"> <div class="personal-info"> <div class="pic-word"> <div class="pic s60"> <a class="lady-avatar" href="//mm.taobao.com/141234233.htm" target="_blank"> <img height="60" src="//gtd.alicdn.com/sns_logo/i8/TB1TVQ_LXXXXXcpXXXXSutbFXXX.jpg_60x60.jpg" width="60"/> </a> </div> <p class="top"> <a class="lady-name" href="//mm.taobao.com/self/model_card.htm?user_id=141234233" target="_blank"> 金甜甜 </a> <em> <strong> 1 </strong> 岁 </em> <span> 杭州市 </span> <span class="friend-follow J_FriendFollow" data-custom="type=14&app_id=12052609" data-group="" data-userid="141234233"> 加关注 </span> </p> <p> <em> 平面模特 演员 歌手 </em> <em> 
<strong> 151693 </strong> 粉丝 </em> </p> </div> <div class="pic w610"> <a href="//mm.taobao.com/photo-141234233-10001069307.htm?pic_id=10007138677" target="_blank"> <img data-ks-lazyload="//img.alicdn.com/imgextra/i1/141234233/TB1eKWgNVXXXXbAXpXXXXXXXXXX_!!0-tstar.jpg" src="//assets.alicdn.com/kissy/1.0.0/build/imglazyload/spaceball.gif"/> </a> </div> </div> <div class="list-info"> <div class="popularity"> <dl> <dt> 5 </dt> <dd> <span> 总积分: </span> 55920 </dd> </dl> </div> <ul class="info-detail"> <li> 新增积分: <strong> 586 </strong> </li> <li> 好评率: <strong> 100.0 </strong> % </li> <li> 导购照片: <strong> 5 </strong> 张 </li> <li> 签约数量: <strong> 426 </strong> 次 </li> </ul> <p class="description"> 三金冠蔓延女装淘宝店 </p> <div class="J_LikeIt" photo-favor-count="0" photo-id="141234233_10007138677"> <div class="mm-photolike"> <a class="mm-photolike-btn" data-count="0" data-targetid="141234233_10007138677" href="javascript:void(0)"> 喜欢 </a> <var class="mm-photolike-count radius-3"> 0 </var> </div> </div> </div> </div> <div class="list-item"> <div class="personal-info"> <div class="pic-word"> <div class="pic s60"> <a class="lady-avatar" href="//mm.taobao.com/96614110.htm" target="_blank"> <img height="60" src="//gtd.alicdn.com/sns_logo/i1/TB1zh6tRVXXXXclaXXXSutbFXXX.jpg_60x60.jpg" width="60"/> </a> </div> <p class="top"> <a class="lady-name" href="//mm.taobao.com/self/model_card.htm?user_id=96614110" target="_blank"> 紫轩 </a> <em> <strong> 30 </strong> 岁 </em> <span> 杭州市 </span> <span class="friend-follow J_FriendFollow" data-custom="type=14&app_id=12052609" data-group="" data-userid="96614110"> 加关注 </span> </p> <p> <em> 平面模特 舞蹈者 </em> <em> <strong> 147832 </strong> 粉丝 </em> </p> </div> <div class="pic w610"> <a href="//mm.taobao.com/photo-96614110-10001051909.htm?pic_id=10007118313" target="_blank"> <img data-ks-lazyload="//img.alicdn.com/imgextra/i4/96614110/TB1mx1DNFXXXXcwXXXXXXXXXXXX_!!0-tstar.jpg" src="//assets.alicdn.com/kissy/1.0.0/build/imglazyload/spaceball.gif"/> </a> </div> 
</div> <div class="list-info"> <div class="popularity"> <dl> <dt> 6 </dt> <dd> <span> 总积分: </span> 55898 </dd> </dl> </div> <ul class="info-detail"> <li> 新增积分: <strong> 660 </strong> </li> <li> 好评率: <strong> 91.83 </strong> % </li> <li> 导购照片: <strong> 6545 </strong> 张 </li> <li> 签约数量: <strong> 472 </strong> 次 </li> </ul> <p class="description"> 海宁皮草羊剪绒大衣2016新款绣花女士中长款海宁皮草外套,2016 秋冬新品来袭 街头休闲 保暖长袖 羊毛大衣 秋冬新品来袭,不管你瘦或胖,上身都很棒,衣服厚实,灰常保暖! </p> <div class="J_LikeIt" photo-favor-count="0" photo-id="96614110_10007118313"> <div class="mm-photolike"> <a class="mm-photolike-btn" data-count="0" data-targetid="96614110_10007118313" href="javascript:void(0)"> 喜欢 </a> <var class="mm-photolike-count radius-3"> 0 </var> </div> </div> </div> </div> <div class="list-item"> <div class="personal-info"> <div class="pic-word"> <div class="pic s60"> <a class="lady-avatar" href="//mm.taobao.com/37448401.htm" target="_blank"> <img height="60" src="//gtd.alicdn.com/sns_logo/i8/TB1bU.UFFXXXXc0XpXXSutbFXXX.jpg_60x60.jpg" width="60"/> </a> </div> <p class="top"> <a class="lady-name" href="//mm.taobao.com/self/model_card.htm?user_id=37448401" target="_blank"> 谢婷婷 </a> <em> <strong> 28 </strong> 岁 </em> <span> 杭州市 </span> <span class="friend-follow J_FriendFollow" data-custom="type=14&app_id=12052609" data-group="" data-userid="37448401"> 加关注 </span> </p> <p> <em> 平面模特 设计师 </em> <em> <strong> 167310 </strong> 粉丝 </em> </p> </div> <div class="pic w610"> <a href="//mm.taobao.com/photo-37448401-10000635003.htm?pic_id=10002475139" target="_blank"> <img data-ks-lazyload="//img.alicdn.com/imgextra/i2/37448401/TB1WcMQKpXXXXX9XpXXXXXXXXXX_!!37448401-0-tstar.jpg" src="//assets.alicdn.com/kissy/1.0.0/build/imglazyload/spaceball.gif"/> </a> </div> </div> <div class="list-info"> <div class="popularity"> <dl> <dt> 7 </dt> <dd> <span> 总积分: </span> 55414 </dd> </dl> </div> <ul class="info-detail"> <li> 新增积分: <strong> 638 </strong> </li> <li> 好评率: <strong> 100.0 </strong> % </li> <li> 导购照片: <strong> 1291 </strong> 张 
</li> <li> 签约数量: <strong> 222 </strong> 次 </li> </ul> <p class="description"> 本期最爱的毛呢大衣,简约到极致的版反而更耐看有型,celine家的原版型定制, </p> <div class="J_LikeIt" photo-favor-count="0" photo-id="37448401_10002475139"> <div class="mm-photolike"> <a class="mm-photolike-btn" data-count="0" data-targetid="37448401_10002475139" href="javascript:void(0)"> 喜欢 </a> <var class="mm-photolike-count radius-3"> 0 </var> </div> </div> </div> </div> <div class="list-item"> <div class="personal-info"> <div class="pic-word"> <div class="pic s60"> <a class="lady-avatar" href="//mm.taobao.com/74386764.htm" target="_blank"> <img height="60" src="//gtd.alicdn.com/sns_logo/i8/TB1HmB7GXXXXXbHXFXXSutbFXXX.jpg_60x60.jpg" width="60"/> </a> </div> <p class="top"> <a class="lady-name" href="//mm.taobao.com/self/model_card.htm?user_id=74386764" target="_blank"> 夏晨洁 </a> <em> <strong> 28 </strong> 岁 </em> <span> 杭州市 </span> <span class="friend-follow J_FriendFollow" data-custom="type=14&app_id=12052609" data-group="" data-userid="74386764"> 加关注 </span> </p> <p> <em> 平面模特 </em> <em> <strong> 268403 </strong> 粉丝 </em> </p> </div> <div class="pic w610"> <a href="//mm.taobao.com/photo-74386764-10000208330.htm?pic_id=10001062464" target="_blank"> <img data-ks-lazyload="//img.alicdn.com/imgextra/i1/74386764/TB1tM0cGFXXXXXNXVXXXXXXXXXX_!!74386764-0-tstar.jpg" src="//assets.alicdn.com/kissy/1.0.0/build/imglazyload/spaceball.gif"/> </a> </div> </div> <div class="list-info"> <div class="popularity"> <dl> <dt> 8 </dt> <dd> <span> 总积分: </span> 55193 </dd> </dl> </div> <ul class="info-detail"> <li> 新增积分: <strong> 618 </strong> </li> <li> 好评率: <strong> 50.0 </strong> % </li> <li> 导购照片: <strong> 2192 </strong> 张 </li> <li> 签约数量: <strong> 160 </strong> 次 </li> </ul> <p class="description"> 轻奢水貂皮草整貂皮毛领女士皮草 外套披肩2014秋冬新款 </p> <div class="J_LikeIt" photo-favor-count="0" photo-id="74386764_10001062464"> <div class="mm-photolike"> <a class="mm-photolike-btn" data-count="0" data-targetid="74386764_10001062464" 
href="javascript:void(0)"> 喜欢 </a> <var class="mm-photolike-count radius-3"> 0 </var> </div> </div> </div> </div> <div class="list-item"> <div class="personal-info"> <div class="pic-word"> <div class="pic s60"> <a class="lady-avatar" href="//mm.taobao.com/523216808.htm" target="_blank"> <img height="60" src="//gtd.alicdn.com/sns_logo/i7/TB1HNp6LpXXXXaHaXXXSutbFXXX.jpg_60x60.jpg" width="60"/> </a> </div> <p class="top"> <a class="lady-name" href="//mm.taobao.com/self/model_card.htm?user_id=523216808" target="_blank"> Cherry </a> <em> <strong> 29 </strong> 岁 </em> <span> 广州市 </span> <span class="friend-follow J_FriendFollow" data-custom="type=14&app_id=12052609" data-group="" data-userid="523216808"> 加关注 </span> </p> <p> <em> 平面模特 设计师 </em> <em> <strong> 172238 </strong> 粉丝 </em> </p> </div> <div class="pic w610"> <a href="//mm.taobao.com/photo-523216808-301597150.htm?pic_id=10000809558" target="_blank"> <img data-ks-lazyload="//img.alicdn.com/imgextra/i3/523216808/TB1OxhsGXXXXXaDXFXXXXXXXXXX_!!523216808-0-tstar.jpg" src="//assets.alicdn.com/kissy/1.0.0/build/imglazyload/spaceball.gif"/> </a> </div> </div> <div class="list-info"> <div class="popularity"> <dl> <dt> 9 </dt> <dd> <span> 总积分: </span> 54468 </dd> </dl> </div> <ul class="info-detail"> <li> 新增积分: <strong> 214 </strong> </li> <li> 好评率: <strong> 96.87 </strong> % </li> <li> 导购照片: <strong> 1197 </strong> 张 </li> <li> 签约数量: <strong> 266 </strong> 次 </li> </ul> <p class="description"> ATAR 2014秋冬新款女装加厚伦敦羊毛呢料大衣舒适保暖显瘦外套 </p> <div class="J_LikeIt" photo-favor-count="0" photo-id="523216808_10000809558"> <div class="mm-photolike"> <a class="mm-photolike-btn" data-count="0" data-targetid="523216808_10000809558" href="javascript:void(0)"> 喜欢 </a> <var class="mm-photolike-count radius-3"> 0 </var> </div> </div> </div> </div> <div class="list-item"> <div class="personal-info"> <div class="pic-word"> <div class="pic s60"> <a class="lady-avatar" href="//mm.taobao.com/46599595.htm" target="_blank"> <img height="60" 
src="//gtd.alicdn.com/sns_logo/i7/TB1QPszGXXXXXbSXVXXSutbFXXX.jpg_60x60.jpg" width="60"/> </a> </div> <p class="top"> <a class="lady-name" href="//mm.taobao.com/self/model_card.htm?user_id=46599595" target="_blank"> 雪倩nika </a> <em> <strong> 26 </strong> 岁 </em> <span> 杭州市 </span> <span class="friend-follow J_FriendFollow" data-custom="type=14&app_id=12052609" data-group="" data-userid="46599595"> 加关注 </span> </p> <p> <em> 平面模特 </em> <em> <strong> 320256 </strong> 粉丝 </em> </p> </div> <div class="pic w610"> <a href="//mm.taobao.com/photo-46599595-10000447489.htm?pic_id=10002043550" target="_blank"> <img data-ks-lazyload="//img.alicdn.com/imgextra/i4/46599595/TB1yja3JXXXXXaSXXXXXXXXXXXX_!!46599595-0-tstar.jpg" src="//assets.alicdn.com/kissy/1.0.0/build/imglazyload/spaceball.gif"/> </a> </div> </div> <div class="list-info"> <div class="popularity"> <dl> <dt> 10 </dt> <dd> <span> 总积分: </span> 54360 </dd> </dl> </div> <ul class="info-detail"> <li> 新增积分: <strong> 492 </strong> </li> <li> 好评率: <strong> 100.0 </strong> % </li> <li> 导购照片: <strong> 3183 </strong> 张 </li> <li> 签约数量: <strong> 43 </strong> 次 </li> </ul> <p class="description"> ~非常舒适的一款衬衫,精致的雪纺材质,垂坠感好,亲肤透气,非常飘逸,门襟单排扣,方便穿脱,方便穿脱,整体上身效果非常不错的呢!前面一个口袋设计,起到了更好的装饰效果 </p> <div class="J_LikeIt" photo-favor-count="0" photo-id="46599595_10002043550"> <div class="mm-photolike"> <a class="mm-photolike-btn" data-count="0" data-targetid="46599595_10002043550" href="javascript:void(0)"> 喜欢 </a> <var class="mm-photolike-count radius-3"> 0 </var> </div> </div> </div> </div> <input id="J_Totalpage" type="hidden" value="4316"/> </body> </html>
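The dump ends with a hidden input whose id is `J_Totalpage` and whose value is 4316. Assuming this value is the total number of listing pages (the page does not say so explicitly), it could be read from the response instead of being hard-coded. A rough sketch using only the standard library; the function name `total_pages` is hypothetical, and the regular expression assumes the `id` attribute comes before `value`, as it does in the dump:

```python
import re

# Tail of the dump above, trimmed to the relevant element.
html_tail = '<input id="J_Totalpage" type="hidden" value="4316"/> </body> </html>'


def total_pages(html):
    """Return the value of the J_Totalpage hidden input, or None if absent."""
    match = re.search(r'id="J_Totalpage"[^>]*\bvalue="(\d+)"', html)
    return int(match.group(1)) if match else None


print(total_pages(html_tail))
```

With BeautifulSoup already loaded, the same value should be reachable with `soup.find("input", id="J_Totalpage")["value"]`.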
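For the analysis we can really look at just one person's entry, since the full output is long. Every entry has the same fixed structure, which makes it easy to analyze.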
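To illustrate working from a single entry, here is a small sketch that pulls a few fields out of one trimmed `list-item` block. Note that it uses the standard library's `html.parser` rather than BeautifulSoup, purely so that it runs without third-party packages; the class name `ItemParser` and the field names are made up for this example, and the sample is a shortened copy of the first entry in the dump:

```python
from html.parser import HTMLParser

# A shortened copy of the first entry from the dump above.
SAMPLE = """
<div class="list-item">
  <p class="top">
    <a class="lady-name" href="//mm.taobao.com/self/model_card.htm?user_id=687471686"> 田媛媛 </a>
    <em><strong> 27 </strong> 岁 </em>
    <span> 广州市 </span>
    <span class="friend-follow J_FriendFollow" data-userid="687471686"> 加关注 </span>
  </p>
</div>
"""


class ItemParser(HTMLParser):
    """Collect a few fields from one list-item block."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._capture = None  # field name waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "lady-name" in classes:
            self.fields["profile_url"] = attrs.get("href")
            self._capture = "name"
        elif tag == "strong" and "age" not in self.fields:
            # The first <strong> in an entry holds the age.
            self._capture = "age"
        elif tag == "span" and "J_FriendFollow" in classes:
            self.fields["user_id"] = attrs.get("data-userid")

    def handle_data(self, data):
        text = data.strip()
        if self._capture and text:
            self.fields[self._capture] = text
            self._capture = None


parser = ItemParser()
parser.feed(SAMPLE)
print(parser.fields)
```

Because every entry shares this layout, whatever works on one block can be applied to each `.list-item` in turn.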
3. Program code:

    # -*- coding: utf-8 -*-
    import requests
    from bs4 import BeautifulSoup

    BANNER = "*" * 35

    for num in range(1, 4300):
        try:
            URL = 'http://mm.taobao.com/json/request_top_list.htm?page=%d' % num
            # print("Now crawling: " + URL)
            response = requests.get(URL)
            response.encoding = 'gb2312'
            text = response.text
            soup = BeautifulSoup(text, 'lxml')
            # Each model on the listing page sits in its own .list-item block
            for model in soup.select(".list-item"):
                try:
                    follow = model.find('span', {'class': 'friend-follow J_FriendFollow'})
                    model_id = follow['data-userid']
                    # Fetch the per-model detail page
                    json_url = "http://mm.taobao.com/self/info/model_info_show.htm?user_id=%d" % int(model_id)
                    response_json = requests.get(json_url)
                    response_json.encoding = 'gb2312'
                    text_response_json = response_json.text
                    soup_json = BeautifulSoup(text_response_json, 'lxml')

                    name = model.find('a', {'class': 'lady-name'}).string
                    print(BANNER + name + BANNER)
                    print("Name: " + name)
                    print("Age: " + model.find('p', {'class': 'top'}).em.strong.string)
                    print("Birthday: " + soup_json.find('li', {'class': 'mm-p-cell-left'}).span.string)
                    blood = soup_json.find_all('li', {'class': 'mm-p-cell-right'})[1].span.string
                    if blood is None:
                        blood = "N/A"
                    print("Blood type: " + blood)
                    print("School/major: " + soup_json.find_all('li')[5].span.string)
                    print("Height: " + soup_json.find('li', {'class': 'mm-p-small-cell mm-p-height'}).p.string)
                    print("Weight: " + soup_json.find('li', {'class': 'mm-p-small-cell mm-p-weight'}).p.string)
                    print("Measurements: " + soup_json.find('li', {'class': 'mm-p-small-cell mm-p-size'}).p.string)
                    print("Cup size: " + soup_json.find('li', {'class': 'mm-p-small-cell mm-p-bar'}).p.string)
                    print("Shoe size: " + soup_json.find('li', {'class': 'mm-p-small-cell mm-p-shose'}).p.string)
                    print("Location: " + model.find('p', {'class': 'top'}).span.string)
                    print("ID: " + model_id)
                    print("Tags: " + model.find_all('p')[1].em.string)
                    print("Followers: " + model.find_all('p')[1].strong.string)
                    print("Rank: " + model.find('div', {'class': 'popularity'}).dl.dt.get_text(strip=True))
                    print(model.find('ul', {'class': 'info-detail'}).get_text(" ", strip=True))
                    print("Profile page: " + "http:" + model.find('a', {'class': 'lady-name'})['href'])
                    print("Portfolio page: " + "http:" + model.find('a', {'class': 'lady-avatar'})['href'])
                    print("Avatar: " + "http:" + model.find('img')['src'])
                    print(BANNER + name + BANNER)
                    print("\n")
                except Exception as exc:
                    # A missing field on one entry should not stop the whole page
                    print("error: %s" % exc)
        except Exception as exc:
            print("page %d failed: %s" % (num, exc))
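The script builds absolute links by prefixing `http:` to the protocol-relative hrefs found in the page (those starting with `//`). One alternative, sketched here under the assumption that the listing page URL is used as the base, is to let `urllib.parse.urljoin` resolve them; the helper name `absolute` is hypothetical. Note that the result inherits the scheme of the base URL, so with an `https` base the links come out as `https` rather than `http`:

```python
from urllib.parse import urljoin

PAGE_URL = "https://mm.taobao.com/json/request_top_list.htm?page=1"


def absolute(href, base=PAGE_URL):
    """Resolve a protocol-relative or relative href against the page URL."""
    return urljoin(base, href)


print(absolute("//mm.taobao.com/687471686.htm"))
print(absolute("//gtd.alicdn.com/sns_logo/i2/TB1XZ1PQVXXXXaJXpXXSutbFXXX.jpg_60x60.jpg"))
```

This also copes with hrefs that are already absolute or that are plain site-relative paths, which simple string concatenation would mangle.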
4. Partial screenshot of the program's output:
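Summary: this post walks through the whole thought process behind developing the program. It uses two libraries, requests and BeautifulSoup.
I hope it helps anyone who wants to get into crawler development. In my view, the approach is what matters most!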
This article comes from the “付炜超” blog; please be sure to keep this attribution: http://9399369.blog.51cto.com/9389369/1953581