A Hands-On Guide to Fetching Recipe Information with a Python Web Crawler
Posted by 飞总聊IT
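During the holidays I often want to try cooking for myself, and the Xiachufang (下厨房) website is a good place to start.
Xiachufang is one of the go-to sites for this. It mainly offers recipes and cooking tips, and it covers a wide range of dishes.
Today we will crawl Xiachufang recipes and save them to a Word document, so that you can put together your own small cookbook later.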
/2 Project goal/
Fetch recipes in bulk and save each dish name, ingredient list, and download link to a Word document.
/3 Project preparation/
Software: PyCharm
Required libraries: requests, lxml, fake_useragent, time
The site URL is:
https://www.xiachufang.com/explore/?page={}
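Each time you click "next page", the page parameter increases by 1. Put {} in place of the changing value, then use a for loop over the page numbers to fill it in and request each URL in turn.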
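A minimal sketch of the URL templating described above. The helper name `build_page_urls` and the page range used here are illustrative assumptions, not part of the original project.

```python
BASE_URL = "https://www.xiachufang.com/explore/?page={}"


def build_page_urls(start_page, end_page):
    """Return one listing URL per page number, inclusive of both ends."""
    return [BASE_URL.format(page) for page in range(start_page, end_page + 1)]


# Fill the {} placeholder for pages 1 to 3 and show the resulting URLs.
for url in build_page_urls(1, 3):
    print(url)
```

Building the list first keeps the page arithmetic separate from the request code, which makes either part easier to check on its own.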
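/4 Handling anti-scraping measures/
There are two main points to watch out for:
1. If you call the requests library directly without setting any headers, the site returns no data.
2. If the same IP visits many times in a row, the IP gets banned. That is how my own IP was banned at first.
After some research, the following two measures dealt with both problems effectively:
1) Capture a normal set of HTTP request headers and send those headers with each requests call.
2) Use fake_useragent to generate a random User-Agent for each visit.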
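The two measures can be sketched without any network access. Note the substitution: this sketch replaces fake_useragent with a small hard-coded pool so that it runs on the standard library alone. The pool contents, the extra header values, and the name `build_headers` are assumptions for illustration. In the project itself, `UserAgent().random` would supply the User-Agent string, and the resulting dict would be passed to `requests.get` through its `headers` argument.

```python
import random

# Hypothetical pool of browser User-Agent strings (stand-in for fake_useragent).
USER_AGENT_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def build_headers():
    """Build browser-like request headers with a randomly chosen User-Agent."""
    return {
        "User-Agent": random.choice(USER_AGENT_POOL),
        # Assumed typical browser values; copy real ones from your own browser.
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    }


print(build_headers()["User-Agent"])
```

Calling `build_headers()` once per request, rather than once per run, is what makes the User-Agent vary between visits.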
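/5 Project implementation/
1. Define a class that inherits from object, with an __init__ method and a main method that each take self as their first parameter. Import the required libraries and set the base URL, as shown below.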
import requests
from lxml import etree
from fake_useragent import UserAgent
import time


class kitchen(object):
    def __init__(self):
        self.url = "https://www.xiachufang.com/explore/?page={}"

    def main(self):
        pass


if __name__ == '__main__':
    imageSpider = kitchen()
    imageSpider.main()
2. Generate a random User-Agent.
        ua = UserAgent()  # create the generator that ua.random below relies on
        for i in range(1, 50):
            self.headers = {
                'User-Agent': ua.random,
            }
3. Send the request and get the response. Wrapping this in its own method makes it easy to reuse for later requests.
    def get_page(self, url):
        res = requests.get(url=url, headers=self.headers)
        html = res.content.decode("utf-8")
        return html
4. Use XPath to parse the first-level (listing) page and extract the URLs of the second-level (recipe) pages.
    def parse_page(self, html):
        parse_html = etree.HTML(html)
        image_src_list = parse_html.xpath('//li/div/a/@href')
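To see what the path `//li/div/a/@href` selects, here is a small sketch on a made-up fragment. Note the substitution: it uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra packages, and the sample markup is a hypothetical stand-in for the real listing page. ElementTree does not support the `@href` step, so the sketch selects the `<a>` elements and reads the attribute; with lxml, the XPath in the step above returns the href strings directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a listing page.
SAMPLE = """
<ul>
  <li><div><a href="/recipe/1/">Dish one</a></div></li>
  <li><div><a href="/recipe/2/">Dish two</a></div></li>
  <li><span><a href="/other/">Not inside a div</a></span></li>
</ul>
"""

root = ET.fromstring(SAMPLE)
# Match <a> elements that are children of a <div> that is a child of an <li>.
links = [a.get("href") for a in root.findall(".//li/div/a")]
print(links)
```

Only the first two links match, because the third `<a>` sits under a `<span>` rather than a `<div>`.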
5. Loop over the links with for, request each second-level page, and build a variable food_info that holds the dish name, ingredients, and download link.
        for i in image_src_list:
            url = "https://www.xiachufang.com/" + i
            # print(url)
            html1 = self.get_page(url)  # second request: fetch the recipe page
            parse_html1 = etree.HTML(html1)
            # print(parse_html1)
            # Heading text; it fills the dish-name slot in food_info below.
            num = parse_html1.xpath('.//h2[@id="steps"]/text()')[0].strip()
            # Extracted but not used in food_info below.
            name = parse_html1.xpath('.//li[@class="container"]/p/text()')
            ingredients = parse_html1.xpath('.//td//a/text()')
            food_info = '''
No. %s
Dish name : %s
Ingredients : %s
Download link : %s,
=================================================================
''' % (str(self.u), num, ingredients, url)
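Two details of this step can be illustrated in isolation: joining the site root with a relative link, and turning the ingredient list into a single string before formatting. The sketch below is an illustration under assumptions, not the author's code: the function name `format_recipe`, the sample values, and the premise that the extracted links start with "/" are all hypothetical.

```python
from urllib.parse import urljoin

BASE = "https://www.xiachufang.com/"


def format_recipe(index, title, ingredients, href):
    """Build one text record for a recipe."""
    # urljoin avoids a doubled slash when href already starts with '/'.
    url = urljoin(BASE, href)
    # Strip whitespace, drop empty strings, and join the list into one string.
    ingredient_text = ", ".join(s.strip() for s in ingredients if s.strip())
    return "No. %s\nDish name: %s\nIngredients: %s\nDownload link: %s\n" % (
        index, title, ingredient_text, url)


print(format_recipe(1, "Tomato and egg stir-fry", ["tomato ", " egg", ""], "/recipe/1/"))
```

Joining the list first means the saved record shows plain text rather than Python's list representation with brackets and quotes.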
6. Save the result to a Word document.
            f = open('xiachufang_recipes.doc', 'a', encoding='utf-8')  # open the file in append mode ('a')
            f.write(str(food_info))
            f.close()
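The same append-and-close pattern can be written with a `with` block, which closes the file even if the write raises an error. This is a sketch only: the helper name `append_record`, the temporary directory, and the file name used in the demonstration are assumptions for illustration.

```python
import os
import tempfile


def append_record(path, record):
    """Append one text record; the with-block closes the file automatically."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(record)


# Demonstration against a temporary directory (the path is illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "recipes.doc")
    append_record(path, "first\n")
    append_record(path, "second\n")
    with open(path, encoding="utf-8") as f:
        content = f.read()

print(content)
```

Because the file is opened in append mode, each call adds to the end of the file instead of overwriting earlier records.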
7. Call the methods to run the crawl.
        html = self.get_page(url)
        self.parse_page(html)
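8. Project optimization
1) Method one: add a time delay between requests.
        time.sleep(1.4)
2) Method two: define a counter variable u and increment it inside the for loop, so the output shows which recipe number is being crawled. This makes the results easier to read.
        self.u = 0    # in __init__
        self.u += 1   # in the for loop, once per recipe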
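How the delay and the counter fit into the overall loop can be sketched as follows. To keep the example runnable offline, the fetching and parsing steps are passed in as functions; the names `crawl`, `fetch`, and `parse`, and the fake stand-ins in the demonstration, are hypothetical and not part of the original code.

```python
import time


def crawl(page_urls, fetch, parse, delay=1.4):
    """Fetch each listing page, parse it, and number the items found."""
    count = 0
    results = []
    for page_url in page_urls:
        html = fetch(page_url)
        for item in parse(html):
            count += 1  # running index across all pages
            results.append((count, item))
        time.sleep(delay)  # pause between pages to avoid hammering the site
    return results


# Demonstration with fake fetch and parse functions (no network access).
fake_fetch = lambda url: "page:" + url
fake_parse = lambda html: [html + "/a", html + "/b"]
results = crawl(["p1", "p2"], fake_fetch, fake_parse, delay=0)
for number, item in results:
    print(number, item)
```

Keeping the counter outside the inner loop is what lets the numbering continue from one page to the next.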
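/6 Results/
1. Click the green triangle (Run) and enter the start page and end page.
2. After the program runs, the results appear in the console, as shown in the figure below.
3. The results are saved to a Word document, as shown in the figure below.
4. Double-click the file to view its contents, as shown in the figure below.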
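/7 Summary/
1. This article uses a Python web crawler to fetch recipe information from the Xiachufang site, and offers solutions to the difficulties and key points that came up along the way, including how to avoid being blocked by anti-scraping measures.
2. It shows how to concatenate strings and how to convert a list to another type.
3. The code is simple, and I hope it helps you.
4. Please try it yourself. Something that looks easy when someone else does it often turns up all kinds of problems once you implement it on your own. Don't let your ambitions outrun your skills; only by practicing can you understand the material more deeply.
5. You can pick the categories you like and fetch the recipes you like. Everyone can be a chef.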