Hands-On Tutorial: Fetching Recipe Information with a Python Web Crawler

Posted by 飞总聊IT



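/1 Preface/

    During the holidays I often want to try cooking for myself, and Xiachufang (下厨房) is a good place to start.

    Xiachufang is one of the go-to sites for home cooks. It offers recipes for all kinds of dishes along with cooking tips, across a wide range of categories.

    Today I'll show you how to crawl Xiachufang's recipes and save them to a Word document, so you can build your own little recipe book later.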


/2 椤圭洰鐩爣/

    鑾峰彇鑿滆氨锛屽苟鎵归噺鎶婅彍 鍚嶃€?鍘?鏂?銆佷笅 杞?閾?鎺?銆佷笅杞戒繚瀛樺湪world鏂囨。銆?/span>


/3 椤圭洰鍑嗗/

杞欢锛?span>PyCharm

闇€瑕佺殑搴擄細requests銆?span>lxml銆?span>fake_useragent銆?/strong>time

缃戠珯濡備笅锛?/span>

https://www.xiachufang.com/explore/?page={}

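Each time you click "next page", the page parameter increases by 1. So the changing number is replaced with {}, and a for loop iterates over the page numbers, sending one request per URL.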
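As a minimal sketch of that pagination idea (the helper name `build_page_urls` and the example page range are illustrative, not from the article), `str.format` fills the `{}` placeholder with each page number:

```python
# URL template from the article; {} is replaced by the page number
BASE_URL = "https://www.xiachufang.com/explore/?page={}"


def build_page_urls(start_page, end_page):
    """Return the list-page URLs for pages start_page..end_page (inclusive)."""
    return [BASE_URL.format(page) for page in range(start_page, end_page + 1)]


for url in build_page_urls(1, 3):
    print(url)
```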


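/4 Handling Anti-Scraping Measures/

There are two main points to watch out for:

1. If you call requests directly without setting any headers, the site returns no data at all.

2. If the same IP visits many times in a row, the site bans that IP. That is how my IP got blocked at first.

After some experimenting, I found that the following methods solve these two problems effectively:

1) Take the headers of a normal browser HTTP request and set these common headers on every requests call.

2) Use fake_useragent to generate a random User-Agent for each visit.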
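A minimal sketch of both measures, assuming a small hard-coded User-Agent pool in place of fake_useragent so it runs offline. The pool strings, the extra header values, and the name `build_headers` are illustrative assumptions, not values taken from the site:

```python
import random

# Hypothetical pool standing in for fake_useragent's UserAgent().random;
# these strings are examples only.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def build_headers(rng=random):
    """Browser-like request headers with a randomly chosen User-Agent."""
    return {
        "User-Agent": rng.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
        "Referer": "https://www.xiachufang.com/",
    }


headers = build_headers()
print(headers["User-Agent"])
```

With requests installed, the dict is passed as `requests.get(url, headers=build_headers())`; with fake_useragent installed, `UserAgent().random` can stand in for `rng.choice(USER_AGENTS)`. Changing the User-Agent does not change the IP address, so on its own it mainly addresses the first problem.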


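/5 Implementation/

1. Define a class kitchen that inherits from object. Its __init__ method stores the URL template, and its main method drives the crawl; like all instance methods, both take self as their first parameter. Import the required libraries and set the URL, as shown below.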

import requests
from lxml import etree
from fake_useragent import UserAgent
import time


class kitchen(object):
    def __init__(self):
        # list-page URL template; {} is filled with the page number
        self.url = "https://www.xiachufang.com/explore/?page={}"

    def main(self):
        pass


if __name__ == '__main__':
    imageSpider = kitchen()
    imageSpider.main()


2. Generate a random User-Agent.

ua = UserAgent()  # fake_useragent generator; ua.random returns a random browser UA string
for i in range(1, 50):
    self.headers = {
        'User-Agent': ua.random,
    }


3. Send the request and get the response. Wrapping this in a method means it can be called again for every later request.

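def get_page(self, url):
    # send a GET request with the randomized headers set in main()
    res = requests.get(url=url, headers=self.headers)
    # decode the raw response bytes explicitly as UTF-8
    html = res.content.decode("utf-8")
    return html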
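A slightly more defensive variant, sketched as a standalone function rather than a method (the 10-second timeout is an assumption): a timeout keeps one stalled connection from hanging the crawl, and `raise_for_status()` makes a blocked or missing page fail loudly instead of being parsed as if it were a recipe list. The `session` parameter is a hypothetical hook so the function can be exercised without network access.

```python
def get_page(url, headers, session=None, timeout=10):
    """Fetch url and return the body decoded as UTF-8.

    session defaults to the requests module itself; any object with a
    compatible .get(url, headers=..., timeout=...) method also works.
    """
    if session is None:
        import requests  # imported lazily so a stub session works without it
        session = requests
    res = session.get(url, headers=headers, timeout=timeout)
    res.raise_for_status()  # HTTP 4xx/5xx -> requests.HTTPError
    return res.content.decode("utf-8")
```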


4. Parse the first-level (list) page with XPath to get the URLs of the second-level (recipe detail) pages.

def parse_page(self, html):
    parse_html = etree.HTML(html)
    # href values of the recipe links on the list page
    image_src_list = parse_html.xpath('//li/div/a/@href')
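The article parses with lxml. To show what `//li/div/a/@href` selects without needing lxml, here is a sketch using the standard library's `xml.etree.ElementTree`, which supports a subset of XPath, run against a hypothetical, simplified snippet of list-page markup. Real pages are messier HTML, which is why lxml's forgiving `etree.HTML` parser is the better tool there.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified list-page markup (must be well-formed XML here).
SAMPLE = """
<html><body><ul>
  <li><div><a href="/recipe/1001/">Recipe A</a></div></li>
  <li><div><a href="/recipe/1002/">Recipe B</a></div></li>
  <li><span><a href="/about/">not a recipe card</a></span></li>
</ul></body></html>
"""


def recipe_links(markup):
    """Collect href values matching the pattern //li/div/a/@href."""
    root = ET.fromstring(markup.strip())
    # './/li/div/a' finds every <a> that is a direct child of a <div>
    # which is itself a direct child of some <li>.
    return [a.get("href") for a in root.findall(".//li/div/a")]


print(recipe_links(SAMPLE))  # ['/recipe/1001/', '/recipe/1002/']
```

The third `<li>` is skipped because its link sits under a `<span>`, not a `<div>`, which shows how tightly the path constrains the match.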


5. Loop over the links with for, fetch each second-level page, and collect its dish name, ingredients, and link into a variable food_info.

for i in image_src_list:
    # join the site root and the href without producing a double slash
    url = "https://www.xiachufang.com/" + i.lstrip("/")
    # print(url)
    html1 = self.get_page(url)  # second request: the recipe detail page
    parse_html1 = etree.HTML(html1)
    # print(parse_html1)
    # heading of the steps section, used below as the dish name
    num = parse_html1.xpath('.//h2[@id="steps"]/text()')[0].strip()
    name = parse_html1.xpath('.//li[@class="container"]/p/text()')  # step text (not written out)
    ingredients = parse_html1.xpath('.//td//a/text()')
    food_info = '''
Recipe #%s
Name: %s
Ingredients: %s
Link: %s
=================================================================
''' % (str(self.u), num, ingredients, url)
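If the hrefs are site-relative paths, `urllib.parse.urljoin` is a standard way to build the detail-page URL without worrying about slashes. A sketch of assembling one record in the spirit of food_info (the function name `format_recipe`, the English labels, and joining the ingredient list into one string are illustrative choices):

```python
from urllib.parse import urljoin

SITE = "https://www.xiachufang.com/"


def format_recipe(index, title, ingredients, href):
    """Build one text record: number, name, ingredients, link, separator."""
    url = urljoin(SITE, href)  # handles both "/recipe/..." and "recipe/..."
    return (
        f"Recipe #{index}\n"
        f"Name: {title}\n"
        f"Ingredients: {', '.join(ingredients)}\n"
        f"Link: {url}\n"
        + "=" * 65 + "\n"
    )


print(format_recipe(1, "番茄炒蛋", ["番茄", "鸡蛋"], "/recipe/1001/"))
```

Joining the list with `", ".join(...)` writes the ingredients as plain text rather than as a Python list repr like `['番茄', '鸡蛋']`.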


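6. Save to a Word document. The text is written as plain UTF-8, so despite the .doc extension the file is really a text file that Word can open.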

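# append mode ('a'): each recipe is written after the previous ones
# output file name; the "_" separator is an assumption
with open('下厨房_菜谱.doc', 'a', encoding='utf-8') as f:
    f.write(str(food_info))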
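A small sketch of why append mode matters here, using a temporary directory so it can run anywhere (the helper name `append_record` and the file name are hypothetical): mode `'a'` adds each record after the previous ones, whereas mode `'w'` truncates the file on every open and would leave only the last recipe.

```python
import os
import tempfile


def append_record(path, record):
    """Append one record to a UTF-8 text file, creating it if needed."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(record)


# Demo in a temporary directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "recipes.doc")
    append_record(path, "first\n")
    append_record(path, "second\n")
    with open(path, encoding="utf-8") as f:
        print(f.read())
```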


7. Call the methods to put it all together.

url = self.url.format(i)  # fill the page number into the URL template
html = self.get_page(url)
self.parse_page(html)
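Putting the steps together, the crawl loop might look like the sketch below. It is written as a plain function that takes `fetch` and `parse` callables (an assumption for illustration and testing) rather than as the article's `kitchen.main`, and it includes the 1.4 s pause from the optimization step.

```python
import time

URL_TEMPLATE = "https://www.xiachufang.com/explore/?page={}"


def crawl(start_page, end_page, fetch, parse, delay=1.4, sleep=time.sleep):
    """Fetch and parse each list page in turn, pausing between requests.

    fetch(url) -> html and parse(html) play the roles of the article's
    get_page and parse_page methods.
    """
    for page in range(start_page, end_page + 1):
        url = URL_TEMPLATE.format(page)
        parse(fetch(url))
        if page < end_page:
            sleep(delay)  # no pause needed after the last page
```

For example, with a kitchen instance `spider` whose headers have been set, `crawl(1, 3, fetch=spider.get_page, parse=spider.parse_page)` would process pages 1 to 3.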

8. Optimizations

1) Option 1: add a delay between requests.

time.sleep(1.4)  # wait 1.4 s before the next request to lower the risk of an IP ban
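A fixed delay gives requests a perfectly regular rhythm. One common refinement, sketched here as an assumption rather than something the article tested, adds a random jitter on top of the base delay; `sleep` is a parameter only so the function can be checked without actually waiting.

```python
import random
import time


def polite_pause(base=1.4, jitter=0.6, rng=random, sleep=time.sleep):
    """Sleep for base seconds plus a random extra in [0, jitter]; return the delay."""
    delay = base + rng.uniform(0, jitter)
    sleep(delay)
    return delay
```

Calling `polite_pause()` in place of `time.sleep(1.4)` keeps the same minimum gap while varying the interval between requests.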


2) Option 2: define a counter variable u and increment it in the for loop, so the output shows which recipe number is being crawled (clearer to read).

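u = 0  # class attribute: number of recipes crawled so far

self.u += 1  # inside the for loop of parse_page, once per recipe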
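Because the counter lives on the object, the numbering continues across list pages. A sketch of the same idea with `itertools.count` (the function name `numbered` and the label text are illustrative):

```python
import itertools


def numbered(titles, counter):
    """Yield 'Recipe #n: title' lines, drawing n from a shared counter.

    Passing the same itertools.count object for every list page keeps the
    numbering continuous across pages.
    """
    for title in titles:
        yield f"Recipe #{next(counter)}: {title}"


counter = itertools.count(1)
page1 = list(numbered(["A", "B"], counter))
page2 = list(numbered(["C"], counter))
print(page1 + page2)  # ['Recipe #1: A', 'Recipe #2: B', 'Recipe #3: C']
```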


/6 Results/

1. Click the green triangle to run the program, then enter the start page and the end page.

[Screenshot: entering the start and end pages]
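The article does not show the code behind this prompt. A hypothetical helper that validates the two numbers before building the page range could look like this (the name `parse_page_range` and the error message are assumptions):

```python
def parse_page_range(start_text, end_text):
    """Turn the typed start/end page numbers into an inclusive range.

    Raises ValueError for non-numeric input, pages below 1, or end < start.
    """
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {start}-{end}")
    return range(start, end + 1)
```

In `main()`, it would be called as `pages = parse_page_range(input("Start page: "), input("End page: "))`, and the for loop would then iterate over `pages`.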


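2. After the program runs, the results are printed to the console, as shown below.

[Screenshot: console output]


3. The results are saved to the Word document, as shown below.

[Screenshot: the saved .doc file]


4. Double-click the file to open it; its contents are shown below.

[Screenshot: contents of the .doc file]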


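/7 Summary/

1. Using a Python web crawler, this article fetched recipe information from Xiachufang, covering the difficulties and key points that came up along the way, including how to get past the anti-scraping measures, and gave a workable solution for each.

2. It showed how to build strings by concatenation and formatting, and how to convert values such as lists to strings.

3. The code is simple; I hope it helps.

4. Give it a try yourself. Something that looks easy when someone else does it often throws up all kinds of problems once you implement it yourself. Don't let your ambitions outrun your practice: only hands-on work leads to real understanding.

5. Pick the categories you like and fetch the recipes you want. Everyone can be a chef.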



