05_python

Posted by renjian666


Scraping JD product information with selenium

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time


# Open JD and search for mobile phones
brower = webdriver.Chrome()
try:
    brower.implicitly_wait(10)

    brower.get("http://www.jd.com")

    input_search = brower.find_element(By.ID, "key")

    # "手机" is the search keyword, "mobile phone"
    input_search.send_keys("手机")

    input_search.send_keys(Keys.ENTER)

    time.sleep(3)

    good_list = brower.find_elements(By.CLASS_NAME, "gl-item")

    # Number of result pages, read from the pager
    num = int(brower.find_element(By.CSS_SELECTOR, ".p-skip em b").text)
    print(num)

    i = 0
    while i < num:
        for good in good_list:
            # Product URL
            good_url = good.find_element(By.CSS_SELECTOR, ".p-img a").get_attribute("href")
            print(good_url)
            # Product name
            good_name = good.find_element(By.CSS_SELECTOR, "a em").text
            print(good_name)

            # Product price
            good_price = good.find_element(By.CSS_SELECTOR, ".p-price strong").text
            print(good_price)

            # Number of reviews
            good_commit = good.find_element(By.CSS_SELECTOR, ".p-commit strong").text
            print(good_commit)

            # Append the product information to a file
            good_info = (
                f"Name: {good_name}\n"
                f"URL: {good_url}\n"
                f"Price: {good_price}\n"
                f"Reviews: {good_commit}\n\n"
            )
            with open("jd_手机商品信息.txt", "a", encoding="utf-8") as f:
                f.write(good_info)

        # Click the "next page" button
        brower.find_element(By.CLASS_NAME, "pn-next").click()

        time.sleep(3)

        # Fetch the products on the new page
        good_list = brower.find_elements(By.CLASS_NAME, "gl-item")

        # Move on to the next page
        i = i + 1


except Exception as e:
    print(e)
finally:
    time.sleep(10)
    # End the browser session
    brower.quit()
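
The script above appends each product to a text file as free-form lines. If the results are meant to be sorted or filtered later, a CSV file may be easier to work with. The following is only a minimal sketch and not part of the original post: the function name save_goods and the column names in FIELDS are hypothetical, and it assumes each product has already been collected into a dict holding the four values scraped in the loop.

```python
import csv
import io

# Hypothetical column names, mirroring the four values scraped in the loop above.
FIELDS = ["name", "url", "price", "comments"]


def save_goods(goods, fp):
    """Write an iterable of product dicts to the open text file `fp` as CSV."""
    writer = csv.DictWriter(fp, fieldnames=FIELDS)
    writer.writeheader()
    for good in goods:
        writer.writerow(good)


if __name__ == "__main__":
    # Made-up sample record, used only to show the output format.
    sample = [{"name": "Example phone", "url": "https://example.com/1",
               "price": "1999.00", "comments": "100+"}]
    buffer = io.StringIO()
    save_goods(sample, buffer)
    print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, the file would be opened with newline="" and encoding="utf-8", as the csv module documentation recommends, and passed as fp.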

 
