爬虫学习项目

Posted lpapython

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了爬虫学习项目相关的知识,希望对你有一定的参考价值。

项目1:

基于搜狗微信公众号的关键字搜索

技术图片
from selenium import webdriver
import os
from bs4 import BeautifulSoup
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

DRIVER_PATH = os.path.join(os.path.dirname(__file__), chromedriver.exe)
opt = webdriver.ChromeOptions()
opt.headless = True
opt.add_argument(User-Agent=Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (Khtml, like Gecko)
                  Chrome/63.0.3239.132 Safari/537.36)
webphjs = webdriver.Chrome(executable_path=DRIVER_PATH, options=opt)
webphjs.get(https://weixin.sogou.com/)
_input = webphjs.find_element_by_id(query)
search = input(请输入公众号关键字:)
_input.send_keys(search)
webphjs.find_element_by_class_name(swz2).click()
while 1:
    soup = BeautifulSoup(webphjs.page_source, lxml)
    try:
        new_list = soup.find(ul, class: news-list2).find_all(li)
        for i in new_list:
            title = i.find(p, class: tit).find(a).text
            wxid = i.find(p, class: info).find(label).text
            jt = 
            for item in i.find_all(dl):
                key = item.find(dt)
                [s.extract() for s in key(script)]
                key = key.text.replace(\n, ‘‘)
                value = item.find(a) or item.find(dd)
                value = value.text.replace(\n, ‘‘)
                jt[key] = value
            print(-*20)
            print(公众号名称:.format(title))
            print(微信号:.format(wxid))
            for k, v in jt.items():
                print( .format(k, v))
        locator = (By.ID, sogou_next)
        try:
            ele = WebDriverWait(webphjs, 3).until(EC.presence_of_element_located(locator))
            webphjs.find_element_by_id(sogou_next).click()
        except:
            break
    except:
        print(查询不到有关于此关键字的内容)
        break
webphjs.close()
1.000000

设置Chrome为无头浏览器,可能会出现访问异常,被网页发现是自动测试软件,所以给浏览器加了一个User-Agent进行伪装。

 

以上是关于爬虫学习项目的主要内容,如果未能解决你的问题,请参考以下文章

爬虫学习项目

Python Scrapy爬虫框架学习0

爬虫学习(十四)——xpath项目实践

2019年Python爬虫学习必看

爬虫Scrapy指令学习

python爬虫项目讲解(scrapy-re)