Python crawler: scraping Toutiao with Selenium (AJAX asynchronous loading)
Posted by hellangels333
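from selenium import webdriver
from selenium.webdriver.common.by import By
from lxml import etree
from pyquery import PyQuery as pq
import time

driver = webdriver.Chrome()
driver.maximize_window()
driver.get('https://www.toutiao.com/')
# The implicit wait applies to every later element lookup in this session.
driver.implicitly_wait(10)
# Open the "科技" (Technology) channel. Selenium 4 removed
# find_element_by_link_text, so use find_element with By.LINK_TEXT.
driver.find_element(By.LINK_TEXT, '科技').click()

# Scroll down in 500 px steps so the AJAX feed has a chance to load more items.
for x in range(3):
    js = "var q=document.documentElement.scrollTop=" + str(x * 500)
    driver.execute_script(js)
    time.sleep(2)
time.sleep(5)

# Normalise the rendered page with pyquery, then parse it with lxml for XPath.
page = driver.page_source
doc = pq(page)
doc = etree.HTML(str(doc))
contents = doc.xpath('//div[@class="wcommonFeed"]/ul/li')
print(contents)

# Append every title that was found to toutiao.txt, one per line.
with open('toutiao.txt', 'a', encoding='utf8') as f:
    for x in contents:
        title = x.xpath('div/div[1]/div/div[1]/a/text()')
        if title:
            title = title[0]
            f.write(title + '\n')
            print(title)

driver.quit()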
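The script above scrolls a fixed three times and relies on fixed sleeps, so how many feed items are loaded depends on timing. A possible alternative, shown here only as a sketch and not as the author's method, is to keep scrolling until the page height stops growing. The helper name scroll_until_stable and its parameters pause, max_rounds and sleep are hypothetical; the only WebDriver call it uses is execute_script.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=10, sleep=time.sleep):
    """Scroll to the bottom until the page height stops changing.

    Hypothetical helper: `driver` is any object with Selenium's
    execute_script(js) method. Returns the number of scroll rounds performed.
    """
    height_js = "return document.documentElement.scrollHeight"
    scroll_js = "window.scrollTo(0, document.documentElement.scrollHeight);"

    last_height = driver.execute_script(height_js)
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        driver.execute_script(scroll_js)
        sleep(pause)  # give the AJAX request time to append new items
        new_height = driver.execute_script(height_js)
        if new_height == last_height:
            break  # nothing new was loaded, so stop scrolling
        last_height = new_height
    return rounds
```

In the script it would replace the for loop, for example scroll_until_stable(driver, pause=2, max_rounds=5). The max_rounds cap matters because a feed like this can keep loading new items indefinitely.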