Python 3.7 - PhantomJS - Driver.get(url): 'Window handle/name is invalid or closed?'
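Using two functions to scrape a website results in a driver.get error.
I have tried different variations of while and for loops to get this working. Now I get a driver.get error. The initial function works on its own, but when I run the two functions one after the other, I get this error.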
import requests, sys, webbrowser, bs4, time
import urllib.request
import pandas as pd
from selenium import webdriver
# Raw string so the backslashes in the Windows path are not read as escapes (e.g. '\b').
driver = webdriver.PhantomJS(executable_path=r'C:\PhantomJS\bin\phantomjs.exe')
jobtit = 'some+job'
location = 'some+city'
urlpag = ('https://www.indeed.com/jobs?q=' + jobtit + '&l=' + location + '%2C+CA')
def initial_scrape():
    data = []
    try:
        driver.get(urlpag)
        results = driver.find_elements_by_tag_name('h2')
        print('Finding the results for the first page of the search.')
        for result in results: # loop 2
            job_name = result.text
            link = result.find_element_by_tag_name('a')
            job_link = link.get_attribute('href')
            data.append({'Job' : job_name, 'link' : job_link})
            print('Appending the first page results to the data table.')
            if result == len(results):
                return
    except Exception:
        print('An error has occurred when trying to run this script. Please see the attached error message and screenshot.')
        driver.save_screenshot('screenshot.png')
        driver.close()
    return data
def second_scrape():
    data = []
    try:
        #driver.get(urlpag)
        pages = driver.find_element_by_class_name('pagination')
        print('Variable nxt_pg is ' + str(nxt_pg))
        for page in pages:
            page_ = page.find_element_by_tag_name('a')
            page_link = page_.get_attribute('href')
            print('Taking a look at the different page links..')
            for page_link in range(1,pg_amount,1):
                driver.click(page_link)
                items = driver.find_elements_by_tag_name('h2')
                print('Going through each new page and getting the jobs for ya...')
                for item in items:
                    job_name = item.text
                    link = item.find_element_by_tag_name('a')
                    job_link = link.get_attribute('href')
                    data.append({'Job' : job_name, 'link' : job_link})
                    print('Appending the jobs to the data table....')
                if page_link == pg_amount:
                    print('Oh boy! pg_link == pg_amount...time to exit the loops')
                    return
    except Exception:
        print('An error has occurred when trying to run this script. Please see the attached error message and screenshot.')
        driver.save_screenshot('screenshot.png')
        driver.close()
    return data
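Expected:
Initial function
- Get the website from urlpag.
- Find the elements by tag name and loop through them, appending each one to a list.
- Once all elements are done, exit and return the list.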
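The three steps above can be sketched roughly as follows. This is a minimal illustration, not the asker's code: the driver is passed in as a parameter instead of being a module-level global, collect_jobs is a hypothetical helper name, and the locator is given as the plain string "tag name" (the value of Selenium's By.TAG_NAME) so the snippet also works on Selenium versions where find_elements_by_tag_name is no longer available.

```python
TAG_NAME = "tag name"  # same string value as selenium's By.TAG_NAME


def collect_jobs(driver):
    """Return one {'Job', 'link'} dict per <h2> heading on the current page."""
    jobs = []
    for heading in driver.find_elements(TAG_NAME, "h2"):
        anchors = heading.find_elements(TAG_NAME, "a")
        if not anchors:
            continue  # heading without a link: skip it instead of raising
        jobs.append({"Job": heading.text,
                     "link": anchors[0].get_attribute("href")})
    return jobs


def initial_scrape(driver, url):
    """Load url, then collect the jobs listed on that first page."""
    driver.get(url)
    return collect_jobs(driver)
```

The loop ends on its own once every heading has been visited, so no explicit exit condition is needed before returning the list.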
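Second function
- While still on urlpag, find the elements by class name and get the links to the next pages to scrape.
- Since every page has to be scraped, loop through each page, scrape it and append the elements to a different table.
- Once the pg_amount limit is reached, exit and return the final list.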
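A rough sketch of the second function as described in the list above. It makes several assumptions that are not confirmed by the post: the page links are plain a elements with an href inside a container whose class is pagination, pg_amount is the maximum number of extra pages to visit, and scrape_page is a hypothetical callback that extracts the jobs from whatever page the driver currently shows. pagination_links is likewise a made-up helper name.

```python
CLASS_NAME = "class name"  # same string value as selenium's By.CLASS_NAME
TAG_NAME = "tag name"      # same string value as selenium's By.TAG_NAME


def pagination_links(driver):
    """Collect the distinct hrefs found inside elements with class 'pagination'."""
    links = []
    for container in driver.find_elements(CLASS_NAME, "pagination"):
        for anchor in container.find_elements(TAG_NAME, "a"):
            href = anchor.get_attribute("href")
            if href and href not in links:
                links.append(href)
    return links


def second_scrape(driver, scrape_page, pg_amount):
    """Visit up to pg_amount paginated URLs and gather scrape_page(driver) results."""
    # Read every href before navigating: elements located on one page
    # become stale as soon as the driver loads another page.
    urls = pagination_links(driver)[:pg_amount]
    data = []
    for url in urls:
        driver.get(url)
        data.extend(scrape_page(driver))
    return data
```

Navigating with driver.get(url) on the collected hrefs avoids clicking elements, and slicing the URL list to pg_amount replaces the manual exit check inside the loop.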
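Actual:
Initial function
- Gets the website from urlpag.
- Finds the elements by tag name and loops through them, appending each one to a list.
- Once all elements are done, exits and returns the list.
Second function
- Finds the pagination class, prints the nxt_variable and then throws the error below.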
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\Scripts\Indeedscraper\indeedscrape.py", line 23, in initial_scrape
    driver.get(urlpag)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchWindowException: Message: {"errorMessage":"Currently Window handle/name is invalid (closed?)"
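One possible reading of this traceback, which the thread itself does not confirm: the posted functions share a single driver and each of them calls driver.close(), and once the only window of a session has been closed, a later driver.get() has no window left to load into. Under that assumption, a small sketch of keeping the driver's lifetime in one place looks like this. managed_driver and make_driver are hypothetical names; make_driver stands for any zero-argument callable that returns a WebDriver.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(make_driver):
    """Create a driver, hand it to the caller, and quit it exactly once."""
    driver = make_driver()
    try:
        yield driver
    finally:
        # quit() ends the whole browser session. The scraping functions
        # themselves never call close(), so the window stays valid
        # between the first and the second scrape.
        driver.quit()
```

With this shape, both scraping functions would run inside one with managed_driver(...) as driver: block and receive the driver as an argument, and the session is torn down a single time even if one of them raises.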
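Answer
For anyone who runs into this error: I ended up switching to chromedriver and using it for web scraping instead. It seems the PhantomJS driver sometimes returns this error.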
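The answer does not show its chromedriver setup, so the following is only a sketch of what such a switch could look like. It assumes Chrome is installed and that a matching chromedriver can be located (on PATH, or resolved automatically by recent Selenium releases). build_chrome_driver is a hypothetical helper name, and the headless flag and window size are illustrative choices that roughly mimic PhantomJS running without a visible window.

```python
def build_chrome_driver(headless=True):
    """Start a Chrome session through chromedriver (selenium imported lazily)."""
    from selenium import webdriver  # requires the selenium package

    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless")  # no visible browser window
    options.add_argument("--window-size=1920,1080")
    # Assumption: chromedriver is discoverable without an explicit path.
    return webdriver.Chrome(options=options)
```

The returned object exposes the same get, find_elements and quit calls as the PhantomJS driver, so the scraping functions should not need to change beyond receiving a different driver.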