Copy Paste not working with headless browser in python selenium

Posted 2023-02-23


Question asked: 2021-12-25 21:09:29

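I am using Selenium with Python to click a button on a web page. The button copies the data to the clipboard in CSV format. I then build an array from the clipboard data and use it further on in the program. Everything works fine until I start the webdriver in headless mode. Is there any workaround? Could the whole thing be written without Selenium? I am open to ideas and improvements to the code.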

    import csv
    import re
    from io import StringIO

    import pyperclip
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException, TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support import expected_conditions
    from selenium.webdriver.support.ui import WebDriverWait

    # Not defined in the original snippet; example values (assumption)
    intDelaySeconds = 5
    intMaxAttempt = 3

    try:
        objFFOptions = Options()
        objFFOptions.add_argument('--headless')
        objFFWebDriver = webdriver.Firefox(options=objFFOptions)  # start hidden
        #objFFWebDriver = webdriver.Firefox()
    except Exception:
        # the driver was never created, so there is nothing to quit()
        print("Error in initiating the Firefox webdriver")
        quit()

    try:
        objFFWebDriver.get("https://chartink.com/screener/90dis")
    except Exception:
        print("Error in opening the webpage")
        objFFWebDriver.quit()
        quit()

    # loop for waiting before query data loads; the while condition caps the retries
    intAttemptCounter = 0
    boolStockDataFetched = False

    while intAttemptCounter < intMaxAttempt:
        intAttemptCounter += 1

        print("\tFetching attempt ", intAttemptCounter)
        try:
            # any_of() (Selenium 4+) succeeds when EITHER marker element is present;
            # joining two conditions with `or` only ever waits for the first one
            objFilterMessageElement = WebDriverWait(objFFWebDriver, intDelaySeconds * intAttemptCounter).until(
                expected_conditions.any_of(
                    expected_conditions.presence_of_element_located((By.ID, 'DataTables_Table_0_info')),
                    expected_conditions.presence_of_element_located((By.CLASS_NAME, 'dataTables_empty'))))

            print("\tEither of the two marker elements found")

            if re.search(r"Filtered\s+[0-9]+\s+stocks\s+\([0-9]+\s+to\s+[0-9]+\)",
                         objFilterMessageElement.text) is not None:
                print("\t", objFilterMessageElement.text)

                try:
                    # click copy button
                    objFFWebDriver.find_element(
                        By.XPATH,
                        "//*[@class='btn btn-default buttons-copy buttons-html5 btn-primary']").click()
                except NoSuchElementException:
                    continue  # button not rendered yet: try again

                # store the query result from clipboard to a string
                # (this clipboard read is the step that stops working in headless mode)
                strCSVData = pyperclip.paste()
                pyperclip.copy("")

                # create array from the tab-separated string containing stock data
                arrDataList = list(csv.reader(StringIO(strCSVData), delimiter='\t'))
                arrFinalDataList = [arrDataRecord[2] for arrDataRecord in arrDataList[3:]]

                boolStockDataFetched = True
                break
            elif objFilterMessageElement.text == "No stocks filtered in the Scan":
                print("\t", objFilterMessageElement.text)
                break

        except TimeoutException:
            print("\tTimeout Exception")

    if not boolStockDataFetched:
        print("Error in fetching records or no records fetched")

    objFFWebDriver.quit()
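A note on waiting for either of two elements: joining two expected conditions with `or` does not make the wait succeed when either element appears. Each presence_of_element_located(...) call returns a callable, which is always truthy, so `A or B` simply evaluates to A and only the first locator is ever checked. The sketch below reproduces this with plain functions standing in for the Selenium conditions (wait_until, has_info_element, has_empty_element and either are hypothetical names); in Selenium 4, expected_conditions.any_of(cond_a, cond_b) is the built-in way to express "either one".

```python
def wait_until(condition, page):
    """Hypothetical stand-in for WebDriverWait.until(): call the condition once."""
    return condition(page)


def has_info_element(page):
    # stand-in for presence_of_element_located((By.ID, 'DataTables_Table_0_info'))
    return page.get("DataTables_Table_0_info")


def has_empty_element(page):
    # stand-in for presence_of_element_located((By.CLASS_NAME, 'dataTables_empty'))
    return page.get("dataTables_empty")


def either(*conditions):
    """Return the first truthy result, like Selenium 4's expected_conditions.any_of."""
    def predicate(page):
        for condition in conditions:
            result = condition(page)
            if result:
                return result
        return False
    return predicate


# A page where only the "no results" marker is present
empty_page = {"dataTables_empty": "No stocks filtered in the Scan"}

# `or` picks the first function object, because functions are always truthy...
combined_with_or = has_info_element or has_empty_element
print(combined_with_or is has_info_element)      # True
print(wait_until(combined_with_or, empty_page))  # None: the empty marker is never checked

# ...whereas an explicit "any of" checks both conditions
print(wait_until(either(has_info_element, has_empty_element), empty_page))
```

With the real WebDriverWait, the `or` version shows up as a TimeoutException on pages with no results rather than as None, because until() keeps polling the first condition until the timeout expires.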


Answer 1:

You probably can't copy and paste in a headless browser. Instead, you can read the data from the visible table.
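A minimal sketch of reading the rendered table instead of the clipboard, using only the standard library's html.parser on the HTML that Selenium exposes as driver.page_source (which is still available in headless mode). It assumes the results table has the id DataTables_Table_0 (inferred from the DataTables_Table_0_info element the question waits for, following the DataTables naming convention), that it contains no nested tables, and that each cell holds plain text; TableTextParser and read_table are hypothetical names. Only the rows DataTables has currently rendered are captured, so a paginated table would need paging or a larger page length.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, row by row, inside one table."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.rows = []
        self.current_row = None
        self.current_cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.current_row = []
        elif self.in_table and tag in ("td", "th") and self.current_row is not None:
            self.current_cell = []

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag in ("td", "th") and self.current_cell is not None:
            # collapse runs of whitespace inside the cell text
            self.current_row.append(" ".join("".join(self.current_cell).split()))
            self.current_cell = None
        elif tag == "tr" and self.current_row is not None:
            self.rows.append(self.current_row)
            self.current_row = None
        elif tag == "table":
            self.in_table = False

    def handle_data(self, data):
        if self.current_cell is not None:
            self.current_cell.append(data)


def read_table(html, table_id="DataTables_Table_0"):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableTextParser(table_id)
    parser.feed(html)
    parser.close()
    return parser.rows
```

After the wait in the question's loop succeeds, something like read_table(objFFWebDriver.page_source) could take the place of the click-and-paste step; the first row would be the header, and the question's column index 2 would then be read from the remaining rows.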

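But you don't need Selenium for this at all: if you use your browser's inspector to look at the requests the page makes, you can put together something that performs a similar sequence, like this: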

    import re
    from pprint import pprint
    import requests

    sess = requests.Session()
    sess.headers["User-Agent"] = "Mozilla/5.0 Safari/537.36"

    # Do initial GET request, grab CSRF token
    resp = sess.get("https://chartink.com/screener/90dis")
    resp.raise_for_status()
    csrf_token_m = re.search(r'<meta name="csrf-token" content="(.+?)" />', resp.text)
    csrf_token = csrf_token_m.group(1)

    # Do data query
    resp = sess.post(
        "https://chartink.com/screener/process",
        data={
            "scan_clause": "( cash ( latest count( 90, 1 where latest ha-low > latest ichimoku cloud top( 9 , 26 , 52 ) ) = 90 ) )",
        },
        headers={
            "Referer": "https://chartink.com/screener/90dis",
            "x-csrf-token": csrf_token,
            "x-requested-with": "XMLHttpRequest",
        },
    )
    resp.raise_for_status()
    data = resp.json()
    pprint(data)
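The regex in the code above only matches if the meta tag is written exactly as <meta name="csrf-token" content="..." />; if the attribute order or the closing /> differ, re.search returns None and .group(1) fails with an AttributeError. A slightly more tolerant sketch using the standard library's html.parser (MetaContentFinder and find_meta_content are hypothetical helpers, not part of requests):

```python
from html.parser import HTMLParser


class MetaContentFinder(HTMLParser):
    """Remember the content attribute of the first <meta name=...> that matches."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.content = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here: HTMLParser's default
        # handle_startendtag() calls handle_starttag()
        attributes = dict(attrs)
        if tag == "meta" and self.content is None and attributes.get("name") == self.name:
            self.content = attributes.get("content")


def find_meta_content(html, name="csrf-token"):
    """Return the content of <meta name=name>, or raise ValueError if it is missing."""
    finder = MetaContentFinder(name)
    finder.feed(html)
    finder.close()
    if finder.content is None:
        raise ValueError(f"no <meta name={name!r}> found; the page layout may have changed")
    return finder.content
```

With this helper, the two CSRF lines would become csrf_token = find_meta_content(resp.text), and a missing token produces a descriptive error instead of an AttributeError on None.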

This prints something like:

    {'data': [{'bsecode': None,
               'close': 18389.5,
               'name': 'NIFTY100',
               'nsecode': 'NIFTY100',
               'per_chg': 1.28,
               'sr': 1,
               'volume': 0},
              {'bsecode': '532978',
               'close': 18273.8,
               'name': 'Bajaj Finserv Limited',
               'nsecode': 'BAJAJFINSV',
               'per_chg': 2.25,
               'sr': 2,
               'volume': 207802},
              ...
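Because the endpoint already returns structured JSON, the clipboard and CSV parsing steps from the question are no longer needed. A minimal sketch that turns the data list into a flat list of symbols, assuming (as in the sample output) that each record has nsecode and bsecode keys, and assuming the NSE symbol is what the question's arrDataRecord[2] column held; extract_symbols is a hypothetical helper.

```python
def extract_symbols(payload, key="nsecode", fallback_key="bsecode"):
    """Return one identifier per record in payload["data"].

    Uses `key` when it is set, otherwise `fallback_key`; records with
    neither are skipped.
    """
    symbols = []
    for record in payload.get("data", []):
        value = record.get(key) or record.get(fallback_key)
        if value:
            symbols.append(value)
    return symbols
```

Applied to the data variable from the answer, arrFinalDataList = extract_symbols(data) would give a list comparable to the one the question built from the clipboard; pass key="name" instead if the company names are what is needed.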

Comments:

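Thanks a lot, @AKX. I'll have to do quite a bit of research to fully understand the solution you've provided; for instance, I don't know what a CSRF token is :) I tried studying the page's code with the Web Developer Tools, but I couldn't work out how the data table gets displayed. Is it the JavaScript code in atlas.js? I can't figure out where to look for the requests the page makes. Is it somewhere in the "Inspector" tab of the Web Developer Tools?

OK, I think I know where to look for the requests this page makes: Web Developer Tools -> Console -> GET https://chartink.com/screener/90dis. But how did you work out the "sess.post(..." code? How did you find the query string ("scan_clause")?

That depends on your browser. In my Chrome, it's in the Payload sub-tab. You can also right-click the request and choose "Copy as fetch" to get a JavaScript call that you can convert to Python.

Thanks a lot, @AKX. I found where to look for the POST request in Firefox. I tried to implement the code above using torpy, but I get an error. I created the TorRequests object with objTorRequest = TorRequests(), then got the session object with objTorSession = objTorRequest.get_session(), and then called the get function with objTorResponse = objTorSession.get("https://chartink.com/"). But I get the error "AttributeError: '_GeneratorContextManager' object has no attribute 'get'". Any idea what is causing this?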
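The "'_GeneratorContextManager' object has no attribute 'get'" error indicates that get_session() returned a context manager (a function decorated with contextlib.contextmanager) rather than a session; the session only exists inside a with block. The first part of this sketch reproduces the error with a hypothetical stand-in class (FakeTorRequests); the second part, fetch_via_tor (also a hypothetical name), follows what I understand to be the usage pattern in torpy's README, where both TorRequests and get_session() are used as context managers. It assumes torpy is installed and network access is available, and it does not carry over the Referer and CSRF headers from the answer.

```python
from contextlib import contextmanager


class FakeTorRequests:
    """Hypothetical stand-in with the same get_session() shape as torpy's TorRequests."""

    @contextmanager
    def get_session(self):
        session = {"name": "session"}  # stand-in for a requests.Session
        try:
            yield session
        finally:
            session["closed"] = True  # cleanup runs when the with block exits


requests_like = FakeTorRequests()

# Calling get_session() without `with` gives the context manager, not the session
not_a_session = requests_like.get_session()
print(type(not_a_session).__name__)  # _GeneratorContextManager

# Entering it with `with` yields the actual session object
with requests_like.get_session() as session:
    print(session["name"])  # session


def fetch_via_tor(url):
    """Sketch of the same pattern with torpy itself (assumes torpy is installed)."""
    # Imported lazily so the demonstration above runs without torpy
    from torpy.http.requests import TorRequests

    with TorRequests() as tor_requests:
        with tor_requests.get_session() as sess:
            return sess.get(url)
```

The design point is that the session's lifetime (and, in torpy's case, the Tor circuit behind it) is tied to the with block, so any requests, including the CSRF GET and the data POST from the answer, need to happen inside it.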
