import pyarrow not working <- error is "ValueError: The pyarrow library is not installed, pl

Posted 2023-03-11


【Title】import pyarrow not working <- error is "ValueError: The pyarrow library is not installed, please install pyarrow to use the to_arrow() function." 【Posted】: 2021-03-24 07:39:14 【Question】:

I tried installing it both in the terminal and in JupyterLab, and both report a successful install, but when I run df = query_job.to_dataframe() I keep getting the error "ValueError: The pyarrow library is not installed, please install pyarrow to use the to_arrow() function." I don't know how to fix this. Any suggestions? I am using the following code to eventually access data from Google Data Studio:

import os

from google.cloud import bigquery
import pandas
import numpy
import pyarrow

# Set the credentials path before creating the client so the client can pick it up.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'full file path here.json'
bigquery_client = bigquery.Client()

QUERY = """
SELECT *
FROM `warehouse`
LIMIT 100
"""
query_job = bigquery_client.query(QUERY)
df = query_job.to_dataframe()

【Comments on the question】:

Hi, could you share your requirements.txt? Have you tried updating all packages to their latest versions? I have this problem too.

【Answer 1】:

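While testing your Python code, I got a related error message: ModuleNotFoundError: No module named 'pyarrow'. The error went away after I installed the pyarrow dependency with pip install pyarrow.

Edit: it worked for me once I ran pip install pyarrow and then restarted the kernel.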
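
A common reason for "installed successfully, but still not found" is that pip installed into a different Python environment than the one the notebook kernel runs. The following is a minimal sketch for checking this from inside the kernel, using only the standard library; the helper name `package_status` is made up for illustration.

```python
import importlib.util
import sys


def package_status(name):
    """Report whether `name` is importable by the interpreter running this code."""
    spec = importlib.util.find_spec(name)
    return {
        "python": sys.executable,
        "installed": spec is not None,
        # Installing through this exact interpreter avoids targeting another environment.
        "install_cmd": f'"{sys.executable}" -m pip install {name}',
    }


if __name__ == "__main__":
    status = package_status("pyarrow")
    print("Interpreter:", status["python"])
    print("pyarrow importable:", status["installed"])
    if not status["installed"]:
        print("Try:", status["install_cmd"])
```

If the printed interpreter path is not the one your terminal's pip belongs to, running the suggested command and then restarting the kernel should make the package visible.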

【Comments】:

【Answer 2】:

I had the same problem. It was fixed after running:

pip install --upgrade 'google-cloud-bigquery[bqstorage,pandas]'

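Source: https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas

【Comments】:

No. That did not fix it for me.

【Answer 3】:

I had the same problem because I had pyarrow 2.0 installed, but version 1.0.1 was required. Try running this line: pip install pandas-gbq==0.14.0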
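
Before pinning anything, it can help to see which versions are actually installed in the active environment. This is a small sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_versions` and the list of distributions checked are only illustrative choices.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for dist in distributions:
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None
    return versions


if __name__ == "__main__":
    names = ["pyarrow", "pandas-gbq", "google-cloud-bigquery", "pandas"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```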

【Comments】:

【Answer 4】:

To avoid using fetch_pandas_all(), I used fetchall and then converted the result to a pandas DataFrame. This is what I used:

requirements.txt

snowflake-connector-python==2.4.3
pandas==1.2.4

dag.py

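    def execute(self, **kwargs):
        """
        :param kwargs: optional parameter. Can be used to provide task input context
        :return: returns query result in json format
        """
        # Assumes module-level imports: logging, pandas as pd, snowflake.connector,
        # and tabulate (from the tabulate package, not listed in requirements.txt above).
        ctx = snowflake.connector.connect(
            user=self.SNOWFLAKE_USER,
            password=self.SNOWFLAKE_PASSWORD,
            account=self.SNOWFLAKE_ACCOUNT
        )
        cs = ctx.cursor()
        try:
            cs.execute(self.sql_query)
            data = cs.fetchall()
            df = pd.DataFrame(data)
            # Braces are needed so the f-string actually calls tabulate().
            print(f'\nQUERY RESULT: \n'
                  f'{tabulate(df, headers="keys", tablefmt="psql", showindex="always")} \n')
        finally:
            # Close the cursor and the connection even if the query fails.
            cs.close()
            ctx.close()
        logging.info("Query executed successfully")
        # fetchall() returns a list of tuples, which json.loads() cannot parse;
        # serialize the DataFrame to a JSON string instead.
        return df.to_json(orient="records")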
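
The fetchall approach above loses the column names unless they are read from the cursor. The sketch below shows the same idea with plain DB-API 2.0 calls, using the standard library's `sqlite3` as a stand-in for the Snowflake cursor so that it runs anywhere; the table, the data, and the helper name `fetch_as_records` are invented for illustration, and it is an assumption that your cursor exposes `description` the same way.

```python
import json
import sqlite3


def fetch_as_records(cursor, sql):
    """Run `sql` and return rows as a list of dicts keyed by column name."""
    cursor.execute(sql)
    # DB-API 2.0: each entry in cursor.description starts with the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE visits (page TEXT, total INTEGER)")
        cur.executemany(
            "INSERT INTO visits VALUES (?, ?)", [("home", 3), ("about", 1)]
        )
        records = fetch_as_records(
            cur, "SELECT page, total FROM visits ORDER BY page"
        )
        print(json.dumps(records))
    finally:
        conn.close()
```

With a real warehouse, some values (dates, decimals) may not be JSON-serializable by default; passing `default=str` to `json.dumps` is one simple way to handle that.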

【Comments】:

【Answer 5】:

I ran into a similar problem, but then I used the pandas DataFrame approach instead:

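from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()
try:
    # client.query() returns a QueryJob; result() waits for it and yields the rows.
    rows = client.query(query).result()
    # Build the DataFrame from plain dicts instead of calling to_dataframe().
    df = pd.DataFrame([dict(row.items()) for row in rows])
except ValueError:
    print("google services not available or invalid credentials.")

df.head()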

【Comments】:

【Answer 6】:

You just need to install pyarrow with pip.

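import json

df = client.query(query1).to_dataframe()
# to_json() returns a string, so parse it before indexing into it.
data = json.loads(df.to_json())

# With the default orient, each column maps row-index strings to values.
print(data['total_transactions']['0'])
print(data['total_visits']['0'])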

【Comments】:

This answer's code is just a copy of the question author's code. It does not show a solution. Please add the underlying pip command.
