Running spark.sql query in jupyter

Posted 2023-03-27

【Title】: Running spark.sql query in jupyter 【Posted】: 2021-09-16 18:50:41 【Question】:

I start the Jupyter notebook with:

pyspark --driver-class-path /home/statspy/postgresql-42.2.23.jar --jars /home/statspy/postgresql-42.2.23.jar

Then I run this in Jupyter:

import os
jardrv = '/home/statspy/postgresql-42.2.23.jar'
from pyspark.sql import SparkSession
spark = SparkSession.builder.config('spark.driver.extraClassPath', jardrv).getOrCreate()
url = 'jdbc:postgresql://127.0.0.1/dbname'
properties = {'user': 'postgres', 'password': 'secret'}
df = spark.read.jdbc(url=url, table='tbname', properties=properties)

After that I can run:

df.printSchema()

and I get the schema.

But I want to run a query like this:

spark.sql("""select * from tbname""")

I get the error message: table or view tbname not found

What do I need to change so that I can run queries with spark.sql instead of going through df?

【Answer 1】:

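Before using spark.sql, you need to register the DataFrame as a temporary view. spark.sql only resolves names known to the Spark session's catalog, and reading a table over JDBC into df does not register the name tbname there.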

df.createOrReplaceTempView("tbname")
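
To make the order of operations explicit, here is a minimal sketch of the register-then-query flow. The helper name `query_via_temp_view`, the `demo` function, and the two sample rows are illustrative assumptions, not part of the original question; the in-memory DataFrame only stands in for the one read over JDBC so the sketch can be tried without a Postgres instance.

```python
def query_via_temp_view(spark, df, view_name, sql_text):
    """Register df under view_name, then run sql_text through spark.sql."""
    # After this call, view_name is resolvable by spark.sql in this session.
    df.createOrReplaceTempView(view_name)
    return spark.sql(sql_text)


def demo():
    # Imported here so the helper above can be read without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("tempview-demo")
        .getOrCreate()
    )
    # Hypothetical stand-in for the JDBC DataFrame from the question.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    result = query_via_temp_view(
        spark, df, "tbname", "select * from tbname where id > 1"
    )
    result.show()
    spark.stop()
```

In the notebook from the question, the equivalent call would be `query_via_temp_view(spark, df, "tbname", "select * from tbname")`, with `spark` and `df` being the objects already created there. The view exists only in the current SparkSession, so it has to be registered again after a kernel restart.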
