Does Azure Databricks SparkSQL support recursive queries?
Posted: 2019-10-07 22:09:14
[Question]
I'm moving data from SQL Server to Azure Data Lake Gen2 and need to convert SQL queries that use recursion.
Here is a sample SQL query that uses a CTE (common table expression) for recursion:
WITH RECURSIVE BOM AS (
    -- anchor member: each part is its own root at level 0
    SELECT p.MItemId AS RootPartNumber,
           p.MItemId AS PartNumber,
           NULL AS ParentPartNumber,
           0 AS BomLevel,
           1.0 AS Quantity
    FROM PartItem p
    UNION ALL
    -- recursive member: add the children of the rows found so far
    SELECT BOM.RootPartNumber,
           CAST(BSM.ChildItem AS string) AS PartNumber,
           CAST(BOM.PartNumber AS string) AS ParentPartNumber,
           BOM.BomLevel + 1 AS BomLevel,
           BSM.Quantity AS Quantity
    FROM PartItemBomList BSM
    INNER JOIN BOM ON BOM.PartNumber = BSM.ParentItem
    INNER JOIN PartItem p ON p.MItemId = BSM.ChildItem
    WHERE BSM.IsDeleted = 0
)
SELECT * FROM BOM
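For comparison, here is a minimal sketch of how this query behaves in an engine that does support recursive CTEs. It runs in SQLite through Python's built-in `sqlite3` module (a local stand-in, not Databricks), with Spark's `string` type swapped for SQLite's `TEXT`; the sample rows are hypothetical. Note that the CTE name goes before `AS` (`WITH RECURSIVE BOM AS`) and the parent part number is read from `BOM.PartNumber`. The anchor member seeds one level-0 row per part, and the recursive member adds one level of children per iteration until no new rows appear.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PartItem (MItemId TEXT PRIMARY KEY, PartType TEXT);
    CREATE TABLE PartItemBomList (
        ParentItem TEXT, ChildItem TEXT, Quantity REAL, IsDeleted INTEGER
    );
    INSERT INTO PartItem VALUES
        ('A', 'Cloud-OrderableAssembly'), ('B', 'Component'), ('C', 'Component');
    INSERT INTO PartItemBomList VALUES
        ('A', 'B', 2, 0),
        ('B', 'C', 3, 0),
        ('A', 'C', 1, 1);  -- soft-deleted link, skipped by IsDeleted = 0
""")

BOM_QUERY = """
WITH RECURSIVE BOM AS (
    -- Anchor member: every part starts as its own root at level 0.
    SELECT p.MItemId AS RootPartNumber,
           p.MItemId AS PartNumber,
           NULL      AS ParentPartNumber,
           0         AS BomLevel,
           1.0       AS Quantity
    FROM PartItem p
    UNION ALL
    -- Recursive member: join the rows found so far to their child links.
    SELECT BOM.RootPartNumber,
           CAST(BSM.ChildItem AS TEXT),
           CAST(BOM.PartNumber AS TEXT),
           BOM.BomLevel + 1,
           BSM.Quantity
    FROM PartItemBomList BSM
    INNER JOIN BOM        ON BOM.PartNumber = BSM.ParentItem
    INNER JOIN PartItem p ON p.MItemId = BSM.ChildItem
    WHERE BSM.IsDeleted = 0
)
SELECT * FROM BOM ORDER BY BomLevel, PartNumber
"""

rows = conn.execute(BOM_QUERY).fetchall()
for row in rows:
    print(row)
```

With this data, part A expands to B at level 1 and to C at level 2 (via B), while the soft-deleted A to C link is ignored.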
I tried rewriting the query to embed the recursion in the FROM clause, as shown below, but it didn't work.
SELECT * FROM (
    SELECT p.MItemId AS RootPartNumber,
           p.MItemId AS PartNumber,
           NULL AS ParentPartNumber,
           0 AS BomLevel,
           1.0 AS Quantity
    FROM PartItem p
    WHERE p.PartType = 'Cloud-OrderableAssembly'
    UNION ALL
    SELECT BOM.RootPartNumber,
           CAST(BSM.ChildItem AS string) AS PartNumber,
           CAST(BOM.PartNumber AS string) AS ParentPartNumber,
           BOM.BomLevel + 1 AS BomLevel,
           BSM.Quantity AS Quantity
    FROM PartItemBomList BSM
    INNER JOIN BOM ON BOM.PartNumber = BSM.ParentItem
    INNER JOIN PartItem p ON p.MItemId = BSM.ChildItem
    WHERE BSM.IsDeleted = 0
) AS BOM
This is the error I get in the Azure Databricks session:
Error in SQL statement: AnalysisException: Table or view not found: BOM; line 16 pos 22
[Comments]
Hi @SQLSylvia, did you find a solution or workaround for recursive queries in Databricks/Spark SQL?
[Answer 1]
The problem is here:
INNER JOIN BOM ON BOM.PartNumber = BSM.ParentItem
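This line is inside the inner query, but as far as I can tell BOM is only defined outside it (as the alias of the derived table), so BOM does not exist yet when this part of the query runs.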
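The scoping rule can be reproduced locally with a small sketch that uses Python's built-in `sqlite3` module as a stand-in engine (not Spark, so the error text differs from Spark's `AnalysisException`). The tables, rows, and the simplified two-column query are hypothetical. The same `UNION ALL` body fails when wrapped in a derived table aliased `BOM`, but resolves when `BOM` is declared with `WITH RECURSIVE`, which is what makes the name visible inside its own definition.

```python
import sqlite3

# SQLite stands in for Spark here; tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PartItem (MItemId TEXT);
    CREATE TABLE PartItemBomList (ParentItem TEXT, ChildItem TEXT, IsDeleted INTEGER);
    INSERT INTO PartItem VALUES ('A');
    INSERT INTO PartItemBomList VALUES ('A', 'B', 0);
""")

# Simplified anchor + recursive members; the second member refers to BOM.
MEMBERS = """
    SELECT p.MItemId AS PartNumber, 0 AS BomLevel FROM PartItem p
    UNION ALL
    SELECT BSM.ChildItem, BOM.BomLevel + 1
    FROM PartItemBomList BSM
    INNER JOIN BOM ON BOM.PartNumber = BSM.ParentItem
    WHERE BSM.IsDeleted = 0
"""

# 1) Derived table: the alias BOM only exists outside the parentheses,
#    so the reference inside them cannot be resolved.
try:
    conn.execute(f"SELECT * FROM ({MEMBERS}) AS BOM").fetchall()
    derived_error = None
except sqlite3.OperationalError as exc:
    derived_error = str(exc)
print("derived table:", derived_error)

# 2) Recursive CTE: WITH RECURSIVE makes BOM visible inside its own body.
cte_rows = conn.execute(
    f"WITH RECURSIVE BOM AS ({MEMBERS}) SELECT * FROM BOM"
).fetchall()
print("recursive CTE:", cte_rows)
```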
If I were you, I would try to fix the query below by running it directly in SQL. The way it references BOM is not correct:
SELECT p.MItemId AS RootPartNumber,
       p.MItemId AS PartNumber,
       NULL AS ParentPartNumber,
       0 AS BomLevel,
       1.0 AS Quantity
FROM PartItem p
WHERE p.PartType = 'Cloud-OrderableAssembly'
UNION ALL
SELECT BOM.RootPartNumber,
       CAST(BSM.ChildItem AS string) AS PartNumber,
       CAST(BOM.PartNumber AS string) AS ParentPartNumber,
       BOM.BomLevel + 1 AS BomLevel,
       BSM.Quantity AS Quantity
FROM PartItemBomList BSM
INNER JOIN BOM ON BOM.PartNumber = BSM.ParentItem
INNER JOIN PartItem p ON p.MItemId = BSM.ChildItem
WHERE BSM.IsDeleted = 0
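When the engine has no recursive CTE support (as with Spark SQL at the time this question was asked), a common workaround is to expand the hierarchy iteratively: seed level 0, then repeatedly join the rows from the previous level to the link table and append their children, stopping when an iteration adds no rows. The sketch below shows that loop with Python's `sqlite3` as a stand-in; in Databricks the same loop would be driven from a notebook, for example by running one `INSERT INTO ... SELECT` per level against a Delta table or by unioning DataFrames. The `explode_bom` helper, the `bom_result` table, the sample rows, and the `max_depth` guard against cyclic data are all hypothetical.

```python
import sqlite3

def explode_bom(conn, max_depth=50):
    """Hypothetical helper: expand the BOM level by level without WITH RECURSIVE."""
    conn.executescript("""
        DROP TABLE IF EXISTS bom_result;
        CREATE TABLE bom_result (
            RootPartNumber TEXT, PartNumber TEXT, ParentPartNumber TEXT,
            BomLevel INTEGER, Quantity REAL
        );
        -- Level 0: every part is its own root, like the CTE's anchor member.
        INSERT INTO bom_result
        SELECT MItemId, MItemId, NULL, 0, 1.0 FROM PartItem;
    """)
    level = 0
    while level < max_depth:  # guard against cycles in the BOM data
        # Append the children of the rows produced in the previous iteration.
        cur = conn.execute("""
            INSERT INTO bom_result
            SELECT b.RootPartNumber, BSM.ChildItem, b.PartNumber,
                   b.BomLevel + 1, BSM.Quantity
            FROM PartItemBomList BSM
            INNER JOIN bom_result b ON b.PartNumber = BSM.ParentItem
            INNER JOIN PartItem p ON p.MItemId = BSM.ChildItem
            WHERE BSM.IsDeleted = 0 AND b.BomLevel = ?
        """, (level,))
        if cur.rowcount == 0:  # fixpoint reached: no new children found
            break
        level += 1
    return conn.execute(
        "SELECT * FROM bom_result ORDER BY BomLevel, RootPartNumber, PartNumber"
    ).fetchall()

# Hypothetical sample data using the question's table and column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PartItem (MItemId TEXT PRIMARY KEY, PartType TEXT);
    CREATE TABLE PartItemBomList (
        ParentItem TEXT, ChildItem TEXT, Quantity REAL, IsDeleted INTEGER
    );
    INSERT INTO PartItem VALUES
        ('A', 'Cloud-OrderableAssembly'), ('B', 'Component'), ('C', 'Component');
    INSERT INTO PartItemBomList VALUES
        ('A', 'B', 2, 0), ('B', 'C', 3, 0), ('A', 'C', 1, 1);
""")

rows = explode_bom(conn)
for row in rows:
    print(row)
```

Filtering on `b.BomLevel = ?` means each pass only expands the newest level, so earlier rows are never re-joined; the `max_depth` cap is a simple safety net, since a cyclic BOM would otherwise loop forever.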