Does Azure Databricks SparkSQL support recursive queries?

Posted: 2019-10-07 22:09:14

Question:

I am moving data from SQL Server to Azure Data Lake Gen2 and converting SQL queries that use recursion.

Here is a sample SQL query that uses a CTE (common table expression) for recursion:

 WITH RECURSIVE BOM AS
          -- Anchor member: every part is its own root at level 0
          (SELECT p.MItemId AS RootPartNumber,
                  p.MItemId AS PartNumber,
                  NULL AS ParentPartNumber,
                  0    AS BomLevel,
                  1.0  AS Quantity
           FROM   PartItem p

           UNION ALL
           -- Recursive member: attach the children of the rows found so far
           SELECT BOM.RootPartNumber,
                 CAST(BSM.ChildItem AS string) AS PartNumber,
                 CAST(BOM.PartNumber AS string) AS ParentPartNumber,
                 BOM.BomLevel + 1  AS BomLevel,
                 BSM.Quantity AS Quantity
           FROM  PartItemBomList BSM
           INNER JOIN BOM  ON BOM.PartNumber = BSM.ParentItem
           INNER JOIN PartItem p           ON p.MItemId = BSM.ChildItem
           WHERE BSM.IsDeleted = 0
  )
  SELECT * FROM BOM

I tried rewriting the query so that the recursion is embedded in the FROM clause, as shown below, but it did not work.

 SELECT * FROM 
          (SELECT p.MItemId AS RootPartNumber,
                  p.MItemId AS PartNumber,
                  NULL AS ParentPartNumber,
                  0    AS BomLevel,
                  1.0  AS Quantity
           FROM   PartItem p
           WHERE p.PartType =    'Cloud-OrderableAssembly' 
           UNION ALL
           SELECT BOM.RootPartNumber,
                 CAST(BSM.ChildItem AS string) AS PartNumber,
                 CAST(DB.PartNumber AS string) AS ParentPartNumber,
                 BOM.BomLevel + 1  as BomLevel,
                 BSM.Quantity AS Quantity 
           FROM  PartItemBomList BSM
           INNER JOIN BOM  ON BOM.PartNumber = BSM.ParentItem
           INNER JOIN PartItem p           ON p.MItemId = BSM.ChildItem
           WHERE BSM.IsDeleted = 0 
  ) as BOM

This is the error I get from the Azure Databricks session:

Error in SQL statement: AnalysisException: Table or view not found: BOM; line 16 pos 22

Comments:

Hi @SQLSylvia, did you find a solution or workaround for recursive queries in Databricks/Spark SQL?

Answer 1:

The problem is here:

INNER JOIN BOM  ON BOM.PartNumber = BSM.ParentItem

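This join sits inside the inner query, and as far as I understand BOM is only defined outside it (as the alias of the derived table), so BOM does not exist yet when this part of the query runs.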
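The scoping point can be reproduced on toy data. The sketch below is only an illustration under loud assumptions: it uses Python's built-in `sqlite3` module as a stand-in engine (not Spark or Databricks), and the two sample rows are made up. It shows that a derived table cannot join to its own alias, while a CTE declared with `WITH RECURSIVE` can refer to itself. Whether a given Databricks runtime accepts `WITH RECURSIVE` is not established by this thread.

```python
import sqlite3

# Assumption: SQLite stands in for the real engine; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PartItem (MItemId TEXT, PartType TEXT);
    CREATE TABLE PartItemBomList (
        ParentItem TEXT, ChildItem TEXT, Quantity REAL, IsDeleted INTEGER);
    INSERT INTO PartItem VALUES ('A', 'Cloud-OrderableAssembly'), ('B', 'Component');
    INSERT INTO PartItemBomList VALUES ('A', 'B', 2.0, 0);
""")

# Derived table that joins to its own alias, like the failing query.
self_ref = """
    SELECT * FROM (
        SELECT p.MItemId AS PartNumber FROM PartItem p
        UNION ALL
        SELECT BSM.ChildItem FROM PartItemBomList BSM
        INNER JOIN BOM ON BOM.PartNumber = BSM.ParentItem
    ) AS BOM
"""
try:
    conn.execute(self_ref)
    self_ref_error = None
except sqlite3.OperationalError as exc:
    # The alias BOM is not visible inside the subquery that defines it.
    self_ref_error = str(exc)

# The same logic as a recursive CTE, where the self-reference is allowed.
recursive = """
    WITH RECURSIVE BOM(RootPartNumber, PartNumber, ParentPartNumber,
                       BomLevel, Quantity) AS (
        SELECT p.MItemId, p.MItemId, NULL, 0, 1.0
        FROM PartItem p
        WHERE p.PartType = 'Cloud-OrderableAssembly'
        UNION ALL
        SELECT BOM.RootPartNumber, BSM.ChildItem, BOM.PartNumber,
               BOM.BomLevel + 1, BSM.Quantity
        FROM PartItemBomList BSM
        INNER JOIN BOM ON BOM.PartNumber = BSM.ParentItem
        INNER JOIN PartItem p ON p.MItemId = BSM.ChildItem
        WHERE BSM.IsDeleted = 0
    )
    SELECT * FROM BOM ORDER BY BomLevel
"""
rows = conn.execute(recursive).fetchall()

print("self-referencing subquery:", self_ref_error)
for row in rows:
    print(row)
```

The first statement fails while the query is being compiled, which mirrors the "Table or view not found: BOM" message in the question; the second one walks the parent/child links one level per iteration.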

If I were you, I would try to fix the query below by running it directly in SQL. The way BOM is referenced in it is not correct:

 SELECT p.MItemId AS RootPartNumber,
        p.MItemId AS PartNumber,
        NULL AS ParentPartNumber,
        0    AS BomLevel,
        1.0  AS Quantity
 FROM   PartItem p
 WHERE  p.PartType = 'Cloud-OrderableAssembly'
 UNION ALL
 SELECT BOM.RootPartNumber,
        CAST(BSM.ChildItem AS string) AS PartNumber,
        CAST(DB.PartNumber AS string) AS ParentPartNumber,
        BOM.BomLevel + 1 as BomLevel,
        BSM.Quantity AS Quantity
 FROM   PartItemBomList BSM
 INNER JOIN BOM ON BOM.PartNumber = BSM.ParentItem
 INNER JOIN PartItem p ON p.MItemId = BSM.ChildItem
 WHERE  BSM.IsDeleted = 0
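When the engine cannot resolve a self-reference, one common workaround is to unroll the recursion into a loop: start from the anchor rows, join one level at a time, and stop when a pass produces no new rows. The following is a minimal sketch of that idea in plain Python, not the asker's or the answerer's method. The function `expand_bom`, the `BomRow` tuple, the `max_depth` guard and the sample data are all hypothetical names introduced for illustration; on Spark each pass would presumably be a join followed by a union rather than a list comprehension.

```python
from collections import namedtuple

# Hypothetical row shape mirroring the columns of the BOM query.
BomRow = namedtuple("BomRow", "root part parent level quantity")


def expand_bom(part_items, bom_list, max_depth=50):
    """Unroll the recursion: anchor rows first, then one join per level."""
    known = set(part_items)
    # Anchor member: every part is its own root at level 0.
    frontier = [BomRow(p, p, None, 0, 1.0) for p in part_items]
    result = list(frontier)
    # max_depth is an assumed safety limit so cyclic data cannot loop forever.
    for _ in range(max_depth):
        # Recursive member: join the previous level to the BOM list.
        next_level = [
            BomRow(row.root, child, row.part, row.level + 1, qty)
            for row in frontier
            for parent, child, qty, is_deleted in bom_list
            if parent == row.part and not is_deleted and child in known
        ]
        if not next_level:
            break  # no new rows: the expansion is complete
        result.extend(next_level)
        frontier = next_level
    return result


# Made-up sample data: A contains B, B contains C, one deleted link A -> C.
part_items = ["A", "B", "C"]
bom_list = [
    ("A", "B", 2.0, 0),
    ("B", "C", 3.0, 0),
    ("A", "C", 1.0, 1),
]

rows = expand_bom(part_items, bom_list)
for row in rows:
    print(row)
```

Each pass only joins against the rows produced by the previous pass, which is the same contract a recursive CTE follows internally. The depth limit is a design choice for safety, since real BOM data can contain cycles.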
