BigQuery returning null for sub-query

[Title]: BigQuery returning null for sub-query [Posted]: 2016-05-09 16:30:31 [Question]:

What is the problem?

My sub-query returns NULL.

What am I trying to do?

I am working with a table like the following:

-------------------------------------------------------------------
|         url          |  page_path_1  |  page_path_2  |  filter  |
-------------------------------------------------------------------
|  e.com/test1/test2/  |     test1     |     test2     |   foo    |
|  e.com/test1/test2/  |     test1     |     test2     |   bar    |
|  e.com/test2/test3/  |     test2     |     test3     |   foo    |

I want to return 20 sample URLs for each of the top 20 directory combinations, sorted by foo descending.

What is my current query?

SELECT
  url
FROM 
  [table.data_analysis],
  (
      SELECT
        page_path_1 as pp1, page_path_2 as pp2, count(page_path_1) as count, filter
      FROM
        [table.data_analysis]
      WHERE
       filter = foo
      GROUP BY pp1, pp2, filter
      ORDER BY count desc
      LIMIT 20
    ) AS sub_query
WHERE
  filter = foo and
  page_path_1 = pp1 and
  page_path_2 = pp2
LIMIT 20

The sub_query returns valid directories when you run it on its own:

Row  |   pp1   |  pp2    |   count  |  filter  |
-----------------------------------------------
1    |  test1  |  test2  |    1     |   foo
1    |  test2  |  test3  |    1     |   foo

But when you use it as an actual sub-query and look at pp1 and pp2:

SELECT
  pp1, pp2
FROM 
  [table.data_analysis],
  (
      SELECT
        page_path_1 as pp1, page_path_2 as pp2, count(page_path_1) as count, filter
      FROM
        [table.data_analysis]
      WHERE
       filter = foo
      GROUP BY pp1, pp2, filter
      ORDER BY count desc
      LIMIT 20
    ) AS sub_query
WHERE
  filter = foo
LIMIT 20

It returns null.

Row  |  pp1  |   pp2     
----------------------
1    |  null |   null
2    |  null |   null
3    |  null |   null

I'm completely stumped. How am I using this wrong?

[Answer 1]:

How am I using this wrong?

Hopefully the simplified example below shows what is going wrong here:

SELECT filter, pp1, pp2
FROM (
  SELECT a, b, filter
  FROM 
    (SELECT 1 AS a, 21 AS b, 'foo' AS filter),
    (SELECT 2 AS a, 22 AS b, 'foo' AS filter),
    (SELECT 3 AS a, 23 AS b, 'foo1' AS filter),
    (SELECT 4 AS a, 24 AS b, 'foo1' AS filter),
    (SELECT 5 AS a, 25 AS b, 'foo' AS filter)
  ), (
  SELECT pp1, pp2, filter
  FROM 
    (SELECT 1 AS pp1, 21 AS pp2, 'foo' AS filter),
    (SELECT 2 AS pp1, 22 AS pp2, 'foo1' AS filter),
    (SELECT 3 AS pp1, 23 AS pp2, 'foo' AS filter),
    (SELECT 4 AS pp1, 24 AS pp2, 'foo' AS filter)
  )
WHERE filter = 'foo'
LIMIT 2

The result is:

filter  pp1   pp2
foo     null  null
foo     null  null

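Here the comma between the two sources is BigQuery Legacy SQL's syntax for UNION ALL, not a join. The first part of the union has no pp1 or pp2 columns, so for the rows that come from it those columns are NULL.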
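
Because the comma acts as UNION ALL, matching each row against the top directory combinations needs an explicit JOIN instead. Below is a minimal, untested sketch of the question's query rewritten that way. It assumes that foo is a string literal (as in the simplified example above); the aliases t, s and cnt are illustrative names that do not appear in the original query.

```sql
SELECT
  t.url AS url
FROM
  [table.data_analysis] AS t
JOIN (
    -- Top 20 directory combinations among rows where filter = 'foo'.
    -- cnt is an illustrative alias replacing the original "count".
    SELECT
      page_path_1 AS pp1, page_path_2 AS pp2, COUNT(page_path_1) AS cnt
    FROM
      [table.data_analysis]
    WHERE
      filter = 'foo'
    GROUP BY pp1, pp2
    ORDER BY cnt DESC
    LIMIT 20
  ) AS s
ON
  -- The join condition replaces the page_path_1 = pp1 / page_path_2 = pp2
  -- comparisons that the original query placed in its WHERE clause.
  t.page_path_1 = s.pp1 AND
  t.page_path_2 = s.pp2
WHERE
  t.filter = 'foo'
LIMIT 20
```

Note that the outer LIMIT 20 caps the whole result at 20 rows, so this sketch returns 20 URLs in total rather than 20 per combination. A per-combination cap would need an extra step, for example a window function such as ROW_NUMBER() partitioned by the two path columns.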
