Google BigQuery: prefix all columns of joined tables that have duplicate names

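In Google BigQuery (using #standardSQL), I need to apply a fixed prefix to all columns of each table when two tables are joined.

Here is the scenario. I have a structure like this: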

#standardSQL
WITH user AS (
  SELECT "john" as name, "smith" as surname, 1 as parent
  UNION ALL
  SELECT "maggie" as name, "smith" as surname, 2 as parent
),

parent AS (
  SELECT 1 as id, "john" as name, "doe" as surname
  UNION ALL
  SELECT 2 as id, "jane" as name, "smith" as surname
)

user table

+-----+--------+---------+--------+
| Row |  name  | surname | parent |
+-----+--------+---------+--------+
|   1 | john   | smith   |      1 |
|   2 | maggie | smith   |      2 |
+-----+--------+---------+--------+

parent table

+-----+----+------+---------+
| Row | id | name | surname |
+-----+----+------+---------+
|   1 |  1 | john | doe     |
|   2 |  2 | jane | smith   |
+-----+----+------+---------+

A query like this

SELECT u.*, p.* FROM user u JOIN parent p ON u.parent = p.id

produces the following error

Error: Duplicate column names in the result are not supported. Found duplicate(s): name, surname

I would like to avoid aliasing every column by hand, like this

SELECT
  u.name as user_name,
  u.surname as user_surname,
  p.name as parent_name,
  p.surname as parent_surname
FROM user u JOIN parent p ON u.parent = p.id

+-----+-----------+--------------+-------------+----------------+
| Row | user_name | user_surname | parent_name | parent_surname |
+-----+-----------+--------------+-------------+----------------+
|   1 | john      | smith        | john        | doe            |
|   2 | maggie    | smith        | jane        | smith          |
+-----+-----------+--------------+-------------+----------------+

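If the fields of either table change, I would have to edit the statement (or statements) every time so that the new fields get the given prefix. Hard-coding column names this way is therefore not a suitable approach.

Is there a way, such as a query operator, to apply the prefix automatically and still get the table shown above? Something like: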

SELECT u.* AS user_*, p.* AS parent_*
FROM user u JOIN parent p ON u.parent = p.id

Answer

So far, the only option I can think of is the following

#standardSQL
WITH user AS (
  SELECT "john" AS name, "smith" AS surname, 1 AS parent UNION ALL
  SELECT "maggie" AS name, "smith" AS surname, 2 AS parent
), parent AS (
  SELECT 1 AS id, "john" AS name, "doe" AS surname UNION ALL
  SELECT 2 AS id, "jane" AS name, "smith" AS surname   
)
SELECT user, parent  
FROM user  
JOIN parent 
ON user.parent = parent.id  

The result is

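+-----+-----------+--------------+-------------+-----------+-------------+----------------+
| Row | user.name | user.surname | user.parent | parent.id | parent.name | parent.surname |
+-----+-----------+--------------+-------------+-----------+-------------+----------------+
|   1 | john      | smith        |           1 |         1 | john        | doe            |
|   2 | maggie    | smith        |           2 |         2 | jane        | smith          |
+-----+-----------+--------------+-------------+-----------+-------------+----------------+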

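It is not exactly what you expected, but it is the closest I can get: each row from the respective joined table is wrapped in a corresponding STRUCT. For example: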

{
"user": {"name": "john", "surname": "smith","parent": "1"},
"parent": {"id": "1","name": "john","surname": "doe"}
}
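
If flat, prefixed column names are still needed downstream, one possibility is to flatten the nested rows on the client after the query returns. The sketch below is not a BigQuery feature. `flatten_with_prefix` is a hypothetical helper, the `_` separator is an assumption taken from the `user_name` style in the question, and the rows are hard-coded from the example tables rather than fetched from BigQuery.

```python
from typing import Any, Dict, List


def flatten_with_prefix(row: Dict[str, Dict[str, Any]], sep: str = "_") -> Dict[str, Any]:
    """Turn {"user": {"name": ...}} into {"user_name": ...}.

    Hypothetical helper: each top-level key (the table alias) becomes
    the prefix of every field nested under it.
    """
    flat: Dict[str, Any] = {}
    for table, fields in row.items():
        for column, value in fields.items():
            flat[f"{table}{sep}{column}"] = value
    return flat


# Rows shaped like the STRUCT output above; values are copied from the
# example tables, not returned by a real query.
rows: List[Dict[str, Dict[str, Any]]] = [
    {
        "user": {"name": "john", "surname": "smith", "parent": "1"},
        "parent": {"id": "1", "name": "john", "surname": "doe"},
    },
    {
        "user": {"name": "maggie", "surname": "smith", "parent": "2"},
        "parent": {"id": "2", "name": "jane", "surname": "smith"},
    },
]

flat_rows = [flatten_with_prefix(r) for r in rows]
for flat in flat_rows:
    print(flat)
```

Because the prefix comes from the STRUCT name, a column added to either table shows up with the right prefix without any change to the query or to the helper.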
