BigQuery error: Cannot partition on repeated field

Posted 2023-03-25

【Question】 (asked 2015-05-26 18:29:41):

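I have two tables: table1 (the complex one, with repeated/record columns) and table2 (fairly simple). I am trying to create a new table that contains all the columns from table1 plus one column from table2, using the following query: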

select t1.id, t1.experience.desc, t1.experience.organization.*, t1.experience.department, t2.field2 as t1.experience.organization.newfield, t1.family_name
from [so_public.table1] as t1 left join each [so_public.table2] as t2
on t1.experience.organization.name = t2.field1

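I get the error "Cannot partition on repeated field", as shown in the image below. The schemas of the two tables are also shown in their respective images.

Is there a general rule of thumb for combining data from two tables like this? Is what I am trying to do even possible?

The actual tables are much more complex. I have shown only enough context to reproduce the problem.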

【Answer 1】:

You need to FLATTEN() the tables before joining them.

This does not work:

SELECT a.fullName, b.fullname
FROM [bigquery-samples:nested.persons_living] a
JOIN [bigquery-samples:nested.persons_living] b
ON a.citiesLived.place=b.citiesLived.place
LIMIT 1000

Error: Cannot join on repeated field citiesLived.place

This does:

SELECT a.fullName, b.fullname
FROM FLATTEN([bigquery-samples:nested.persons_living], citiesLived) a
JOIN FLATTEN([bigquery-samples:nested.persons_living], citiesLived) b
ON a.citiesLived.place=b.citiesLived.place
LIMIT 1000
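
To see why flattening makes the join possible, here is a rough Python model of what FLATTEN does to a repeated field: it emits one output row per repeated element and copies the parent's other fields, so the join key becomes a plain scalar. This is an illustration only, not BigQuery code. The `flatten` helper and the sample records are hypothetical, and the sketch makes no claim about how BigQuery treats rows whose repeated field is empty.

```python
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


def flatten(rows: List[Row], repeated_field: str) -> Iterator[Row]:
    """Yield one row per element of `repeated_field` (hypothetical helper).

    Rows whose repeated field is empty are simply dropped here; that is a
    simplification of this sketch, not a statement about BigQuery.
    """
    for row in rows:
        for element in row[repeated_field]:
            flat = {k: v for k, v in row.items() if k != repeated_field}
            flat[repeated_field] = element  # now a single record, not a list
            yield flat


# Made-up sample data shaped like the persons_living example.
persons = [
    {"fullName": "Ann", "citiesLived": [{"place": "Oslo"}, {"place": "Rome"}]},
    {"fullName": "Bob", "citiesLived": [{"place": "Rome"}]},
]

flat_a = list(flatten(persons, "citiesLived"))
flat_b = list(flatten(persons, "citiesLived"))

# With scalar keys on both sides, an ordinary equi-join is well defined.
pairs = [
    (a["fullName"], b["fullName"])
    for a in flat_a
    for b in flat_b
    if a["citiesLived"]["place"] == b["citiesLived"]["place"]
]
print(pairs)
```

Note that the flattened side has more rows than the original table (three rows for two people here), which is why a flattened join result does not keep the original nested shape.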

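【Discussion】:

Could you rewrite my query? I tried but failed: I get either errors or unexpected results.

@wpfwannabe, if you make a sample dataset public, I can rewrite the query and test it before posting.

Please see the edited question. The query should now reference a public dataset. Note that what I am really after is the original table plus some columns from another joined table (not just a flattened result).

【Answer 2】: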

Using the public sample you posted in your edit, here is a query that works:

select t1.id, t1.experience.desc, t1.experience.department, t1.experience.organization.*, t2.field2 as t1.experience.organization.newfield, t1.family_name
from FLATTEN(FLATTEN([earnest-stock-91916:so_public.table1], experience.organization), experience) as t1 left join each [earnest-stock-91916:so_public.table2] as t2
on t1.experience.organization.name = t2.field1;

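I was able to flatten the data (FLATTEN had to be applied twice), but not to restore the original structure. Joining on one of the sub-rows is harder.

Do I understand correctly that what you want is to enrich some of the sub-rows?

【Discussion】: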

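Yes, that is exactly what I want to do. I want to take the original data/schema and create a new table with additional columns (a richer schema). So far I have not been able to do this, and it seems to be impossible. To get there, I would have to extend the schema during the data preparation stage, before I actually import into BQ. That sounds bad.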
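
The data-preparation route mentioned in the comment above might look roughly like the following Python sketch: each nested organization record gets the extra field from a lookup built out of table2, and the enriched rows are written as newline-delimited JSON, a format BigQuery can load. Field names follow the queries in this thread (`experience`, `organization`, `name`, `newfield`, `field1`, `field2`). The sample values, the `enrich` helper, and the assumption that `organization` is a repeated record inside a repeated `experience` are illustrative guesses, since the real schemas were only shown as images.

```python
import copy
import json
from typing import Any, Dict, List

Row = Dict[str, Any]


def enrich(rows: List[Row], lookup: Dict[str, Any]) -> List[Row]:
    """Return copies of `rows` with `newfield` added to every organization.

    Hypothetical helper: organizations without a match get None, which
    mirrors the LEFT JOIN in the question.
    """
    enriched = copy.deepcopy(rows)  # leave the caller's data untouched
    for row in enriched:
        for exp in row.get("experience", []):
            for org in exp.get("organization", []):
                org["newfield"] = lookup.get(org.get("name"))
    return enriched


# Made-up stand-ins for table1 and table2.
table1 = [
    {
        "id": 1,
        "family_name": "Smith",
        "experience": [
            {"desc": "dev", "department": "R&D",
             "organization": [{"name": "Acme"}, {"name": "Initech"}]},
        ],
    },
]
table2 = [{"field1": "Acme", "field2": "acme-extra"}]

lookup = {r["field1"]: r["field2"] for r in table2}
result = enrich(table1, lookup)

# One JSON object per line, the shape a newline-delimited JSON load expects.
ndjson = "\n".join(json.dumps(r) for r in result)
print(ndjson)
```

The nested structure is preserved because nothing is flattened: the lookup is applied in place to each sub-row. The destination table's schema would need to declare the new field inside the organization record.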
