Flatten multiple nested arrays on BigQuery

【Posted】: 2021-04-14 02:17:02

【Question】:

I have several array columns nested inside a BigQuery table, as shown below:

marketing_table
+------+------------------+------------------+--------------------------------------------+------------------------------------+-------------------------------------+-------------+
|Row   | effective_status | targeting.age_min| targeting.audience_network_positions.value | targeting.facebook_positions.value | targeting.instagram_positions.value | campaign_id |
+------+------------------+------------------+--------------------------------------------+------------------------------------+-------------------------------------+-------------+
|    1 |  Active          |      22          |                classic                     |       feed                         |                 stream              |     1       |
|      |                  |                  |                instream video              |       video_feeds                  |                 story               |             |     
|      |                  |                  |                                            |       instant_article              |                 explore             |             |   
|      |                  |                  |                                            |       instream_video               |                                     |             | 
|      |                  |                  |                                            |       marketplace                  |                                     |             | 
|      |                  |                  |                                            |       story                        |                                     |             |   
|    2 |  WITH_ISSUES     |      22          |                classic                     |       feed                         |                 stream              |     1       |
|      |                  |                  |                instream video              |       video_feeds                  |                 story               |             |     
|      |                  |                  |                                            |       instant_article              |                 explore             |             |   
|      |                  |                  |                                            |       instream_video               |                                     |             | 
|      |                  |                  |                                            |       marketplace                  |                                     |             | 
|      |                  |                  |                                            |       story                        |                                     |             |   
+------+------------------+------------------+--------------------------------------------+------------------------------------+-------------------------------------+-------------+

The table schema is as follows:

Field name, Type, Mode
----------------------- 
effective_status, STRING, NULLABLE
targeting.age_min, INTEGER, NULLABLE
targeting.audience_network_positions, RECORD, REPEATED
targeting.audience_network_positions.value, STRING, NULLABLE
targeting.facebook_positions, RECORD, REPEATED
targeting.facebook_positions.value, STRING, NULLABLE
targeting.instagram_positions, RECORD, REPEATED
targeting.instagram_positions.value, STRING, NULLABLE
campaign_id, STRING, NULLABLE
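
For reference, the schema listed above could be written as BigQuery DDL roughly as follows. This is only a sketch: the dataset name `mydataset` is hypothetical, and it assumes that `audience_network_positions` is a repeated record with a single `value` string, like the other two position fields.

```sql
-- Hypothetical DDL reconstructed from the field list above.
-- `mydataset` is a placeholder; INTEGER in the schema view corresponds to INT64.
CREATE TABLE mydataset.marketing_table (
  effective_status STRING,
  targeting STRUCT<
    age_min INT64,
    -- Assumed to follow the same shape as the other two position fields.
    audience_network_positions ARRAY<STRUCT<value STRING>>,
    facebook_positions ARRAY<STRUCT<value STRING>>,
    instagram_positions ARRAY<STRUCT<value STRING>>
  >,
  campaign_id STRING
);
```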

I would like to flatten all of the nested arrays so that they produce:

marketing_table
+------+------------------+------------------+--------------------------------------------+------------------------------------+-------------------------------------+-------------+
|Row   | effective_status | targeting.age_min| targeting.audience_network_positions.value | targeting.facebook_positions.value | targeting.instagram_positions.value | campaign_id |
+------+------------------+------------------+--------------------------------------------+------------------------------------+-------------------------------------+-------------+
|    1 |  Active          |      22          |                classic                     |       feed                         |                 stream              |     1       |
|    2 |  Active          |      22          |                instream video              |       video_feeds                  |                 story               |     1       |     
|    3 |  Active          |      22          |                instream video              |       instant_article              |                 explore             |     1       |   
|    4 |  Active          |      22          |                instream video              |       instream_video               |                 explore             |     1       | 
|    5 |  Active          |      22          |                instream video              |       marketplace                  |                 explore             |     1       | 
|    6 |  Active          |      22          |                instream video              |       story                        |                 explore             |     1       |   
|    7 |  WITH_ISSUES     |      22          |                classic                     |       feed                         |                 stream              |     1       |
|    8 |  WITH_ISSUES     |      22          |                instream video              |       video_feeds                  |                 story               |     1       |     
|    9 |  WITH_ISSUES     |      22          |                instream video              |       instant_article              |                 explore             |     1       |   
|    10|  WITH_ISSUES     |      22          |                instream video              |       instream_video               |                 explore             |     1       | 
|    11|  WITH_ISSUES     |      22          |                instream video              |       marketplace                  |                 explore             |     1       | 
|    12|  WITH_ISSUES     |      22          |                instream video              |       story                        |                 explore             |     1       |   
+------+------------------+------------------+--------------------------------------------+------------------------------------+-------------------------------------+-------------+

Could you tell me how to use UNNEST in BigQuery SQL to correctly unnest all of these arrays?

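【Comments】:

Could you show the structure of your table? The expected result for the targeting.instagram_positions.value column looks a bit odd...

@SergeyGeron I have updated my question, please take a look.

【Answer 1】: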

Something like this should work.

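-- Each unnest in the from list is a correlated cross join, so every source row
-- yields one output row per combination of elements from the three arrays.
select
  t.effective_status,
  t.targeting.age_min,
  anp.value as anp_value,
  fp.value as fp_value,
  ip.value as ip_value,
  t.campaign_id
from marketing_table t,
  unnest(t.targeting.audience_network_positions) as anp,
  unnest(t.targeting.facebook_positions) as fp,
  -- instagram_positions is nested under targeting, like the other two arrays
  unnest(t.targeting.instagram_positions) as ip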
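
Note that comma-joining three UNNEST calls produces every combination of array elements (2 × 6 × 3 = 36 rows per source row for the data in the question), while the expected output in the question has one row per array position, with shorter arrays repeating their last element. If that positional layout is what is wanted, one possible approach is sketched below. It is not part of the original answer: the CTE holds hypothetical, shortened sample data so the query is self-contained, and the clamped-index trick with `LEAST` is just one way to get the repeat-last-value behavior.

```sql
-- Hypothetical sample data, shortened from the table in the question.
WITH marketing_table AS (
  SELECT
    'Active' AS effective_status,
    STRUCT(
      22 AS age_min,
      [STRUCT('classic' AS value), STRUCT('instream video' AS value)]
        AS audience_network_positions,
      [STRUCT('feed' AS value), STRUCT('video_feeds' AS value),
       STRUCT('instant_article' AS value)] AS facebook_positions,
      [STRUCT('stream' AS value), STRUCT('story' AS value)]
        AS instagram_positions
    ) AS targeting,
    '1' AS campaign_id
)
SELECT
  t.effective_status,
  t.targeting.age_min,
  -- LEAST(pos, length - 1) clamps the index, so a shorter array
  -- repeats its last element once pos runs past its end.
  t.targeting.audience_network_positions[
    SAFE_OFFSET(LEAST(pos, ARRAY_LENGTH(t.targeting.audience_network_positions) - 1))
  ].value AS anp_value,
  t.targeting.facebook_positions[
    SAFE_OFFSET(LEAST(pos, ARRAY_LENGTH(t.targeting.facebook_positions) - 1))
  ].value AS fp_value,
  t.targeting.instagram_positions[
    SAFE_OFFSET(LEAST(pos, ARRAY_LENGTH(t.targeting.instagram_positions) - 1))
  ].value AS ip_value,
  t.campaign_id
FROM marketing_table AS t,
  -- One output row per position, from 0 up to the last index of the longest array.
  UNNEST(GENERATE_ARRAY(0, GREATEST(
    ARRAY_LENGTH(t.targeting.audience_network_positions),
    ARRAY_LENGTH(t.targeting.facebook_positions),
    ARRAY_LENGTH(t.targeting.instagram_positions)) - 1)) AS pos
ORDER BY pos
```

To run this against the real table, drop the CTE and point the FROM clause at the actual table. A source row whose three arrays are all empty generates no positions and is therefore omitted from the result.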
