Google BigQuery Repeated field

Posted: 2017-05-23 04:46:48

Question:

I currently have a table that looks like this:

Id // Key
General 
  platformId  
  platformName
Products [
    Repeated Product 
    Country
    URL
      Offers [
         Repeated Offer 
          Type
          Price
          Currency
      ]
    
]

I need to transform it into a different format:

Record ID // Key
Country 
Providers [
  Repeated provider
  platformName
  Offers [
    Repeated Offer 
      Type
      Price
      Currency 
  ]
]

I started by flattening the table, which gave me something like this:

id,platformId,platformName,products.product.country,products.product.offers.offer.price,products.product.offers.offer.type,products.product.offers.offer.currency
1,123,AWS,US,1.99,CPU,USD
1,123,AWS,US,1.99,HDD,USD
1,123,AWS,US,1.99,RAM,USD
2,123,AWS,CA,2.99,CPU,CAN
2,123,AWS,CA,2.99,HDD,CAN
2,123,AWS,CA,2.99,RAM,CAN
3,123,GOOG,US,3.99,CPU,GBP
3,123,GOOG,US,3.99,HDD,GBP
3,123,GOOG,US,3.99,RAM,GBP

I want to group the following rows by country and platform name:

1,123,AWS,US,1.99,CPU,USD
1,123,AWS,US,1.99,HDD,USD
1,123,AWS,US,1.99,RAM,USD
3,123,GOOG,US,3.99,CPU,GBP
3,123,GOOG,US,3.99,HDD,GBP
3,123,GOOG,US,3.99,RAM,GBP

The resulting structure should look like this:

123,US,AWS
        CPU,1.99,USD
        HDD,1.99,USD
        RAM,1.99,USD
       GOOG
        CPU,3.99,USD
        HDD,3.99,USD
        RAM,3.99,USD

Any suggestions? At the moment I cannot get it to group by country:

+---------+---------------+--------+--------+----------+
| country | platformName  | type   | price  | currency |
+---------+---------------+--------+--------+----------+
| US      | AWS           | CPU    |   1.99 | USD      |
|         |               | HDD    |   1.99 | USD      |
|         |               | RAM    |   1.99 | USD      |
| CA      | AWS           | CPU    |   2.99 | CAN      |
|         |               | HDD    |   2.99 | CAN      |
|         |               | RAM    |   2.99 | CAN      |
| US      | GOOG          | CPU    |   3.99 | USD      |
|         |               | HDD    |   3.99 | USD      |
|         |               | RAM    |   3.99 | USD      |
+---------+---------------+--------+--------+----------+

Here is my query:

SELECT    
  country,
  platformName,
  NEST(type) AS type,
  NEST(price) AS price,
  CASE         
        WHEN NEST(currency) = '' THEN NULL         
        ELSE NEST(currency) 
  END AS currency,
FROM 
  tbl
WHERE
  master_id = 123 
GROUP BY 
  platform_name,
  country


Answer 1:

The following is for BigQuery Standard SQL:

#standardSQL
SELECT product.country, general.platformName, ARRAY_AGG(offer) AS offers
FROM data, UNNEST(products) AS product, UNNEST(offers) AS offer
WHERE id = 123
GROUP BY product.country, general.platformName

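I hope I got your schema right.

I keep getting: "Values referenced in UNNEST must be arrays" for offers.

That is 100% expected. As I mentioned, I hope I got the schema right. The query above works for a schema like the one below, which I think represents what you described in the question.

You can test it with the dummy data below: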

#standardSQL
WITH data AS (
  SELECT 1 AS Id, 
    STRUCT<platformId INT64, platformName STRING>(123, 'name 1') AS general,
    ARRAY<STRUCT<country STRING, url STRING, offers ARRAY<STRUCT<type STRING, price FLOAT64, currency STRING>>>>
    [
      ('US', 'google.com', [STRUCT<type STRING, price FLOAT64, currency STRING>('offer 1', 1.99, 'USD'), ('offer 2', 2.99, 'USD'),('offer 3', 3.99, 'USD')]),
      ('CA', 'yahoo.com', [STRUCT<type STRING, price FLOAT64, currency STRING>('offer 4', 1.99, 'USD'), ('offer 5', 2.99, 'USD')]),
      ('EU', 'apple.com', [STRUCT<type STRING, price FLOAT64, currency STRING>('offer 6', 1.99, 'USD')])
    ] AS products UNION ALL
  SELECT 2 AS Id, 
    STRUCT<platformId INT64, platformName STRING>(123, 'name 2') AS general,
    ARRAY<STRUCT<country STRING, url STRING, offers ARRAY<STRUCT<type STRING, price FLOAT64, currency STRING>>>>
    [
      ('US', 'google.com', [STRUCT<type STRING, price FLOAT64, currency STRING>('offer 7', 1.99, 'USD'), ('offer 8', 2.99, 'USD'),('offer 9', 3.99, 'USD')]),
      ('MX', 'yahoo.com', [STRUCT<type STRING, price FLOAT64, currency STRING>('offer 10', 1.99, 'USD'), ('offer 11', 2.99, 'USD')]),
      ('CA', 'apple.com', [STRUCT<type STRING, price FLOAT64, currency STRING>('offer 12', 1.99, 'USD')])
    ] AS products 
)
SELECT product.country, general.platformName, ARRAY_AGG(offer) AS offers
FROM data, UNNEST(products) AS product, UNNEST(offers) AS offer
WHERE id = 1
GROUP BY product.country, general.platformName

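This returns one row per (country, platformName) pair for Id = 1, with that pair's offers collected into the offers array.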

Of course, if your real schema is different, you will need to dig in and adapt this to your specific case. I hope you will :o)
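
The query above stops at one row per (country, platformName) pair. To get all the way to the layout from the question (one row per country, holding a repeated providers record that in turn holds the repeated offers), a second aggregation can be layered on top. The following is only a sketch and not part of the original answer: the CTE name per_provider, the sample rows, and the example.com / example.org URLs are made up for illustration, and it assumes products and offers are plain arrays of structs, as in the dummy data above.

```sql
#standardSQL
WITH data AS (
  -- Hypothetical sample rows, loosely modelled on the question's flattened output.
  SELECT 1 AS id,
    STRUCT(123 AS platformId, 'AWS' AS platformName) AS general,
    [STRUCT('US' AS country, 'example.com' AS url,
            [STRUCT('CPU' AS type, 1.99 AS price, 'USD' AS currency),
             STRUCT('HDD' AS type, 1.99 AS price, 'USD' AS currency)] AS offers)] AS products
  UNION ALL
  SELECT 3,
    STRUCT(123 AS platformId, 'GOOG' AS platformName),
    [STRUCT('US' AS country, 'example.org' AS url,
            [STRUCT('CPU' AS type, 3.99 AS price, 'USD' AS currency)] AS offers)]
),
per_provider AS (
  -- Step 1: flatten both repeated levels, then collect offers per (country, platform).
  SELECT
    product.country AS country,
    general.platformName AS platformName,
    ARRAY_AGG(offer) AS offers
  FROM data, UNNEST(products) AS product, UNNEST(product.offers) AS offer
  GROUP BY product.country, general.platformName
)
-- Step 2: collect the per-platform rows into one providers array per country.
SELECT
  country,
  ARRAY_AGG(STRUCT(platformName AS platformName, offers AS offers)) AS providers
FROM per_provider
GROUP BY country
```

The two-step shape is needed because an aggregate cannot be nested directly inside another aggregate. ARRAY_AGG does not guarantee element order unless an ORDER BY is given inside the call, so the order of providers and offers may differ between runs.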

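Comments:

My table name is tbl, and I keep getting: "Values referenced in UNNEST must be arrays" for offers.

Thank you very much for the detailed reply, it works now!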
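
The UNNEST error quoted in the comments is what BigQuery reports when the expression passed to UNNEST is not an array. One possible cause, judging only by the flattened column names in the question (products.product.country, products.product.offers.offer.price), is that products and offers are non-repeated wrapper records and the repeated fields are products.product and offers.offer. That is a guess about the real table, not something the thread confirms. Under that assumption, a sketch of the adjusted query would point each UNNEST at the inner repeated field; the sample row and the example.com URL are hypothetical.

```sql
#standardSQL
WITH tbl AS (
  -- Hypothetical row using the assumed layout:
  -- products (record) -> product (repeated) -> offers (record) -> offer (repeated).
  SELECT 1 AS id,
    STRUCT(123 AS platformId, 'AWS' AS platformName) AS general,
    STRUCT(
      [STRUCT('US' AS country, 'example.com' AS url,
              STRUCT(
                [STRUCT('CPU' AS type, 1.99 AS price, 'USD' AS currency),
                 STRUCT('HDD' AS type, 1.99 AS price, 'USD' AS currency)] AS offer
              ) AS offers)] AS product
    ) AS products
)
SELECT product.country, general.platformName, ARRAY_AGG(offer) AS offers
FROM tbl,
  UNNEST(products.product) AS product,      -- the array is the inner field, not products itself
  UNNEST(product.offers.offer) AS offer     -- likewise for offers
GROUP BY product.country, general.platformName
```

With this layout, UNNEST(products) or UNNEST(product.offers) would each receive a STRUCT rather than an ARRAY, which is the situation the error message describes.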
