Decrease query size: Append new data to a BigQuery table based on the latest date

Posted 2023-03-24


Originally asked: 2020-06-18 09:44:47

Problem description:

I want to append new data from table_a to table_b every day, using a scheduled query in Google BigQuery. See the example below.

However, when I write the SELECT statement that appends to table_b, the amount of data the query processes becomes very large.

If I compare the dynamic query below with a query that uses a fixed date (here: 30.05.2020), the dynamic query processes ten times as much data:

```sql
SELECT
  date_a AS date_b,
  col1,
  col2,
  col3
FROM `project.dataset.table_a`
WHERE
  date_a > (SELECT MAX(date_b) FROM `project.dataset.table_b`)
ORDER BY date_b
```
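For comparison, the fixed-date query the question refers to presumably looks like the sketch below. This is an assumption: the question does not show that query, and the sketch assumes date_a is a DATE column (the tables display dates as DD.MM.YYYY, but a BigQuery date literal uses ISO format). Because the filter is a constant, BigQuery can work out which partitions of table_a to read before the query runs. With the subquery version, the filter value is only known during execution, and BigQuery's documentation notes that dynamic filter expressions like this force a scan of all partitions.

```sql
-- Fixed-date variant (assumes date_a is of type DATE).
SELECT
  date_a AS date_b,
  col1,
  col2,
  col3
FROM `project.dataset.table_a`
WHERE
  -- Constant filter on the partitioning column: partitions up to and
  -- including 2020-05-30 are pruned before the scan starts.
  date_a > DATE '2020-05-30'
ORDER BY date_b
```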

table_a (partitioned on date_a; total size over 200 GB). This example is simplified: my real query also performs other JOINs, UNIONs, etc.

+------------+---------+---------+------+--------------+--------------+
|    date_a  |  col1   |  col2   | col3 |    others    |    others    |
+------------+---------+---------+------+--------------+--------------+
| 27.05.2020 | henry   | muller  | 100$ | not relevant | not relevant |
| 28.05.2020 | jamie   | fox     | 200$ | not relevant | not relevant |
| 29.05.2020 | richard | branson | 20$  | not relevant | not relevant |
| 30.05.2020 | jannet  | jackson | 50$  | not relevant | not relevant |
| 31.05.2020 | michael | jackson | 90$  | not relevant | not relevant |
+------------+---------+---------+------+--------------+--------------+

table_b (not partitioned; total size under 50 MB)

+------------+---------+---------+------+
|    date_b  |  col1   |  col2   | col3 |
+------------+---------+---------+------+
| 27.05.2020 | henry   | muller  | 100$ |
| 28.05.2020 | jamie   | fox     | 200$ |
| 29.05.2020 | richard | branson | 20$  |
| 30.05.2020 | jannet  | jackson | 50$  |
+------------+---------+---------+------+

Question:

Do you have any suggestions on how I could read out the value of MAX(date_b) beforehand, or any other ideas for reducing the query size?

Comments on the question:

Is table_a partitioned on date_a?
Yes, it is partitioned.
Have you read this guide on optimizing query performance and reducing costs?

Answer 1:

My problem was solved by an answer in another post that I had not seen before.

Here is the other question: How to choose the latest partition in BigQuery table?

And here is Google's "official" answer/help on this topic: https://cloud.google.com/bigquery/docs/querying-partitioned-tables#pseudo_column_queries_that_scan_all_partitions
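Applied to this question, the approach in that documentation comes down to computing MAX(date_b) first, storing it in a script variable, and then filtering table_a on that variable. By the time the second statement runs, the value is already fixed, so the filter behaves like a constant and partitions can be pruned. The sketch below is an illustration under a few assumptions: BigQuery scripting is used for the scheduled query, date_a and date_b are both DATE columns, and the scheduled query has no destination table configured, because the INSERT performs the append itself. The variable name max_date is hypothetical.

```sql
-- Step 1: read the latest date already in table_b. This scan is cheap
-- because table_b is under 50 MB. (max_date is a hypothetical name.)
DECLARE max_date DATE DEFAULT (
  SELECT MAX(date_b) FROM `project.dataset.table_b`
);

-- Step 2: append only the newer rows. max_date is known before this
-- statement starts, so the filter on the partitioning column date_a
-- can prune partitions of table_a instead of scanning all of them.
INSERT INTO `project.dataset.table_b` (date_b, col1, col2, col3)
SELECT
  date_a AS date_b,
  col1,
  col2,
  col3
FROM `project.dataset.table_a`
WHERE date_a > max_date;
```

The ORDER BY clause is dropped here because it does not control how rows are stored in table_b and only adds work. If table_b could ever be empty, MAX(date_b) returns NULL and `date_a > max_date` matches no rows, so that case would need a fallback value.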
