BigQuery SQL performance issue on tables with clustering and partitioning

Posted 2023-03-24

[Posted] 2020-12-02 04:29:21 [Question]:

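I have a simple join between two tables with a WHERE clause. Both tables are partitioned on inserted_date and clustered on the numeric_id column. Even with all the recommended performance tuning applied, the query takes 30 seconds, and the best I could get with different clustering columns is 22 seconds. I'm not sure what else I can do to improve performance.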

Note: table_1 is wide, with about 100 columns.

table_1: table size 121.18 MB, rows 279,567

table_2: table size 148 MB, rows 864,177

    select qav.*, a.product_id
    from table_1 qav
    inner join table_2 a on a.id = qav.application_id
    where a.product_id in (1,5,7,9)

[Question discussion]:

[Answer 1]:

Try moving a.product_id in (1,5,7,9) from the WHERE clause into the ON clause, and start with the largest table:

    select qav.*, a.product_id
    from table_2 a
    inner join table_1 qav
      on a.id = qav.application_id
      and a.product_id in (1,5,7,9)
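
A further variant that may be worth trying, offered only as a sketch: the query in the question never filters on the partitioning column (inserted_date), so partition pruning does not apply, and selecting every column of table_1 reads all of its roughly 100 columns from columnar storage. The version below filters table_2 before the join, restricts both tables by partition, and selects named columns. The date range and the column names `col_a` and `col_b` are hypothetical placeholders, and the filters assume inserted_date is a DATE column; whether a date restriction is acceptable at all depends on the data.

```sql
-- Sketch only: the date range and col_a/col_b are hypothetical placeholders.
-- Assumes inserted_date is a DATE column on both tables.
WITH filtered_a AS (
  SELECT id, product_id
  FROM table_2
  WHERE product_id IN (1, 5, 7, 9)
    AND inserted_date >= DATE '2020-11-01'  -- assumed range; allows partition pruning
)
SELECT
  qav.application_id,
  qav.col_a,  -- list only the columns actually needed instead of every column
  qav.col_b,
  a.product_id
FROM table_1 AS qav
INNER JOIN filtered_a AS a
  ON a.id = qav.application_id
WHERE qav.inserted_date >= DATE '2020-11-01';  -- assumed range
```

If the full set of columns really is required, the column list cannot be trimmed, and the partition filter is the only part of this sketch that could reduce the data read.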

[Comments]:

Thanks for your input. Unfortunately, moving the filter into the join saved only 2-3 seconds, which isn't much of a gain.
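
One way to see where the remaining time goes, which is not something suggested in the original thread, is to compare bytes processed and slot time across recent runs of the query. The sketch below reads BigQuery's job metadata view; `region-us` is an assumed location and should be replaced with the region the project actually uses.

```sql
-- Sketch only: `region-us` is an assumed location; substitute the project's region.
SELECT
  job_id,
  total_bytes_processed,
  total_slot_ms,
  TIMESTAMP_DIFF(end_time, start_time, MILLISECOND) AS elapsed_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = 'QUERY'
ORDER BY creation_time DESC
LIMIT 10;
```

If elapsed time is large while bytes processed and slot time stay small, the delay is probably outside the scan and join themselves, for example in returning a wide result set to the client.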
