Join table results Google BigQuery


【Title】: Join table results Google BigQuery 【Posted】: 2017-04-21 02:36:29 【Question】:

I have two SQL queries:

SELECT subreddit, count(subreddit) as count
FROM [fh-bigquery:reddit_comments.all] 
where author="***********" GROUP by subreddit ORDER BY count DESC;

SELECT subreddit, count(subreddit) as count
FROM [redditcollaborativefiltering:aggregate_comments.reddit_posts_all]
where author="***********" GROUP by subreddit ORDER BY count DESC;

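I would like to combine the results of these two queries into a single result with the same columns, but with the counts for each subreddit added together. Is there a simple way to do this?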


【Answer 1】:

For BigQuery Legacy SQL (which I see you are using in your examples), you can use the following:

#legacySQL
SELECT subreddit, SUM(cnt) as cnt
FROM (SELECT subreddit, COUNT(subreddit) as cnt
       FROM [fh-bigquery:reddit_comments.all] 
       WHERE author = '***********'
       GROUP BY subreddit 
      ),
      (SELECT subreddit, COUNT(subreddit) as cnt
       FROM [redditcollaborativefiltering:aggregate_comments.reddit_posts_all] 
       WHERE author = '***********'
       GROUP by subreddit
      )
GROUP BY subreddit
ORDER BY cnt DESC  

As you can see here, the comma between the two subqueries in Legacy SQL acts as UNION ALL.

The above can be simplified further:

#legacySQL
SELECT subreddit, COUNT(subreddit) as cnt
FROM [fh-bigquery:reddit_comments.all],
  [redditcollaborativefiltering:aggregate_comments.reddit_posts_all]
WHERE author = '***********'
GROUP BY subreddit 
ORDER BY cnt DESC

You can read more about Comma as UNION ALL in BigQuery Legacy SQL.
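To see why the two forms above give the same totals, here is a minimal sketch that runs the same pattern locally with Python's built-in `sqlite3` rather than BigQuery. The table names `comments` and `posts`, the authors, and the rows are all made up for illustration, and SQLite has no comma-as-union syntax, so an explicit UNION ALL stands in for the Legacy SQL comma.

```python
import sqlite3

# Hypothetical stand-ins for the two BigQuery tables; names and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (author TEXT, subreddit TEXT);
    CREATE TABLE posts (author TEXT, subreddit TEXT);
    INSERT INTO comments VALUES
        ('alice', 'python'), ('alice', 'python'), ('alice', 'sql'), ('bob', 'sql');
    INSERT INTO posts VALUES
        ('alice', 'python'), ('alice', 'bigquery'), ('bob', 'python');
""")

# Form 1: count per table, stack the partial counts with UNION ALL, then sum them.
summed = conn.execute("""
    SELECT subreddit, SUM(cnt) AS cnt
    FROM (
        SELECT subreddit, COUNT(subreddit) AS cnt
        FROM comments WHERE author = ? GROUP BY subreddit
        UNION ALL
        SELECT subreddit, COUNT(subreddit) AS cnt
        FROM posts WHERE author = ? GROUP BY subreddit
    ) AS sc
    GROUP BY subreddit
    ORDER BY cnt DESC, subreddit
""", ("alice", "alice")).fetchall()

# Form 2: stack the raw rows first, then filter and count once.
counted = conn.execute("""
    SELECT subreddit, COUNT(subreddit) AS cnt
    FROM (
        SELECT author, subreddit FROM comments
        UNION ALL
        SELECT author, subreddit FROM posts
    ) AS combined
    WHERE author = ?
    GROUP BY subreddit
    ORDER BY cnt DESC, subreddit
""", ("alice",)).fetchall()

print(summed)
print(counted)
```

Both queries return one row per subreddit with the combined count, because summing per-table counts and counting over the stacked rows tally the same set of rows. The extra `subreddit` sort key is only there to make tie ordering deterministic in this sketch.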


【Answer 2】:

You can use UNION ALL and another aggregation:

SELECT subreddit, SUM(cnt) as cnt
FROM ((SELECT subreddit, count(subreddit) as cnt
       FROM [fh-bigquery:reddit_comments.all] 
       WHERE author = '***********'
       GROUP BY subreddit 
      ) UNION ALL
      (SELECT subreddit, count(subreddit) as cnt
       FROM [redditcollaborativefiltering:aggregate_comments.reddit_posts_all]
       WHERE author = '***********'
       GROUP by subreddit
      )
     ) sc
GROUP BY subreddit
ORDER BY cnt DESC;

