How to use the GROUP BY command properly in Google BigQuery?

【Posted】: 2015-11-23 09:57:22 【Question】:

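I'm having trouble retrieving only the data I need. First, I don't know how to write a SQL query that returns the output I want; my current query can only fetch one user at a time.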

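Second, I want to get the data for the one year before the current date. Below is the SQL query I have so far (I have to run it manually, one user at a time).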

SELECT type,  COUNT(*) FROM (
  TABLE_DATE_RANGE([githubarchive:day.events_], 
    TIMESTAMP('2013-1-01'), 
    TIMESTAMP('2015-08-28')
  )) AS events
WHERE type IN ("CommitCommentEvent","CreateEvent","DeleteEvent","DeploymentEvent","DeploymentStatusEvent","DownloadEvent","FollowEvent",
"ForkEvent","ForkApplyEvent","GistEvent","GollumEvent","IssueCommentEvent","IssuesEvent","MemberEvent","MembershipEvent","PageBuildEvent",
"PublicEvent","PullRequestEvent","PullRequestReviewCommentEvent","PushEvent","ReleaseEvent","RepositoryEvent","StatusEvent","TeamAddEvent",
"WatchEvent") AND actor.login = "datomnurdin"
GROUP BY type;

References:

https://www.githubarchive.org/

https://github.com/igrigorik/githubarchive.org

【Comments on the question】:

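What is the problem?

I want to create a SQL query that produces the desired output above.

You already have a SQL query. Can you specify what your problem is?

It looks like you need to "pivot" your data. First, change your query so that actor.login is in your SELECT and GROUP BY, not in the WHERE clause. Then search for "How to Pivot in Big Query" (sorry, I don't know the syntax myself). Hth

@Dato'MohammadNurdin Your query looks for a single actor.login, which is why it only returns one .... Remove the "AND actor.login =" condition and group by that field instead. Check my answer; it should be the query you want.

【Answer 1】: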

Here is how to pivot the data properly:

SELECT actor.login,
ifnull(sum(if(type='CommitCommentEvent',1,null)),0) as CommitCommentEvent,
ifnull(sum(if(type='CreateEvent',1,null)),0) as CreateEvent,
ifnull(sum(if(type='DeleteEvent',1,null)),0) as  DeleteEvent,
ifnull(sum(if(type='DeploymentEvent',1,null)),0) as  DeploymentEvent,
ifnull(sum(if(type='DeploymentStatusEvent',1,null)),0) as  DeploymentStatusEvent,
ifnull(sum(if(type='DownloadEvent',1,null)),0) as  DownloadEvent,
ifnull(sum(if(type='FollowEvent',1,null)),0) as  FollowEvent,
ifnull(sum(if(type='ForkEvent',1,null)),0) as  ForkEvent,
ifnull(sum(if(type='ForkApplyEvent',1,null)),0) as  ForkApplyEvent,
ifnull(sum(if(type='GistEvent',1,null)),0) as  GistEvent,
ifnull(sum(if(type='GollumEvent',1,null)),0) as  GollumEvent,
ifnull(sum(if(type='IssueCommentEvent',1,null)),0) as  IssueCommentEvent,
ifnull(sum(if(type='IssuesEvent',1,null)),0) as  IssuesEvent,
ifnull(sum(if(type='MemberEvent',1,null)),0) as  MemberEvent,
ifnull(sum(if(type='MembershipEvent',1,null)),0) as  MembershipEvent,
ifnull(sum(if(type='PageBuildEvent',1,null)),0) as  PageBuildEvent,
ifnull(sum(if(type='PublicEvent',1,null)),0) as  PublicEvent,
ifnull(sum(if(type='PullRequestEvent',1,null)),0) as  PullRequestEvent,
ifnull(sum(if(type='PullRequestReviewCommentEvent',1,null)),0) as  PullRequestReviewCommentEvent,
ifnull(sum(if(type='PushEvent',1,null)),0) as  PushEvent,
ifnull(sum(if(type='ReleaseEvent',1,null)),0) as  ReleaseEvent,
ifnull(sum(if(type='RepositoryEvent',1,null)),0) as  RepositoryEvent,
ifnull(sum(if(type='StatusEvent',1,null)),0) as  StatusEvent,
ifnull(sum(if(type='TeamAddEvent',1,null)),0) as  TeamAddEvent,
ifnull(sum(if(type='WatchEvent',1,null)),0) as  WatchEvent,
FROM (
  TABLE_DATE_RANGE([githubarchive:day.events_], 
    DATE_ADD(CURRENT_TIMESTAMP(), -1, "YEAR"),
    CURRENT_TIMESTAMP()
  )) AS events
WHERE type IN ("CommitCommentEvent","CreateEvent","DeleteEvent","DeploymentEvent","DeploymentStatusEvent","DownloadEvent","FollowEvent",
"ForkEvent","ForkApplyEvent","GistEvent","GollumEvent","IssueCommentEvent","IssuesEvent","MemberEvent","MembershipEvent","PageBuildEvent",
"PublicEvent","PullRequestEvent","PullRequestReviewCommentEvent","PushEvent","ReleaseEvent","RepositoryEvent","StatusEvent","TeamAddEvent",
"WatchEvent")
GROUP BY 1
limit 100
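
The pivot above is conditional aggregation: one summed column per event type, grouped by user. As a minimal sketch of the same idea that can be run locally, the Python below builds the column list from a list of event types and runs it against an in-memory SQLite table. Everything here is an assumption for illustration: the sample rows, the `events` table, the `actor_login` column name (SQLite has no nested `actor.login` field), and the helper `build_pivot_query`. It also swaps `ifnull(sum(if(...)))` for `SUM(CASE WHEN ... THEN 1 ELSE 0 END)`, which SQLite supports and which yields the same counts.

```python
import sqlite3

# Hypothetical sample rows standing in for the githubarchive events tables.
rows = [
    ("alice", "PushEvent"),
    ("alice", "PushEvent"),
    ("alice", "WatchEvent"),
    ("bob", "ForkEvent"),
    ("bob", "PushEvent"),
]

# A short subset of the event types listed in the answer.
event_types = ["PushEvent", "WatchEvent", "ForkEvent"]


def build_pivot_query(types):
    """Build one counting column per event type (hypothetical helper).

    The type names are trusted constants here; do not interpolate
    untrusted input into SQL this way.
    """
    cols = ",\n".join(
        f"  SUM(CASE WHEN type = '{t}' THEN 1 ELSE 0 END) AS {t}" for t in types
    )
    return (
        f"SELECT actor_login,\n{cols}\n"
        "FROM events\n"
        "GROUP BY actor_login\n"
        "ORDER BY actor_login"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (actor_login TEXT, type TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

# One row per user, one count column per event type.
pivot = conn.execute(build_pivot_query(event_types)).fetchall()
for row in pivot:
    print(row)
```

Generating the columns from a list avoids hand-writing 25 near-identical lines; the generated text would still need the `ifnull(sum(if(...)))` form and the `TABLE_DATE_RANGE` source to run on BigQuery legacy SQL.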

【Comments】:

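How do I get the data for the one year before the current date, and only for a specific location ("Malaysia")? I'm almost there...

That should go into a separate question, because I don't see any location data alongside these fields. AFAIK that is a privacy matter.

I see. What about the "date" part?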
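
On the date question: the answer's query already appears to cover the trailing year, since its TABLE_DATE_RANGE is bounded by DATE_ADD(CURRENT_TIMESTAMP(), -1, "YEAR") and CURRENT_TIMESTAMP(). For anyone who would rather pass explicit bounds, as the question does with TIMESTAMP('...') literals, here is a minimal Python sketch that computes them. The helper names `one_year_before` and `date_range_bounds` are hypothetical, and the Feb 29 fallback to Feb 28 is one possible convention, not something the thread specifies.

```python
from datetime import date


def one_year_before(d):
    """Return the same calendar day one year earlier (hypothetical helper)."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        # Feb 29 has no counterpart in the previous year; fall back to Feb 28.
        return d.replace(year=d.year - 1, day=28)


def date_range_bounds(today):
    """Format start/end bounds as TIMESTAMP('YYYY-MM-DD') literals."""
    start = one_year_before(today)
    return (
        f"TIMESTAMP('{start.isoformat()}')",
        f"TIMESTAMP('{today.isoformat()}')",
    )


# Example using the date the question was posted.
start_expr, end_expr = date_range_bounds(date(2015, 11, 23))
print(start_expr, end_expr)
```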
