Return quantiles within a summarise() call through dbplyr/bigRquery to BigQuery SQL database
Posted: 2020-07-24 19:39:09
Question: I am trying to get quantiles of a variable in a grouped BigQuery table, but I get the following error:
Error: Job 'xxxxx' failed
Syntax error: Expected end of input but got keyword WITHIN at [1:45] [invalidQuery]
Reprex below.
# NOTE: for reprex to work, you must have BIGQUERY_TEST_PROJECT envvar set to name of project which has billing set up and to which you have write access
library(DBI)
library(bigrquery)
library(dplyr)

billing <- bq_test_project()

con <- dbConnect(
  bigrquery::bigquery(),
  project = "publicdata",
  dataset = "samples",
  billing = billing
)

natality <- tbl(con, "natality")

natality %>%
  group_by(year) %>%
  summarize(
    q25 = quantile(weight_pounds, 0.25),
    q50 = median(weight_pounds),
    q75 = quantile(weight_pounds, 0.75)
  )
Does anyone know of a workaround, perhaps by supplying SQL code via sql() within the summarise() call?
Thanks!
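Comments:
Did you find a workaround? I am facing exactly the same problem. median() gives me the same error too...
@Ploulack see the answer below.
Answer 1:
A colleague found the answer: supply the SQL code via sql() within the summarize() call: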
# NOTE: for reprex to work, you must have BIGQUERY_TEST_PROJECT envvar set to name of project which has billing set up and to which you have write access
library(DBI)
library(bigrquery)
library(dplyr)

billing <- bq_test_project()

con <- dbConnect(
  bigrquery::bigquery(),
  project = "publicdata",
  dataset = "samples",
  billing = billing
)

natality <- tbl(con, "natality")

natality %>%
  group_by(year) %>%
  summarize(
    q25 = sql("approx_quantiles(weight_pounds,4)[offset(1)]"),
    q50 = sql("approx_quantiles(weight_pounds,2)[offset(1)]"),
    q75 = sql("approx_quantiles(weight_pounds,4)[offset(3)]")
  )
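For readers wondering where the numbers in the answer come from: BigQuery's approx_quantiles(x, n) returns an array of n + 1 approximate boundaries (element 0 is the minimum, element n the maximum), so offset(k) picks the k/n quantile. A minimal sketch of a helper that builds such expressions is shown below. The name bq_quantile_expr is hypothetical and not part of dbplyr or bigrquery, and the results are approximate rather than exact quantiles.

```r
# Hypothetical helper (an illustration, not a dbplyr/bigrquery function):
# builds the BigQuery expression that selects boundary k out of n buckets.
bq_quantile_expr <- function(column, k, n) {
  # k must index into the n + 1 boundaries returned by approx_quantiles()
  stopifnot(n >= 1, k >= 0, k <= n)
  sprintf(
    "approx_quantiles(%s, %d)[offset(%d)]",
    column, as.integer(n), as.integer(k)
  )
}

# Expressions equivalent to the ones used in the answer
q25_expr <- bq_quantile_expr("weight_pounds", 1, 4)
q50_expr <- bq_quantile_expr("weight_pounds", 1, 2)
q75_expr <- bq_quantile_expr("weight_pounds", 3, 4)
```

Assuming dplyr is loaded as in the reprex, the strings would presumably be injected into the query with unquoting, for example summarize(q25 = !!sql(q25_expr)), so that the helper runs in R and is not passed through to BigQuery as an unknown function.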