Python Google BigQuery parameterized SELECT
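【Posted】: 2018-08-30 19:29:17 【Question】: I'm running into a problem with BigQuery query parameters. I pass a start date, an end date, and an array of candidate fields that exist in the dataset to a function. The start and end dates are formatted as "yyyymmdd".
The goal is to pass in a pair of dates and an array of fields, and to collect the data for those fields between the two dates.
The date handling works as expected.
The field array is passed like this, as an example: ["user_pseudo_id", "event_name", "event_timestamp"] (other entries in the array are possible).
What I actually want is to parameterize the query further so that it looks like the one below, where @search_params replaces the individual column names in the SELECT part of the query. The aim is to make the fields array more extensible, from a single entry up to many entries.
From my searching, I believe ArrayQueryParameter (in place of ScalarQueryParameter) may solve this, but I haven't found much documentation on how to use it.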
query_job = client.query("""
SELECT @search_params, _TABLE_SUFFIX AS suffix
FROM `analytics_180354243.events_*`
WHERE REGEXP_EXTRACT(_TABLE_SUFFIX, r'[0-9]+')
BETWEEN @start_date AND @end_date
""", job_config=job_config)
The full function is below:
from google.cloud import bigquery
from google.oauth2 import service_account

def query_awe(start_date, end_date, fields):
    credentials = service_account.Credentials.from_service_account_file('auth.json')
    project_id = 'my-project-id'
    client = bigquery.Client(credentials=credentials, project=project_id)
    # Comma-separated column list, e.g. "user_pseudo_id, event_name"
    search_params = ", ".join(fields)
    query_params = [
        bigquery.ScalarQueryParameter('start_date', 'STRING', start_date),
        bigquery.ScalarQueryParameter('end_date', 'STRING', end_date),
        # Bound but not yet referenced by the query below
        bigquery.ScalarQueryParameter('search_params', 'STRING', search_params),
    ]
    # bigquery.ArrayQueryParameter  (candidate replacement, not used yet)
    job_config = bigquery.QueryJobConfig()
    job_config.use_legacy_sql = False
    job_config.query_parameters = query_params
    query_job = client.query("""
        SELECT user_pseudo_id, event_name, _TABLE_SUFFIX AS suffix
        FROM `analytics_180354243.events_*` #Each day saved as events_yyyymmdd
        WHERE REGEXP_EXTRACT(_TABLE_SUFFIX, r'[0-9]+')
              BETWEEN @start_date AND @end_date
        ORDER BY user_pseudo_id DESC
        """, job_config=job_config)
    results = query_job.result()  # Waits for job to complete.
    for row in results:
        print(row)
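A note on why @search_params cannot work in the SELECT list: BigQuery query parameters bind values, not identifiers, so a parameter there is returned as a string literal in every row rather than being expanded into column names. ArrayQueryParameter has the same restriction; it binds a list of values, typically consumed with UNNEST. The sketch below shows that use. The event_name filter and the helper names are hypothetical and not part of the question.

```python
import re

# Array parameters bind a list of *values*; here they filter rows by event_name.
# The event_name filter is an illustrative assumption, not from the question.
EVENT_FILTER_SQL = """
SELECT user_pseudo_id, event_name, _TABLE_SUFFIX AS suffix
FROM `analytics_180354243.events_*`
WHERE REGEXP_EXTRACT(_TABLE_SUFFIX, r'[0-9]+')
      BETWEEN @start_date AND @end_date
  AND event_name IN UNNEST(@event_names)
"""


def referenced_parameters(sql):
    """Return the @parameter names that appear in the SQL text, in order."""
    return re.findall(r"@(\w+)", sql)


def build_event_filter_params(start_date, end_date, event_names):
    """Build the parameter list matching EVENT_FILTER_SQL."""
    # Imported lazily so the SQL text can be inspected without the client library.
    from google.cloud import bigquery

    return [
        bigquery.ScalarQueryParameter("start_date", "STRING", start_date),
        bigquery.ScalarQueryParameter("end_date", "STRING", end_date),
        bigquery.ArrayQueryParameter("event_names", "STRING", list(event_names)),
    ]
```

The returned list would be assigned to job_config.query_parameters exactly as in the function above.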
【Answer 1】: How about just using string formatting?
from google.cloud import bigquery
from google.oauth2 import service_account

def query_awe(start_date, end_date, fields):
    credentials = service_account.Credentials.from_service_account_file('auth.json')
    project_id = 'my-project-id'
    client = bigquery.Client(credentials=credentials, project=project_id)
    job_config = bigquery.QueryJobConfig()
    job_config.use_legacy_sql = False
    # {0} is the column list; {1} and {2} are the yyyymmdd dates, quoted
    # because REGEXP_EXTRACT returns a STRING.
    my_query = """
        SELECT {0}, _TABLE_SUFFIX AS suffix
        FROM `analytics_180354243.events_*` #Each day saved as events_yyyymmdd
        WHERE REGEXP_EXTRACT(_TABLE_SUFFIX, r'[0-9]+')
              BETWEEN '{1}' AND '{2}'
        ORDER BY user_pseudo_id DESC
        """
    my_query = my_query.format(', '.join(fields), start_date, end_date)
    query_job = client.query(my_query, job_config=job_config)
    results = query_job.result()  # Waits for job to complete.
    for row in results:
        print(row)
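Because string formatting places the field names directly into the SQL text, it is worth checking them first if they can come from outside the program. A minimal sketch, assuming plain top-level column names only; the helper name build_select_list and the identifier pattern are illustrative choices, not part of the answer:

```python
import re

# Conservative pattern for a plain column name (letters, digits, underscore).
# Nested fields such as "device.category" are rejected by this sketch.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_select_list(fields):
    """Return a backtick-quoted, comma-separated column list for a SELECT."""
    fields = list(fields)
    if not fields:
        raise ValueError("at least one field is required")
    for name in fields:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid column name: {name!r}")
    return ", ".join(f"`{name}`" for name in fields)
```

The result can be passed to format() in place of ', '.join(fields).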
【Answer 2】: I just use a simple replace to do what you're asking.
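myQuery = """SELECT <var_search_params>, _TABLE_SUFFIX AS suffix
FROM `analytics_180354243.events_*`
WHERE REGEXP_EXTRACT(_TABLE_SUFFIX, r'[0-9]+')
BETWEEN @start_date AND @end_date
"""
# str.replace returns a new string, so the result must be assigned back.
myQuery = myQuery.replace("<var_search_params>", "Foo, Bar")
# client and job_config are set up as in the question's function.
query_job = client.query(myQuery, job_config=job_config)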
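One way to combine the two mechanisms is to substitute only the column list into the SQL text and leave the dates as bound parameters. The sketch below checks the requested fields against an allowlist before substituting them. ALLOWED_FIELDS is an assumption taken from the example array in the question, and render_query and run_query are hypothetical helper names.

```python
# Columns that callers may request; an assumption based on the question's example.
ALLOWED_FIELDS = ("user_pseudo_id", "event_name", "event_timestamp")

QUERY_TEMPLATE = """
SELECT <var_search_params>, _TABLE_SUFFIX AS suffix
FROM `analytics_180354243.events_*`
WHERE REGEXP_EXTRACT(_TABLE_SUFFIX, r'[0-9]+')
      BETWEEN @start_date AND @end_date
"""


def render_query(fields):
    """Substitute an allowlisted column list; the dates stay as @parameters."""
    fields = list(fields)
    unknown = [name for name in fields if name not in ALLOWED_FIELDS]
    if not fields or unknown:
        raise ValueError(f"unsupported fields: {unknown or 'none given'}")
    return QUERY_TEMPLATE.replace("<var_search_params>", ", ".join(fields))


def run_query(client, start_date, end_date, fields):
    """Run the rendered query with the dates bound as STRING parameters."""
    # Imported lazily so render_query can be used without the client library.
    from google.cloud import bigquery

    job_config = bigquery.QueryJobConfig()
    job_config.use_legacy_sql = False
    job_config.query_parameters = [
        bigquery.ScalarQueryParameter("start_date", "STRING", start_date),
        bigquery.ScalarQueryParameter("end_date", "STRING", end_date),
    ]
    return client.query(render_query(fields), job_config=job_config).result()
```

An allowlist is stricter than pattern matching on names, at the cost of having to be updated whenever a new column should be selectable.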