Passing an array of values into a BigQuery query with pandas
【Title】: pass an array of values into bigquery query with pandas 【Posted】: 2017-07-11 09:35:03 【Question】: After some processing, I end up with the following array:
users = array([u'5451709866311680', u'4660301072957440', u'6370791394377728',
u'5121933955825664', u'4778500988862464', u'5841867648270336',
u'4751430816628736', u'4869137213947904', u'5152642703556608',
u'6531810976595968', u'4824167228637184', u'6058117842337792',
u'5969360933879808', u'4764494160986112', u'5443041280131072',
u'4846257587617792', u'5409371420884992', u'6197117949313024',
u'6643644022915072', u'5060273861820416'], dtype=object)
I then want to query for these users in another table in BigQuery, but I am running into a problem.
query = """
SELECT *
FROM games
WHERE user_id IN %users
"""
segment = pd.io.gbq.read_gbq(query, project_id='shared', dialect='standard')
Does anyone know how to do this?
Thanks
【Answer 1】: The problem is probably in your query rather than in pandas. For this query to work, you have to write something like the following:
query = """
SELECT *
FROM crozzles.games
WHERE user_id IN UNNEST(['user1', 'user2', 'user3'])
"""
Without UNNEST around the array, BigQuery cannot look up its inner values.
One thing you can do is this:
query = """
SELECT *
FROM crozzles.games
WHERE user_id IN UNNEST(%s)
""" % [str(u) for u in users]  # a list's repr doubles as a BigQuery array literal
This should result in:
query = """SELECT *
FROM crozzles.games
WHERE user_id IN UNNEST(['5451709866311680', '4660301072957440', '6370791394377728', '5121933955825664', '4778500988862464', '5841867648270336', '4751430816628736', '4869137213947904', '5152642703556608', '6531810976595968', '4824167228637184', '6058117842337792', '5969360933879808', '4764494160986112', '5443041280131072', '4846257587617792', '5409371420884992', '6197117949313024', '6643644022915072', '5060273861820416'])
"""
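Interpolating a Python list works because its repr happens to look like a BigQuery array literal, but it breaks as soon as a value contains a quote. A minimal sketch of a more explicit alternative follows, assuming the IDs are plain strings without newlines. `to_bq_array_literal` is a hypothetical helper name, not part of pandas or BigQuery, and a plain list stands in for the NumPy array from the question.

```python
def to_bq_array_literal(values):
    """Render an iterable of strings as a BigQuery standard SQL array literal.

    Hypothetical helper for illustration only. Backslashes and single quotes
    are escaped so that a stray quote cannot end the string literal early.
    """
    quoted = []
    for value in values:
        text = str(value).replace("\\", "\\\\").replace("'", "\\'")
        quoted.append("'%s'" % text)
    return "[%s]" % ", ".join(quoted)


# A short stand-in for the `users` array from the question.
users = [u'5451709866311680', u'4660301072957440']

query = """
SELECT *
FROM crozzles.games
WHERE user_id IN UNNEST(%s)
""" % to_bq_array_literal(users)

print(query)
```

For values that come from outside your own code, query parameters are usually the safer route than any form of string interpolation.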
【Answer 2】: Here is one possibility, using the public dataset bigquery-public-data.github_repos:
from numpy import array
import pandas as pd

PROJECT_ID = 'choose-your-project-id'

# String comparison in BigQuery is case-sensitive, so the names must match exactly.
input_array = array(['JavaScript', 'Python', 'R'], dtype=object)

query = """
SELECT lang.name, COUNT(*) AS count
FROM `bigquery-public-data.github_repos.languages`, UNNEST(language) AS lang
WHERE lang.name IN UNNEST(@lang_names)
GROUP BY 1
ORDER BY 2 DESC;
"""

# Job configuration declaring one named ARRAY<STRING> parameter, @lang_names.
query_config = {
    'query': {
        'parameterMode': 'NAMED',
        'queryParameters': [
            {
                'name': 'lang_names',
                'parameterType': {'type': 'ARRAY',
                                  'arrayType': {'type': 'STRING'}},
                'parameterValue': {'arrayValues': [{'value': i} for i in input_array]}
            }
        ]
    }
}

result = pd.io.gbq.read_gbq(query, project_id=PROJECT_ID, dialect='standard',
                            configuration=query_config)
print(result.to_string())
The result is now:
         name    count
0  JavaScript  1109499
1      Python   551257
2           R    29572
References:
- https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query#QueryRequest
- https://cloud.google.com/bigquery/docs/reference/rest/v2/QueryParameter
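The same named-parameter approach could be applied to the table from the question. The sketch below assumes that `user_id` is a STRING column. The helper name `string_array_param_config` and the parameter name `@user_ids` are made up for illustration. The dictionary layout follows the QueryRequest and QueryParameter resources referenced above.

```python
def string_array_param_config(name, values):
    """Build a read_gbq `configuration` dict with one named ARRAY<STRING> parameter.

    Hypothetical helper for illustration only; `name` is the parameter name
    without the leading '@'.
    """
    return {
        'query': {
            'parameterMode': 'NAMED',
            'queryParameters': [
                {
                    'name': name,
                    'parameterType': {
                        'type': 'ARRAY',
                        'arrayType': {'type': 'STRING'},
                    },
                    'parameterValue': {
                        # Each array element is wrapped as {'value': ...}.
                        'arrayValues': [{'value': str(v)} for v in values],
                    },
                }
            ]
        }
    }


# A short stand-in for the `users` array from the question.
users = ['5451709866311680', '4660301072957440']
query_config = string_array_param_config('user_ids', users)

query = """
SELECT *
FROM crozzles.games
WHERE user_id IN UNNEST(@user_ids)
"""

# With credentials and a real project ID, the call would look like:
# segment = pd.io.gbq.read_gbq(query, project_id='shared', dialect='standard',
#                              configuration=query_config)
```

Because the values travel separately from the SQL text, no quoting or escaping of the IDs is needed.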