How to change records on BigQuery from different rows to one row?
Posted: 2017-05-09 06:20:28

Question: I have inserted values from a JSON file into BigQuery, but my JSON file contains multiple objects.
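For example:
{"A": {"queryID": "newId", "newCol": "newCol"}}
{"B": {"date": "2013-05-31 20:56:41", "device": "pc"}}
{"C": {"keyword": ["new", "ict"]}}
BigQuery loads each object as its own row, and the columns belonging to the other objects are left empty (NULL). How can I get all of them into a single row, with each object in its own column?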
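If the objects in one file really describe a single record, you could also merge them before loading. That way each line of the newline-delimited file already carries every column. The following is a minimal sketch. It assumes the whole file is one record and that top-level keys never repeat. `merge_ndjson_objects` is a hypothetical helper and is not part of any BigQuery library.

```python
import json


def merge_ndjson_objects(lines):
    """Merge one-object-per-line JSON records into a single dict.

    Hypothetical helper: assumes every non-blank line is a JSON object and that
    top-level keys ("A", "B", "C", ...) are not repeated across lines.
    """
    merged = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines, e.g. a trailing newline
        for key, value in json.loads(line).items():
            if key in merged:
                raise ValueError("duplicate top-level key: %s" % key)
            merged[key] = value
    return merged


sample_lines = [
    '{"A": {"queryID": "newId", "newCol": "newCol"}}',
    '{"B": {"date": "2013-05-31 20:56:41", "device": "pc"}}',
    '{"C": {"keyword": ["new", "ict"]}}',
]

# Written as a single NDJSON line, this loads as one row with columns A, B and C.
merged_line = json.dumps(merge_ndjson_objects(sample_lines))
print(merged_line)
```

A file that holds many records would need an extra step. You would first group its lines per record, using whatever field links them, and then merge each group separately.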
import json
import sys
import time

import httplib2
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

# Placeholders: replace with your own project, dataset and table IDs.
projectId = 'your-project-id'
datasetId = 'your_dataset'
tableId = 'your_table'


def loadTable(http, service):
    url = "https://www.googleapis.com/upload/bigquery/v2/projects/" + projectId + "/jobs"
    # First part of the multipart/related body: the load job configuration (JSON).
    newresource = ('--xxx\n' +
                   'Content-Type: application/json; charset=UTF-8\n' + '\n' +
                   '{\n' +
                   '  "configuration": {\n' +
                   '    "load": {\n' +
                   '      "sourceFormat": "NEWLINE_DELIMITED_JSON",\n' +
                   '      "autodetect": true,\n' +  # JSON boolean, not the string "True"
                   '      "destinationTable": {\n' +
                   '        "projectId": "' + projectId + '",\n' +
                   '        "datasetId": "' + datasetId + '",\n' +
                   '        "tableId": "' + tableId + '"\n' +
                   '      }\n' +
                   '    }\n' +
                   '  }\n' +
                   '}\n' +
                   '--xxx\n' +
                   'Content-Type: application/octet-stream\n' +
                   '\n')
    # Second part: the newline-delimited JSON data itself.
    with open('samplejson.json', 'r') as f:
        newresource += f.read().replace('\n', '\r\n')
    newresource += '--xxx--\n'
    print(newresource)
    headers = {'Content-Type': 'multipart/related; boundary=xxx'}
    resp, content = http.request(url, method="POST", body=newresource, headers=headers)
    if resp.status != 200:
        print(resp)
        print(content)
        return
    jsonResponse = json.loads(content)
    jobReference = jsonResponse['jobReference']['jobId']
    # Poll the load job every 10 seconds until BigQuery reports it as DONE.
    while True:
        getJob = service.jobs().get(projectId=projectId, jobId=jobReference).execute()
        currentStatus = getJob['status']['state']
        if currentStatus == 'DONE':
            # DONE only means the job finished; errorResult is present if it failed.
            if 'errorResult' in getJob['status']:
                print('Load failed: ' + str(getJob['status']['errorResult']))
            else:
                print("Done Loading!")
            return
        print('Waiting to load...')
        print('Current status: ' + currentStatus)
        print(time.ctime())
        time.sleep(10)


def main(argv):
    credentials = ServiceAccountCredentials.from_json_keyfile_name("samplecredentials.json")
    scope = ['https://www.googleapis.com/auth/bigquery']
    credentials = credentials.create_scoped(scope)
    http = httplib2.Http()
    http = credentials.authorize(http)
    service = build('bigquery', 'v2', http=http)
    loadTable(http, service)


if __name__ == '__main__':
    main(sys.argv)
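The hand-written JSON in `loadTable` is fragile, because a single missing brace or quote makes the job configuration invalid. The following is a minimal sketch that lets `json.dumps` generate the configuration instead. It assumes the same multipart/related layout and `xxx` boundary as the code above. `build_load_body` is a hypothetical helper name and is not part of any Google library.

```python
import json


def build_load_body(project_id, dataset_id, table_id, data, boundary="xxx"):
    """Build a multipart/related body for a BigQuery load job upload.

    Sketch only: same two parts as loadTable (JSON configuration, then the raw
    newline-delimited JSON data), but json.dumps handles braces and quoting.
    """
    config = {
        "configuration": {
            "load": {
                "sourceFormat": "NEWLINE_DELIMITED_JSON",
                "autodetect": True,  # serialized as the JSON boolean true
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }
    parts = [
        "--" + boundary,
        "Content-Type: application/json; charset=UTF-8",
        "",
        json.dumps(config),
        "--" + boundary,
        "Content-Type: application/octet-stream",
        "",
        data.rstrip("\r\n"),  # exactly one line break before the closing boundary
        "--" + boundary + "--",
        "",
    ]
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_load_body("your-project-id", "your_dataset", "your_table",
                          '{"A": {"queryID": "newId"}}\n'))
```

You can pass the returned string as `body=` to `http.request`, just like `newresource`, with the same `multipart/related; boundary=xxx` header.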
Answer 1: I would suggest a query like the following (BigQuery Standard SQL) to do the final "assembly" into one row:
#standardSQL
SELECT
ARRAY_AGG(A IGNORE NULLS) AS A,
ARRAY_AGG(B IGNORE NULLS) AS B,
ARRAY_AGG(C IGNORE NULLS) AS C
FROM `yourtable`
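The following small Python model (not BigQuery itself) shows what the query does to the sparse rows. Each loaded row has exactly one non-NULL column. `ARRAY_AGG(... IGNORE NULLS)` keeps only the non-NULL values, so the three rows collapse into one row that holds a one-element array per column. In this sketch, rows are dicts with `None` standing in for NULL. `array_agg_ignore_nulls` is a hypothetical name.

```python
def array_agg_ignore_nulls(rows, columns):
    """Collapse sparse rows into one row, like ARRAY_AGG(col IGNORE NULLS) per column.

    Each output column is the list of that column's non-None values.
    """
    return {
        col: [row.get(col) for row in rows if row.get(col) is not None]
        for col in columns
    }


# The three rows produced by loading the three JSON lines: one non-NULL column each.
rows = [
    {"A": {"queryID": "newId", "newCol": "newCol"}, "B": None, "C": None},
    {"A": None, "B": {"date": "2013-05-31 20:56:41", "device": "pc"}, "C": None},
    {"A": None, "B": None, "C": {"keyword": ["new", "ict"]}},
]

print(array_agg_ignore_nulls(rows, ["A", "B", "C"]))
```

Without an `ORDER BY` inside `ARRAY_AGG`, BigQuery does not guarantee the order of elements within each array. That order only matters when a column has more than one non-NULL value.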
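If you have an additional field that indicates which rows should be combined/grouped into one (for example, some id), the query can look like this: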
#standardSQL
SELECT
id,
ARRAY_AGG(A IGNORE NULLS) AS A,
ARRAY_AGG(B IGNORE NULLS) AS B,
ARRAY_AGG(C IGNORE NULLS) AS C
FROM `yourtable`
GROUP BY id
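You can model the grouped variant the same way. All rows that share an `id` collapse into one output row. The sample rows below are made up for illustration, since the original post has no `id` field. `group_and_array_agg` is a hypothetical name.

```python
def group_and_array_agg(rows, key, columns):
    """Model of: SELECT key, ARRAY_AGG(col IGNORE NULLS) ... GROUP BY key.

    Rows are dicts and None stands for NULL. Groups come out in the order their
    key is first seen; BigQuery guarantees no row order without ORDER BY.
    """
    groups = {}
    for row in rows:
        # One output row per distinct key value, each column starting as an empty list.
        out = groups.setdefault(row[key], {col: [] for col in columns})
        for col in columns:
            if row.get(col) is not None:
                out[col].append(row[col])
    return [{key: k, **aggregated} for k, aggregated in groups.items()]


# Hypothetical rows: two ids, with their A and B parts arriving on separate rows.
sample_rows = [
    {"id": 1, "A": {"queryID": "q1"}, "B": None},
    {"id": 2, "A": {"queryID": "q2"}, "B": None},
    {"id": 1, "A": None, "B": {"device": "pc"}},
    {"id": 2, "A": None, "B": {"device": "mobile"}},
]

for out_row in group_and_array_agg(sample_rows, "id", ["A", "B"]):
    print(out_row)
```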