Using Big Query API to ingest data into table partitioned by time but getting SyntaxError: Unexpected end of input
【Posted】: 2021-01-30 21:21:08
【Question】: I am trying to load CSV files into a BigQuery table partitioned by month.
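The code returns the following error:
google.api_core.exceptions.BadRequest: 400 Syntax error: Expected end of input but got ":" at [17:24]
The syntax error seems to point to a colon that is part of a URL string I am trying to insert into the table: https://www.example.com
Since the colon is only part of a string, it seems odd that it would trigger an error.
Do I need to escape the colon somehow? If so, how?
My code is: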
import pandas as pd
import pandas_gbq
from google.oauth2 import service_account
from google.cloud import storage
from google.cloud import bigquery
from datetime import datetime

# The four placeholders are filled by query.format() below; none is quoted.
query = '''
INSERT INTO
<<project id>>.<<Dataset>>.<<table>>(_PARTITIONTIME,
url,
title,
h1
)
SELECT {},{},{},{}
'''

now = datetime.now().strftime('%Y-%m-%d')

def run():
    client = storage.Client.from_service_account_json('<<path to file>>')
    bq_client = bigquery.Client.from_service_account_json('<<path to file>>')
    bucket = client.bucket('<<bucket name>>')
    blobs = bucket.list_blobs()
    list_temp_raw = []
    # Read every CSV file in the bucket into one DataFrame.
    for file in blobs:
        filename = file.name
        temp = pd.read_csv('gs://<<bucket name>>/' + filename)
        list_temp_raw.append(temp)
    df = pd.concat(list_temp_raw)
    df = df[cols]  # cols is not defined in the post
    for i in range(len(df.head())):
        # The asker highlighted the next two statements.
        load_query = query.format(
            now,
            df.loc[i, 'url'],
            df.loc[i, 'title'],
            df.loc[i, 'h1']
        )
        query_job = bq_client.query(load_query)
        query_job.result()

run()
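【Comments】:
Try adding quotes, e.g. SELECT {},{},{},"{}".
P.S. It would be better to ask a new question, because the old answers are no longer relevant. That confuses other readers.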
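To see why the parser stops at the colon, it can help to print the SQL text that str.format produces before sending it to BigQuery. The sketch below needs no BigQuery client. The sample row is hypothetical apart from the URL, and the quoted template is only one possible way to write the placeholders, not necessarily what the commenter had in mind.

```python
def render(template, values):
    """Fill a str.format template the same way the question's loop does."""
    return template.format(*values)

# Hypothetical sample row; only the URL is taken from the post.
row = ("2021-01-30", "https://www.example.com", "Example title", "Example heading")

# Placeholders without quotes, as in the question.
unquoted = render("SELECT {},{},{},{}", row)

# Placeholders wrapped in double quotes, with the date passed to TIMESTAMP().
quoted = render('SELECT TIMESTAMP("{}"),"{}","{}","{}"', row)

print(unquoted)
# SELECT 2021-01-30,https://www.example.com,Example title,Example heading
print(quoted)
# SELECT TIMESTAMP("2021-01-30"),"https://www.example.com","Example title","Example heading"
```

In the unquoted text the URL is no longer a string literal, so the parser most likely reads https as an identifier and then rejects the ":" that follows. Quoting by hand still breaks when a value itself contains a double quote.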
【Answer 1】:
Not sure, but adding TIMESTAMP to the query, or waiting for the job to complete, might help:
import pandas as pd
import pandas_gbq
from google.oauth2 import service_account
from google.cloud import storage
from google.cloud import bigquery
from datetime import datetime

# The partition date is wrapped in TIMESTAMP("..."); the other four
# placeholders are filled as-is by query.format() below.
query = '''
INSERT INTO
<<project id>>.<<Dataset>>.<<table>>(_PARTITIONTIME,
a,
b,
c,
d
)
SELECT TIMESTAMP("{}"),{},{},{},{}
'''

now = datetime.now().strftime('%Y-%m-%d')

def run():
    client = storage.Client.from_service_account_json('<<path to file>>')
    bq_client = bigquery.Client.from_service_account_json('<<path to file>>')
    bucket = client.bucket('<<bucket name>>')
    blobs = bucket.list_blobs()
    list_temp_raw = []
    # Read every CSV file in the bucket into one DataFrame.
    for file in blobs:
        filename = file.name
        temp = pd.read_csv('gs://<<bucket name>>/' + filename)
        list_temp_raw.append(temp)
    df = pd.concat(list_temp_raw)
    df = df[cols]  # cols is not defined in the post
    for i in range(len(df.head())):
        load_query = query.format(
            now,
            df.loc[i, 'a'],
            df.loc[i, 'b'],
            df.loc[i, 'c'],
            df.loc[i, 'd']
        )
        query_job = bq_client.query(load_query)
        query_job.result()  # Wait for the job to complete.

run()
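An alternative to formatting values into the SQL text is to pass them as query parameters, so a value such as https://www.example.com never becomes part of the statement the parser sees. The following is a minimal sketch, not the answerer's method. It assumes the column names url, title and h1 from the question; row_to_params and insert_row are hypothetical helper names; bigquery.ScalarQueryParameter and bigquery.QueryJobConfig(query_parameters=...) come from the google-cloud-bigquery client library.

```python
# Named parameters (@name) take the place of the str.format placeholders.
# The table path is the same placeholder used in the post.
PARAM_QUERY = '''
INSERT INTO
<<project id>>.<<Dataset>>.<<table>>(_PARTITIONTIME, url, title, h1)
SELECT TIMESTAMP(@partition_date), @url, @title, @h1
'''

def row_to_params(row, partition_date):
    """Return (name, BigQuery type, value) triples for one row.

    `row` is any mapping with 'url', 'title' and 'h1' keys, for example a
    dict or a pandas Series.
    """
    return [
        ("partition_date", "STRING", partition_date),
        ("url", "STRING", str(row["url"])),
        ("title", "STRING", str(row["title"])),
        ("h1", "STRING", str(row["h1"])),
    ]

def insert_row(bq_client, row, partition_date):
    """Run the parameterized INSERT for one row and wait for it to finish."""
    # Imported here so the helpers above can be used without the client
    # library installed.
    from google.cloud import bigquery

    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(name, type_, value)
            for name, type_, value in row_to_params(row, partition_date)
        ]
    )
    query_job = bq_client.query(PARAM_QUERY, job_config=job_config)
    return query_job.result()
```

Inside the loop, a call such as insert_row(bq_client, df.iloc[i], now) would stand in for the query.format(...) and bq_client.query(...) pair.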