Using Big Query API to ingest data into table partitioned by time but getting SyntaxError: Unexpected end of input

Posted 2023-03-25

Asked: 2021-01-30 21:21:08

Question:

I am trying to load CSV files into a BigQuery table that is partitioned by month.

The code returns the following error:

google.api_core.exceptions.BadRequest: 400 Syntax error: Expected end of input but got ":" at [17:24]

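The syntax error seems to refer to a colon that is part of the URL string I am trying to insert into the table: https://www.example.com (the colon after "https").

Given that it is only part of a string, it seems odd that it would trigger an error.

Do I need to escape the colon somehow? If so, how?

My code is: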

import pandas as pd
import pandas_gbq
from google.oauth2 import service_account
from google.cloud import storage
from google.cloud import bigquery
from datetime import datetime

query = '''
INSERT INTO
<<project id>>.<<Dataset>>.<<table>>(_PARTITIONTIME,
url,
title,
h1
)
SELECT {},{},{},{}
'''
now = datetime.now().strftime('%Y-%m-%d')


def run():

    client = storage.Client.from_service_account_json('<<path to file>>')
    bq_client = bigquery.Client.from_service_account_json('<<path to file>>')
    bucket = client.bucket('<<bucket name>>')
    blobs = bucket.list_blobs()
    list_temp_raw = []
    for file in blobs:
        filename = file.name
        temp = pd.read_csv('gs://<<bucket name>>/' + filename)
        list_temp_raw.append(temp)
    df = pd.concat(list_temp_raw)
    df = df[cols]
    for i in range(len(df.head())):
        load_query = query.format(
            now,
            df.loc[i, 'url'],
            df.loc[i, 'title'],
            df.loc[i, 'h1']
            )
        query_job = bq_client.query(load_query)
        query_job.result()
run()
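
The cause can be reproduced without calling BigQuery at all. The sketch below assumes the query template holds one bare {} placeholder per argument passed to query.format, and it uses made-up row values. The helper sql_string_literal is hypothetical and not part of any library. It only shows what the rendered SQL text looks like with and without quotes.

```python
# Assumed shape of the question's template: one bare placeholder per value.
template = "SELECT {},{},{},{}"

# Made-up row values, used only for illustration.
row = {"url": "https://www.example.com", "title": "Example Domain", "h1": "Example"}

# str.format pastes the values in as raw text, so nothing marks them as strings.
rendered = template.format("2021-01-30", row["url"], row["title"], row["h1"])
print(rendered)


def sql_string_literal(value):
    """Hypothetical helper: wrap a Python string in single quotes for SQL."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


# With quotes, each value reaches the parser as a string literal.
quoted = "SELECT TIMESTAMP({}),{},{},{}".format(
    sql_string_literal("2021-01-30"),
    sql_string_literal(row["url"]),
    sql_string_literal(row["title"]),
    sql_string_literal(row["h1"]),
)
print(quoted)
```

In the first rendering the URL sits in the statement unquoted, so the parser reads https as an identifier and then stops at the colon. Hand-built literals like the second rendering are easy to get wrong, so query parameters are usually the safer route.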

Comments:

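Try adding quotes, as in SELECT {},{},{},"{}". P.S. It is better to ask a new question, because the old answer is no longer relevant. That confuses other readers.

Answer 1: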

Not sure, but perhaps adding TIMESTAMP to the query, or waiting for the job to complete, might help:

import pandas as pd
import pandas_gbq
from google.oauth2 import service_account
from google.cloud import storage
from google.cloud import bigquery
from datetime import datetime

query = '''
INSERT INTO
<<project id>>.<<Dataset>>.<<table>>(_PARTITIONTIME,
a,
b,
c,
d
)
SELECT TIMESTAMP("{}"),{},{},{},{}
'''
now = datetime.now().strftime('%Y-%m-%d')


def run():

    client = storage.Client.from_service_account_json('<<path to file>>')
    bq_client = bigquery.Client.from_service_account_json('<<path to file>>')
    bucket = client.bucket('<<bucket name>>')
    blobs = bucket.list_blobs()
    list_temp_raw = []
    for file in blobs:
        filename = file.name
        temp = pd.read_csv('gs://<<bucket name>>/' + filename)
        list_temp_raw.append(temp)
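    # ignore_index gives a unique 0..n-1 index, so df.loc[i, ...] returns a scalar
    df = pd.concat(list_temp_raw, ignore_index=True)
    df = df[cols]  # cols must list the columns to keep, e.g. ['a', 'b', 'c', 'd']
    for i in range(len(df.head())):  # df.head() limits this to the first five rows
        load_query = query.format(
            now,
            df.loc[i, 'a'],
            df.loc[i, 'b'],
            df.loc[i, 'c'],
            df.loc[i, 'd'],
        )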
        query_job = bq_client.query(load_query)
        query_job.result()  # Wait for the job to complete.
run()
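
An alternative to formatting values into the SQL text is to pass them as query parameters, which avoids quoting and escaping altogether. The following is a minimal sketch and not the answer author's code: it assumes the three columns from the question are all STRING, the table ID and row values are made up, and build_insert and insert_row are hypothetical helper names. The client calls it relies on (bigquery.QueryJobConfig, bigquery.ScalarQueryParameter, Client.query) are part of the google-cloud-bigquery library.

```python
from datetime import datetime, timezone

# Column names taken from the question; assumed here to be STRING columns.
COLUMNS = ("url", "title", "h1")


def build_insert(table_id, row, partition_time):
    """Return the SQL text and (name, type, value) triples for one row."""
    # Table names cannot be parameters, so the table ID is placed in the text.
    sql = (
        f"INSERT INTO `{table_id}` (_PARTITIONTIME, url, title, h1)\n"
        "SELECT @partition_time, @url, @title, @h1"
    )
    params = [("partition_time", "TIMESTAMP", partition_time)]
    params += [(name, "STRING", str(row[name])) for name in COLUMNS]
    return sql, params


def insert_row(bq_client, table_id, row, partition_time):
    """Run the parameterized INSERT and wait for the job to finish."""
    # Imported here so build_insert can be tried without the client library.
    from google.cloud import bigquery

    sql, params = build_insert(table_id, row, partition_time)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(name, type_, value)
            for name, type_, value in params
        ]
    )
    return bq_client.query(sql, job_config=job_config).result()


# Build (but do not run) the statement for one made-up row.
sql, params = build_insert(
    "my-project.my_dataset.my_table",  # hypothetical table ID
    {"url": "https://www.example.com", "title": "Example Domain", "h1": "Example"},
    datetime(2021, 1, 1, tzinfo=timezone.utc),
)
print(sql)
```

Because the URL travels as a parameter value, it never appears in the SQL text and its colon cannot be misread by the parser. Whether the partition timestamp has to fall on a partition boundary depends on how the table is partitioned, so treat that value as an assumption to check against the table definition.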
