BigQuery to Postgres execution failed on Dataflow workflow timestamp
Posted: 2021-03-16 17:47:16

I have this problem and I am not sure how to pass the correct start date to my query. I am getting the error below and cannot work out how to fix it. Could I get some help with the timestamp conversion format?
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Workflow failed. Causes: S01:QueryTableStdSQL+Writing to DB/ParDo(_WriteToRelationalDBFn) failed., BigQuery execution failed., Error:
Message: No matching signature for operator >= for argument types: TIMESTAMP, INT64. Supported signature: ANY >= ANY at [1:1241]
HTTP Code: 400
The main query in my script looks like this:
with beam.Pipeline(options=options) as p:
    rows = p | 'QueryTableStdSQL' >> beam.io.Read(beam.io.BigQuerySource(
        use_standard_sql=True,
        query='''SELECT
            billing_account_id,
            service.id AS service_id,
            service.description AS service_description,
            sku.id AS sku_id,
            sku.description AS sku_description,
            usage_start_time,
            usage_end_time,
            project.id AS project_id,
            project.name AS project_description,
            TO_JSON_STRING(project.labels) AS project_labels,
            project.ancestry_numbers AS project_ancestry_numbers,
            TO_JSON_STRING(labels) AS labels,
            TO_JSON_STRING(system_labels) AS system_labels,
            location.location AS location_location,
            location.country AS location_country,
            location.region AS location_region,
            location.zone AS location_zone,
            export_time,
            cost,
            currency,
            currency_conversion_rate,
            usage.amount AS usage_amount,
            usage.unit AS usage_unit,
            usage.amount_in_pricing_units AS usage_amount_in_pricing_units,
            usage.pricing_unit AS usage_pricing_unit,
            TO_JSON_STRING(credits) AS credits,
            invoice.month AS invoice_month,
            cost_type,
        FROM `pprodjectID.bill_usage.gcp_billing_export_v1_xxxxxxxx`
        WHERE export_time >= 2020-01-01'''))
    source_config = relational_db.SourceConfiguration(
The date format shown in the BigQuery console:
export_time: 2018-01-25 01:18:55.637 UTC
usage_start_time: 2018-01-24 21:23:10.643 UTC
Answer 1: You forgot to include the date as a string:
WHERE export_time >= 2020-01-01
Unquoted, 2020-01-01 is evaluated as integer arithmetic (2020 - 01 - 01 = 2018), so the comparison becomes TIMESTAMP >= INT64, which has no matching signature. You should have:
WHERE export_time >= "2020-01-01"
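
For reference, a minimal sketch of the corrected read step, keeping the question's BigQuerySource API and placeholder table name, with the column list shortened. Quoting the literal works because BigQuery coerces string literals to TIMESTAMP; the explicit TIMESTAMP(...) form below is an alternative of my own choosing that makes the intent unambiguous:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions()

# Shortened column list; the table name is the placeholder from the question.
query = '''SELECT
    billing_account_id,
    export_time,
    cost
FROM `pprodjectID.bill_usage.gcp_billing_export_v1_xxxxxxxx`
WHERE export_time >= TIMESTAMP("2020-01-01")'''

with beam.Pipeline(options=options) as p:
    rows = p | 'QueryTableStdSQL' >> beam.io.Read(beam.io.BigQuerySource(
        use_standard_sql=True, query=query))

Either spelling makes the right-hand side a TIMESTAMP instead of the INT64 value 2018, which resolves the signature error.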