BigQuery to Postgres execution failed on Dataflow workflow timestamp

Posted: 2021-03-16 17:47:16

Question:

I have this problem: I'm not sure how to pass the correct start date to my query. I'm getting the following error and don't know how to fix it. Could I get some help with the time conversion format?

    apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Workflow failed. Causes: S01:QueryTableStdSQL+Writing to DB/ParDo(_WriteToRelationalDBFn) failed., BigQuery execution failed., Error:
 Message: No matching signature for operator >= for argument types: TIMESTAMP, INT64. Supported signature: ANY >= ANY at [1:1241]
 HTTP Code: 400

The main query in my script looks like this:

with beam.Pipeline(options=options) as p:
    rows = p | 'QueryTableStdSQL' >> beam.io.Read(beam.io.BigQuerySource(
        use_standard_sql=True,
        query='SELECT \
        billing_account_id, \
        service.id as service_id, \
        service.description as service_description, \
        sku.id as sku_id, \
        sku.description as sku_description, \
        usage_start_time, \
        usage_end_time, \
        project.id as project_id, \
        project.name as project_description, \
        TO_JSON_STRING(project.labels) \
        as project_labels, \
        project.ancestry_numbers \
        as project_ancestry_numbers, \
        TO_JSON_STRING(labels) as labels, \
        TO_JSON_STRING(system_labels) as system_labels, \
        location.location as location_location, \
        location.country as location_country, \
        location.region as location_region, \
        location.zone as location_zone, \
        export_time, \
        cost, \
        currency, \
        currency_conversion_rate, \
        usage.amount as usage_amount, \
        usage.unit as usage_unit, \
        usage.amount_in_pricing_units as \
        usage_amount_in_pricing_units, \
        usage.pricing_unit as usage_pricing_unit, \
        TO_JSON_STRING(credits) as credits, \
        invoice.month as invoice_month, \
        cost_type, \
        FROM `pprodjectID.bill_usage.gcp_billing_export_v1_xxxxxxxx` \
        WHERE export_time >= 2020-01-01'))
    source_config = relational_db.SourceConfiguration(

Date format shown in the BigQuery console:

export_time
2018-01-25 01:18:55.637 UTC

usage_start_time
2018-01-24 21:23:10.643 UTC

Comments:

Answer 1:

You forgot to quote the timestamp as a string:

WHERE export_time >= 2020-01-01

The unquoted literal above is evaluated as integer arithmetic, 2020 - 01 - 01 = 2018, so BigQuery ends up comparing a TIMESTAMP against the INT64 value 2018 — exactly the "TIMESTAMP, INT64" signature error in the message. You should instead have:
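To see why, a quick check in Python mirrors the arithmetic BigQuery performs on the unquoted literal (Python rejects leading zeros in integer literals, so the `01` parts are written as `1` here):

```python
# An unquoted 2020-01-01 is not a date: SQL parses it as the
# integer expression "2020 minus 1 minus 1".
value = 2020 - 1 - 1
print(value)  # 2018
```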

WHERE export_time >= "2020-01-01"
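In the pipeline code, the only change needed is in the WHERE clause; using an explicit `TIMESTAMP()` literal makes the intended type unambiguous. A minimal sketch of building just that part of the query string (column list shortened for brevity, table name taken from the question):

```python
# Sketch: only the WHERE clause differs from the question's query.
# TIMESTAMP("2020-01-01") is a typed literal, so the comparison is
# TIMESTAMP >= TIMESTAMP instead of TIMESTAMP >= INT64.
table = '`pprodjectID.bill_usage.gcp_billing_export_v1_xxxxxxxx`'
query = (
    'SELECT billing_account_id, export_time, cost '
    'FROM {} '
    'WHERE export_time >= TIMESTAMP("2020-01-01")'.format(table)
)
print(query)
```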

Comments:
