BigQuery: comparing DATE and TIMESTAMP

Posted: 2019-01-17 15:15:25

【Question】:

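Here is my example, which works in MySQL. In BigQuery, however, my OnSite timestamp is a DATE, while my Documents timestamp is a TIMESTAMP.

BigQuery has a problem with the query below, and I get the following message:

No matching signature for function DATE for argument types: DATE. Supported signatures: DATE(TIMESTAMP, [STRING]); DATE(DATETIME); DATE(INT64, INT64, INT64) at [8:146]

Does anyone know what I need to do to make the query work when comparing a DATE with a TIMESTAMP?

Schema (MySQL v5.7)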

CREATE TABLE OnSite
    (`uid` varchar(55), `worksite_id`  varchar(55), `timestamp` datetime)
;

INSERT INTO OnSite
    (`uid`, `worksite_id`, `timestamp`)
VALUES
  ("u12345", "worksite_1", '2019-01-01'),
  ("u12345", "worksite_1", '2019-01-02'),
  ("u12345", "worksite_1", '2019-01-03'),
  ("u12345", "worksite_1", '2019-01-04'),
  ("u12345", "worksite_1", '2019-01-05'),
  ("u12345", "worksite_1", '2019-01-06'),
  ("u1", "worksite_1", '2019-01-01'),
  ("u1", "worksite_1", '2019-01-02'),
  ("u1", "worksite_1", '2019-01-05'),
  ("u1", "worksite_1", '2019-01-06')

;


CREATE TABLE Documents
    (`document_id` varchar(55), `uid` varchar(55), `worksite_id`  varchar(55), `type` varchar(55), `timestamp` datetime)
;

INSERT INTO Documents
    (`document_id`, `uid`, `worksite_id`, `type`, `timestamp`)

VALUES
  ("1",     "u12345",   "worksite_1", 'work_permit',    '2019-01-01 00:00:00'),
  ("2",     "u12345",   "worksite_2", 'job',            '2019-01-02 00:00:00'),
  ("3",     "u12345",   "worksite_1", 'work_permit',    '2019-01-03 00:00:00'),
  ("4",     "u12345",   "worksite_2", 'job',            '2019-01-04 00:00:00'),
  ("5",     "u12345",   "worksite_1", 'work_permit',    '2019-01-05 00:00:00'),
  ("6",     "u12345",   "worksite_2", 'job',            '2019-01-06 00:00:00'),
  ("7",     "u12345",   "worksite_1", 'work_permit',    '2019-01-07 00:00:00'),
  ("8",     "u12345",   "worksite_2", 'work_permit',    '2019-01-09 00:00:00'),
  ("9",     "u12345",   "worksite_1", 'job',            '2019-01-09 00:00:00'),
  ("10",    "u12345",   "worksite_2", 'work_permit',    '2019-01-09 00:00:00'),
  ("11",    "u12345",   "worksite_1", 'work_permit',    '2019-01-09 00:00:00'),
  ("12",    "u12345",   "worksite_2", 'work_permit',    '2019-01-09 00:00:00'),
  ("13",    "u12345",   "worksite_1", 'job',            '2019-01-09 00:00:00'),
  ("14",    "u12345",   "worksite_2", 'work_permit',    '2019-01-09 00:00:00'),
  ("15",    "u12345",   "worksite_1", 'work_permit',    '2019-01-09 00:00:00')

;

Query #1

SELECT
  IFNULL(OnSite.worksite_id, Documents.worksite_id) as `Worksite`,
  DATE(IFNULL(OnSite.timestamp, Documents.timestamp)) as `Date`,
  COUNT(Documents.worksite_id) as `Users_on_Site`,
  COUNT(DISTINCT OnSite.uid) as `Completed`

FROM OnSite
  LEFT JOIN Documents ON OnSite.worksite_id = Documents.worksite_id AND DATE(OnSite.timestamp) = DATE(Documents.timestamp)
GROUP BY `Date`, `Worksite`;

| Worksite   | Date       | Users_on_Site | Completed |
| ---------- | ---------- | ------------- | --------- |
| worksite_1 | 2019-01-01 | 2             | 2         |
| worksite_1 | 2019-01-02 | 0             | 2         |
| worksite_1 | 2019-01-03 | 1             | 1         |
| worksite_1 | 2019-01-04 | 0             | 1         |
| worksite_1 | 2019-01-05 | 2             | 2         |
| worksite_1 | 2019-01-06 | 0             | 2         |


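【Comments on the question】:

What about using a SQL type cast, such as LEFT JOIN Documents ON OnSite.worksite_id = Documents.worksite_id AND DATE(TIMESTAMP OnSite.timestamp) = DATE(TIMESTAMP Documents.timestamp) or LEFT JOIN Documents ON OnSite.worksite_id = Documents.worksite_id AND DATE(OnSite.timestamp::timestamp) = DATE(Documents.timestamp::timestamp)?

You need to be clear about what you are using: mysql or bigquery.

@MikhailBerlyant Sorry, I thought I was clear in the title. I tagged both because I felt they were similar skills, but I will make sure to remove the mysql tag.

I really recommend using the correct tags, because wrong ones are misleading and you end up with "garbage" answers.

@MikhailBerlyant Absolutely agree. Won't do it again.

【Answer 1】:

The following is for BigQuery Standard SQL: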

#standardSQL
SELECT
  IFNULL(OnSite.worksite_id, Documents.worksite_id) AS `Worksite`,
  IFNULL(OnSite.timestamp, DATE(Documents.timestamp)) AS `DATE`,
  COUNT(Documents.worksite_id) AS `Users_on_Site`,
  COUNT(DISTINCT OnSite.uid) AS `Completed`
FROM `project.dataset.OnSite` OnSite
LEFT JOIN `project.dataset.Documents` Documents 
ON OnSite.worksite_id = Documents.worksite_id 
AND OnSite.timestamp = DATE(Documents.timestamp)
GROUP BY `DATE`, `Worksite`

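Applied to the sample data from your question, with the following WITH clause placed in front of the query that joins OnSite and Documents: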

WITH `project.dataset.OnSite` AS (
  SELECT "u12345" uid, "worksite_1" worksite_id, DATE '2019-01-01' `TIMESTAMP` UNION ALL
  SELECT "u12345", "worksite_1", '2019-01-02' UNION ALL
  SELECT "u12345", "worksite_1", '2019-01-03' UNION ALL
  SELECT "u12345", "worksite_1", '2019-01-04' UNION ALL
  SELECT "u12345", "worksite_1", '2019-01-05' UNION ALL
  SELECT "u12345", "worksite_1", '2019-01-06' UNION ALL
  SELECT "u1", "worksite_1", '2019-01-01' UNION ALL
  SELECT "u1", "worksite_1", '2019-01-02' UNION ALL
  SELECT "u1", "worksite_1", '2019-01-05' UNION ALL
  SELECT "u1", "worksite_1", '2019-01-06' 
), `project.dataset.Documents` AS (
  SELECT "1" document_id,     "u12345" uid,   "worksite_1" worksite_id, 'work_permit' type,    TIMESTAMP '2019-01-01 00:00:00' `TIMESTAMP` UNION ALL
  SELECT "2",     "u12345",   "worksite_2", 'job',            '2019-01-02 00:00:00' UNION ALL
  SELECT "3",     "u12345",   "worksite_1", 'work_permit',    '2019-01-03 00:00:00' UNION ALL
  SELECT "4",     "u12345",   "worksite_2", 'job',            '2019-01-04 00:00:00' UNION ALL
  SELECT "5",     "u12345",   "worksite_1", 'work_permit',    '2019-01-05 00:00:00' UNION ALL
  SELECT "6",     "u12345",   "worksite_2", 'job',            '2019-01-06 00:00:00' UNION ALL
  SELECT "7",     "u12345",   "worksite_1", 'work_permit',    '2019-01-07 00:00:00' UNION ALL
  SELECT "8",     "u12345",   "worksite_2", 'work_permit',    '2019-01-09 00:00:00' UNION ALL
  SELECT "9",     "u12345",   "worksite_1", 'job',            '2019-01-09 00:00:00' UNION ALL
  SELECT "10",    "u12345",   "worksite_2", 'work_permit',    '2019-01-09 00:00:00' UNION ALL
  SELECT "11",    "u12345",   "worksite_1", 'work_permit',    '2019-01-09 00:00:00' UNION ALL
  SELECT "12",    "u12345",   "worksite_2", 'work_permit',    '2019-01-09 00:00:00' UNION ALL
  SELECT "13",    "u12345",   "worksite_1", 'job',            '2019-01-09 00:00:00' UNION ALL
  SELECT "14",    "u12345",   "worksite_2", 'work_permit',    '2019-01-09 00:00:00' UNION ALL
  SELECT "15",    "u12345",   "worksite_1", 'work_permit',    '2019-01-09 00:00:00' 
)

the result is as expected:

Row Worksite    Date        Users_on_Site   Completed    
1   worksite_1  2019-01-01  2               2    
2   worksite_1  2019-01-02  0               2    
3   worksite_1  2019-01-03  1               1    
4   worksite_1  2019-01-04  0               1    
5   worksite_1  2019-01-05  2               2    
6   worksite_1  2019-01-06  0               2    
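
The comparison at the heart of this query can be checked in isolation. The following is a minimal sketch, not the asker's real tables: the inline rows and the column names visit_date and created_at are hypothetical stand-ins. It shows that only the TIMESTAMP side needs to be wrapped in DATE(), while the DATE column is compared as it is.

```sql
#standardSQL
-- Hypothetical inline rows standing in for the real OnSite and Documents tables.
WITH OnSite AS (
  SELECT 'u1' AS uid, 'worksite_1' AS worksite_id, DATE '2019-01-01' AS visit_date UNION ALL
  SELECT 'u1', 'worksite_1', DATE '2019-01-02'
), Documents AS (
  SELECT '1' AS document_id, 'worksite_1' AS worksite_id,
         TIMESTAMP '2019-01-01 09:15:00' AS created_at
)
SELECT
  o.uid,
  o.visit_date,
  d.document_id
FROM OnSite AS o
LEFT JOIN Documents AS d
  ON o.worksite_id = d.worksite_id
  -- Convert the TIMESTAMP to a DATE (UTC unless a time zone is passed);
  -- the DATE column on the left needs no conversion.
  AND o.visit_date = DATE(d.created_at)
ORDER BY o.visit_date
```

With these rows, the 2019-01-01 visit matches document 1 and the 2019-01-02 visit keeps a NULL document_id, because the LEFT JOIN preserves unmatched OnSite rows.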

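【Comments】:

I'm getting a strange error on line 9: ON OnSite.worksite_id = Documentimestamp.worksite_id (Unrecognized name)

Oops. Typo, will fix shortly.

Check now. Sorry, it was a copy/paste issue on my side.

Hmm, for some reason, when I put it into BigQuery, I get the same issue on the same line.

I think you can just do this: WHERE OnSite.timestamp BETWEEN "2019-01-01" AND "2019-10-10". I don't see any "hackiness" here; it is the usual way of filtering.

【Answer 2】:

The BigQuery documentation states that the DATE function accepts the following inputs: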

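    DATE(year, month, day): constructs a DATE from INT64 values representing the year, month, and day.

    DATE(timestamp_expression[, timezone]): converts a timestamp_expression to a DATE data type. It supports an optional parameter to specify a time zone. If no time zone is specified, the default time zone, UTC, is used.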
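
As a small illustration of these two signatures, here is a sketch using literal values only. The instant and the 'Asia/Tokyo' time zone are arbitrary choices for the example, not values from the question.

```sql
#standardSQL
SELECT
  -- DATE(year, month, day): build a DATE from three INT64 values.
  DATE(2019, 1, 2) AS from_parts,
  -- DATE(timestamp_expression): the calendar day is taken in UTC by default.
  DATE(TIMESTAMP '2019-01-01 23:30:00+00') AS utc_day,
  -- DATE(timestamp_expression, timezone): same instant, read in another zone.
  DATE(TIMESTAMP '2019-01-01 23:30:00+00', 'Asia/Tokyo') AS tokyo_day
```

The same instant falls on 2019-01-01 in UTC but on 2019-01-02 in Tokyo, so the optional time zone argument can change which DATE a TIMESTAMP is compared against.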

In your use case, the value you are passing to DATE appears to already be a datetime. For that you can use DATETIME_TRUNC, for example:

DATETIME_TRUNC(IFNULL(OnSite.timestamp, Documents.timestamp), DAY)
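
A minimal sketch of what DATETIME_TRUNC does, using a literal value and assuming the input really is a DATETIME. The error message in the question reports the argument type as DATE, so whether this applies depends on the actual column types.

```sql
#standardSQL
SELECT
  -- Truncating to DAY keeps the DATETIME type and resets the time to midnight.
  DATETIME_TRUNC(DATETIME '2019-01-02 13:45:00', DAY) AS day_start,
  -- DATE(datetime_expression) drops the time part and returns a DATE.
  DATE(DATETIME '2019-01-02 13:45:00') AS day_only
```

Since day_start is still a DATETIME, comparing it with a DATE column would need one more conversion; day_only is already a DATE.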

【Comments】:

【Answer 3】:

Why not cast everything and make life easier :-)? All of these should work:

select 
   date(timestamp('2019-01-02')), 
   date(timestamp('2019-01-02 00:00:00')), 
   date(timestamp(null))

So, in your IFNULL expression:

SELECT
  IFNULL(OnSite.worksite_id, Documents.worksite_id) as `Worksite`,
  IFNULL(date(datetime(OnSite.timestamp)),date(datetime(Documents.timestamp))) as `Date`,
  COUNT(Documents.worksite_id) as `Users_on_Site`,
  COUNT(DISTINCT OnSite.uid) as `Completed`
FROM OnSite
  LEFT JOIN Documents ON OnSite.worksite_id = Documents.worksite_id AND DATE(datetime(OnSite.timestamp)) = DATE(datetime(Documents.timestamp))
GROUP BY `Date`, `Worksite`;
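
The query above relies on DATETIME() accepting both a DATE and a TIMESTAMP, so each column can be passed through the same pair of calls regardless of its type. A short sketch with literal values, chosen only for illustration:

```sql
#standardSQL
SELECT
  -- DATE -> DATETIME (midnight) -> DATE
  DATE(DATETIME(DATE '2019-01-02')) AS from_date,
  -- TIMESTAMP -> DATETIME (UTC by default) -> DATE
  DATE(DATETIME(TIMESTAMP '2019-01-02 00:00:00')) AS from_timestamp
```

Both columns come out as the DATE 2019-01-02, which is why the two sides of the join condition become comparable.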

【Comments】:
