BigQuery comparing DATE and TIMESTAMP
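【Title】: BigQuery comparing DATE and TIMESTAMP 【Posted】: 2019-01-17 15:15:25
【Question】:
Below is an example that works for me in MySQL. In BigQuery, however, my OnSite timestamp column is a DATE, while my Documents timestamp column is a TIMESTAMP.
BigQuery has trouble with the query below; I get the following message:
No matching signature for function DATE for argument types: DATE. Supported signatures: DATE(TIMESTAMP, [STRING]); DATE(DATETIME); DATE(INT64, INT64, INT64) at [8:146]
Does anyone know what I need to do to make the query work when comparing DATE and TIMESTAMP?
Schema (MySQL v5.7)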
CREATE TABLE OnSite
(`uid` varchar(55), `worksite_id` varchar(55), `timestamp` datetime)
;
INSERT INTO OnSite
(`uid`, `worksite_id`, `timestamp`)
VALUES
("u12345", "worksite_1", '2019-01-01'),
("u12345", "worksite_1", '2019-01-02'),
("u12345", "worksite_1", '2019-01-03'),
("u12345", "worksite_1", '2019-01-04'),
("u12345", "worksite_1", '2019-01-05'),
("u12345", "worksite_1", '2019-01-06'),
("u1", "worksite_1", '2019-01-01'),
("u1", "worksite_1", '2019-01-02'),
("u1", "worksite_1", '2019-01-05'),
("u1", "worksite_1", '2019-01-06')
;
CREATE TABLE Documents
(`document_id` varchar(55), `uid` varchar(55), `worksite_id` varchar(55), `type` varchar(55), `timestamp` datetime)
;
INSERT INTO Documents
(`document_id`, `uid`, `worksite_id`, `type`, `timestamp`)
VALUES
("1", "u12345", "worksite_1", 'work_permit', '2019-01-01 00:00:00'),
("2", "u12345", "worksite_2", 'job', '2019-01-02 00:00:00'),
("3", "u12345", "worksite_1", 'work_permit', '2019-01-03 00:00:00'),
("4", "u12345", "worksite_2", 'job', '2019-01-04 00:00:00'),
("5", "u12345", "worksite_1", 'work_permit', '2019-01-05 00:00:00'),
("6", "u12345", "worksite_2", 'job', '2019-01-06 00:00:00'),
("7", "u12345", "worksite_1", 'work_permit', '2019-01-07 00:00:00'),
("8", "u12345", "worksite_2", 'work_permit', '2019-01-09 00:00:00'),
("9", "u12345", "worksite_1", 'job', '2019-01-09 00:00:00'),
("10", "u12345", "worksite_2", 'work_permit', '2019-01-09 00:00:00'),
("11", "u12345", "worksite_1", 'work_permit', '2019-01-09 00:00:00'),
("12", "u12345", "worksite_2", 'work_permit', '2019-01-09 00:00:00'),
("13", "u12345", "worksite_1", 'job', '2019-01-09 00:00:00'),
("14", "u12345", "worksite_2", 'work_permit', '2019-01-09 00:00:00'),
("15", "u12345", "worksite_1", 'work_permit', '2019-01-09 00:00:00')
;
Query #1
SELECT
IFNULL(OnSite.worksite_id, Documents.worksite_id) as `Worksite`,
DATE(IFNULL(OnSite.timestamp, Documents.timestamp)) as `Date`,
COUNT(Documents.worksite_id) as `Users_on_Site`,
COUNT(DISTINCT OnSite.uid) as `Completed`
FROM OnSite
LEFT JOIN Documents ON OnSite.worksite_id = Documents.worksite_id AND DATE(OnSite.timestamp) = DATE(Documents.timestamp)
GROUP BY `Date`, `Worksite`;
| Worksite | Date | Users_on_Site | Completed |
| ---------- | ---------- | ------------- | --------- |
| worksite_1 | 2019-01-01 | 2 | 2 |
| worksite_1 | 2019-01-02 | 0 | 2 |
| worksite_1 | 2019-01-03 | 1 | 1 |
| worksite_1 | 2019-01-04 | 0 | 1 |
| worksite_1 | 2019-01-05 | 2 | 2 |
| worksite_1 | 2019-01-06 | 0 | 2 |
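【Comments】:
What if you use SQL type casts, such as
LEFT JOIN Documents ON OnSite.worksite_id = Documents.worksite_id AND DATE(TIMESTAMP OnSite.timestamp) = DATE(TIMESTAMP Documents.timestamp)
or
LEFT JOIN Documents ON OnSite.worksite_id = Documents.worksite_id AND DATE(OnSite.timestamp::timestamp) = DATE(Documents.timestamp::timestamp)
?
You need to be clear about what you are using: mysql or bigquery
@MikhailBerlyant Sorry, I thought I was clear in the title. I tagged both because I felt they were similar skills, but I'll make sure to remove the mysql tag.
I really recommend using the correct tags; otherwise the question is misleading and you end up with "garbage" answers
@MikhailBerlyant Absolutely agree. Won't do it again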
【Answer 1】:
Below is for BigQuery Standard SQL
#standardSQL
SELECT
IFNULL(OnSite.worksite_id, Documents.worksite_id) AS `Worksite`,
IFNULL(OnSite.timestamp, DATE(Documents.timestamp)) AS `DATE`,
COUNT(Documents.worksite_id) AS `Users_on_Site`,
COUNT(DISTINCT OnSite.uid) AS `Completed`
FROM `project.dataset.OnSite` OnSite
LEFT JOIN `project.dataset.Documents` Documents
ON OnSite.worksite_id = Documents.worksite_id
AND OnSite.timestamp = DATE(Documents.timestamp)
GROUP BY `DATE`, `Worksite`
If applied to the sample data from your question
WITH `project.dataset.OnSite` AS (
SELECT "u12345" uid, "worksite_1" worksite_id, DATE '2019-01-01' `TIMESTAMP` UNION ALL
SELECT "u12345", "worksite_1", '2019-01-02' UNION ALL
SELECT "u12345", "worksite_1", '2019-01-03' UNION ALL
SELECT "u12345", "worksite_1", '2019-01-04' UNION ALL
SELECT "u12345", "worksite_1", '2019-01-05' UNION ALL
SELECT "u12345", "worksite_1", '2019-01-06' UNION ALL
SELECT "u1", "worksite_1", '2019-01-01' UNION ALL
SELECT "u1", "worksite_1", '2019-01-02' UNION ALL
SELECT "u1", "worksite_1", '2019-01-05' UNION ALL
SELECT "u1", "worksite_1", '2019-01-06'
), `project.dataset.Documents` AS (
SELECT "1" document_id, "u12345" uid, "worksite_1" worksite_id, 'work_permit' type, TIMESTAMP '2019-01-01 00:00:00' `TIMESTAMP` UNION ALL
SELECT "2", "u12345", "worksite_2", 'job', '2019-01-02 00:00:00' UNION ALL
SELECT "3", "u12345", "worksite_1", 'work_permit', '2019-01-03 00:00:00' UNION ALL
SELECT "4", "u12345", "worksite_2", 'job', '2019-01-04 00:00:00' UNION ALL
SELECT "5", "u12345", "worksite_1", 'work_permit', '2019-01-05 00:00:00' UNION ALL
SELECT "6", "u12345", "worksite_2", 'job', '2019-01-06 00:00:00' UNION ALL
SELECT "7", "u12345", "worksite_1", 'work_permit', '2019-01-07 00:00:00' UNION ALL
SELECT "8", "u12345", "worksite_2", 'work_permit', '2019-01-09 00:00:00' UNION ALL
SELECT "9", "u12345", "worksite_1", 'job', '2019-01-09 00:00:00' UNION ALL
SELECT "10", "u12345", "worksite_2", 'work_permit', '2019-01-09 00:00:00' UNION ALL
SELECT "11", "u12345", "worksite_1", 'work_permit', '2019-01-09 00:00:00' UNION ALL
SELECT "12", "u12345", "worksite_2", 'work_permit', '2019-01-09 00:00:00' UNION ALL
SELECT "13", "u12345", "worksite_1", 'job', '2019-01-09 00:00:00' UNION ALL
SELECT "14", "u12345", "worksite_2", 'work_permit', '2019-01-09 00:00:00' UNION ALL
SELECT "15", "u12345", "worksite_1", 'work_permit', '2019-01-09 00:00:00'
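)
-- the same query as above, now reading from the inline sample data
SELECT
IFNULL(OnSite.worksite_id, Documents.worksite_id) AS `Worksite`,
IFNULL(OnSite.timestamp, DATE(Documents.timestamp)) AS `DATE`,
COUNT(Documents.worksite_id) AS `Users_on_Site`,
COUNT(DISTINCT OnSite.uid) AS `Completed`
FROM `project.dataset.OnSite` OnSite
LEFT JOIN `project.dataset.Documents` Documents
ON OnSite.worksite_id = Documents.worksite_id
AND OnSite.timestamp = DATE(Documents.timestamp)
GROUP BY `DATE`, `Worksite`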
the result is as expected
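| Row | Worksite | Date | Users_on_Site | Completed |
| --- | ---------- | ---------- | ------------- | --------- |
| 1 | worksite_1 | 2019-01-01 | 2 | 2 |
| 2 | worksite_1 | 2019-01-02 | 0 | 2 |
| 3 | worksite_1 | 2019-01-03 | 1 | 1 |
| 4 | worksite_1 | 2019-01-04 | 0 | 1 |
| 5 | worksite_1 | 2019-01-05 | 2 | 2 |
| 6 | worksite_1 | 2019-01-06 | 0 | 2 |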
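A different way to express the same day-level match is to leave the TIMESTAMP column untouched and turn the DATE into a half-open TIMESTAMP range instead. The sketch below is only an illustration, not part of the answer above: the inline rows and the column names `visit_date` and `created_at` are made up, and UTC day boundaries are assumed.

```sql
#standardSQL
WITH OnSite AS (
  SELECT 'u12345' AS uid, 'worksite_1' AS worksite_id, DATE '2019-01-01' AS visit_date
), Documents AS (
  SELECT '1' AS document_id, 'worksite_1' AS worksite_id,
         TIMESTAMP '2019-01-01 08:30:00+00' AS created_at UNION ALL
  SELECT '2', 'worksite_1', TIMESTAMP '2019-01-02 00:00:00+00'
)
SELECT o.uid, o.visit_date, d.document_id
FROM OnSite AS o
LEFT JOIN Documents AS d
  ON o.worksite_id = d.worksite_id
  -- half-open range: [start of visit_date, start of the next day), both in UTC
  AND d.created_at >= TIMESTAMP(o.visit_date)
  AND d.created_at < TIMESTAMP(DATE_ADD(o.visit_date, INTERVAL 1 DAY))
```

With these rows only document 1 should match; document 2 falls exactly on the start of the next day and is excluded by the half-open range. Whether this form or `DATE(Documents.timestamp)` is preferable depends on the table layout, so treat it as an option to try rather than a recommendation.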
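【Comments】:
I'm getting a strange error on line 9: ON OnSite.worksite_id = Documentimestamp.worksite_id
(Unrecognized name)
Oops. A typo; will fix it shortly
Check now. Sorry, that was a copy/paste issue on my side
Hmm, for some reason, when I put it into BigQuery, it fails on the same line with the same problem.
I think you can do this: WHERE OnSite.timestamp BETWEEN "2019-01-01" AND "2019-10-10"
I don't see any "hackiness" here; it is the usual way of filtering
【Answer 2】: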
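The BigQuery documentation states that the DATE function accepts the following inputs:
DATE(year, month, day)
: Constructs a DATE from INT64 values representing the year, month, and day.
DATE(timestamp_expression[, timezone])
: Converts a timestamp_expression to the DATE data type. It supports an optional parameter to specify a time zone. If no time zone is specified, the default time zone, UTC, is used.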
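To make the signatures concrete, here is a small sketch with literal values only (the values and column aliases are invented for illustration). It also includes the DATE(DATETIME) form listed in the error message from the question.

```sql
#standardSQL
SELECT
  -- DATE(INT64, INT64, INT64): build a date from its parts
  DATE(2019, 1, 17) AS from_parts,
  -- DATE(TIMESTAMP): no time zone given, so UTC is used
  DATE(TIMESTAMP '2019-01-17 23:30:00+00') AS from_ts_utc,
  -- DATE(TIMESTAMP, STRING): the same instant is already the next day in Tokyo (UTC+9)
  DATE(TIMESTAMP '2019-01-17 23:30:00+00', 'Asia/Tokyo') AS from_ts_tokyo,
  -- DATE(DATETIME): drops the time-of-day part
  DATE(DATETIME '2019-01-17 15:15:25') AS from_datetime
```

The second and third columns show why the optional time zone matters when a TIMESTAMP is compared with a DATE: the same instant maps to 2019-01-17 in UTC but to 2019-01-18 in Asia/Tokyo.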
In your use case, the value you are passing to DATE appears to already be a datetime. For this you can use DATETIME_TRUNC, for example:
DATETIME_TRUNC(IFNULL(OnSite.timestamp, Documents.timestamp), DAY)
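As a quick check of what DATETIME_TRUNC returns, the sketch below applies it to a DATETIME literal (the value is invented for illustration). Note that the result is still a DATETIME, so an extra DATE(...) call is needed if the other side of a comparison is a DATE column; a TIMESTAMP input would, under the same assumption, first be converted with DATETIME(...).

```sql
#standardSQL
SELECT
  -- truncating to DAY resets the time part to midnight; the type stays DATETIME
  DATETIME_TRUNC(DATETIME '2019-01-17 15:15:25', DAY) AS day_start,
  -- wrapping the truncated value in DATE() gives a value comparable with a DATE column
  DATE(DATETIME_TRUNC(DATETIME '2019-01-17 15:15:25', DAY)) AS as_date,
  -- a TIMESTAMP converted to DATETIME (UTC by default) before truncating
  DATETIME_TRUNC(DATETIME(TIMESTAMP '2019-01-17 15:15:25+00'), DAY) AS from_timestamp
```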
【Comments】:
【Answer 3】:
Why don't you just cast everything and make life easier :-)? All of these should work:
select
date(timestamp('2019-01-02')),
date(timestamp('2019-01-02 00:00:00')),
date(timestamp(null))
So, in your IFNULL expression:
SELECT
IFNULL(OnSite.worksite_id, Documents.worksite_id) as `Worksite`,
IFNULL(date(datetime(OnSite.timestamp)),date(datetime(Documents.timestamp))) as `Date`,
COUNT(Documents.worksite_id) as `Users_on_Site`,
COUNT(DISTINCT OnSite.uid) as `Completed`
FROM OnSite
LEFT JOIN Documents ON OnSite.worksite_id = Documents.worksite_id AND DATE(datetime(OnSite.timestamp)) = DATE(datetime(Documents.timestamp))
GROUP BY `Date`, `Worksite`;
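The casting idea can be tried in isolation before running it against the real tables. This is a minimal sketch with one invented row, where `d` stands in for the DATE column and `ts` for the TIMESTAMP column; both names are placeholders.

```sql
#standardSQL
WITH sample AS (
  SELECT DATE '2019-01-02' AS d, TIMESTAMP '2019-01-02 00:00:00+00' AS ts
)
SELECT
  DATE(DATETIME(d)) AS from_date,        -- DATE -> DATETIME -> DATE
  DATE(DATETIME(ts)) AS from_timestamp,  -- TIMESTAMP -> DATETIME (UTC) -> DATE
  DATE(DATETIME(d)) = DATE(DATETIME(ts)) AS same_day
FROM sample
```

Routing both columns through DATETIME means the same expression works whichever of the two types a column has, at the cost of a redundant conversion on the DATE side.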
【Comments】: