Analogue of ON CONFLICT DO NOTHING in ClickHouse
Posted: 2020-09-14 14:28:39

Question: I am porting code from Postgres to ClickHouse. One table has to be updated weekly. In Postgres it looks like this (using the ON CONFLICT clause):
INSERT INTO ulog
(
id,
us_id,
inst_date,
st_change_date,
prev_st,
st,
update_date
)
select id
, us_id
, inst_date
, case
when st= 1 then inst_date
when st= 2 then active_date
when st= 0 then dor_date
when st= -1 then ch_date
when st= 3 then ret_date
else current_date
end as st_change_date
, prev_st
, st
, now()
from user_st
where coalesce(st, -99) != coalesce(prev_st, -99)
ON CONFLICT (
id,
us_id,
st_change_date
)
DO NOTHING
;
How can this query be rewritten for ClickHouse?
I tried writing separate queries for different combinations of "st" and "prev_st", for example:
INSERT INTO ulog
( id, us_id, inst_date, st_change_date, prev_st, st, update_date )
select id
, us_id
, inst_date
, case
when st= 1 then inst_date
when st= 2 then active_date
when st= 0 then dor_date
when st= -1 then ch_date
when st= 3 then ret_date
else today() end as st_change_date
, prev_st
, st
, today()
from user_st
where status=1 and prev_st=1
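Comments on the question:
What have you tried? (As a starting point.) Add it to the question first.

Answer 1: ClickHouse does not support ON CONFLICT and never will, because it goes against the nature of ClickHouse as an OLAP database. ClickHouse simply appends rows (each INSERT writes a new part) and cannot check existing rows by key, since doing so would make inserts roughly 100,000 times slower. The closest equivalent is to filter out keys that already exist in ulog inside the INSERT ... SELECT itself: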
INSERT INTO ulog
(
id,
us_id,
inst_date,
st_change_date,
prev_st,
st,
update_date
)
select id
, us_id
, inst_date
, case
when st= 1 then inst_date
when st= 2 then active_date
when st= 0 then dor_date
when st= -1 then ch_date
when st= 3 then ret_date
else today()
end as st_change_date
, prev_st
, st
, now()
from user_st
where coalesce(st, -99) != coalesce(prev_st, -99)
and (id, us_id,st_change_date) not in (select id, us_id,st_change_date from ulog);
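The NOT IN subquery only compares against rows that are already stored in ulog. If user_st itself produces two rows with the same (id, us_id, st_change_date), both rows are inserted, and two loads running at the same time can also race each other. The following sketch is not part of the original answer. It additionally collapses duplicates inside the incoming batch with ClickHouse's LIMIT n BY clause. It assumes the same user_st and ulog columns as above and, like the original query, that all CASE branches have compatible types:

```sql
INSERT INTO ulog (id, us_id, inst_date, st_change_date, prev_st, st, update_date)
SELECT id, us_id, inst_date, st_change_date, prev_st, st, now()
FROM
(
    -- same projection and filter as the query above
    SELECT
        id,
        us_id,
        inst_date,
        CASE
            WHEN st = 1 THEN inst_date
            WHEN st = 2 THEN active_date
            WHEN st = 0 THEN dor_date
            WHEN st = -1 THEN ch_date
            WHEN st = 3 THEN ret_date
            ELSE today()
        END AS st_change_date,
        prev_st,
        st
    FROM user_st
    WHERE coalesce(st, -99) != coalesce(prev_st, -99)
)
-- skip keys that already exist in ulog
WHERE (id, us_id, st_change_date) NOT IN (SELECT id, us_id, st_change_date FROM ulog)
-- keep at most one row per key within this batch
LIMIT 1 BY id, us_id, st_change_date;
```

Without an ORDER BY, the row that survives LIMIT 1 BY is unspecified. Adding an ORDER BY before LIMIT 1 BY makes the choice deterministic. The NOT IN subquery still builds a set of every key in ulog, so its cost grows with the table.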
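Answer 2: Consider using the ReplacingMergeTree engine to ignore subsequent rows with the same key.
Keep in mind that this approach does not guarantee the absence of duplicates. Rows with the same sorting key are collapsed only when parts are merged in the background, at a time ClickHouse does not guarantee. Until then, a plain SELECT can return duplicates.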
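Because duplicates can remain at least until a merge has run, it helps to check whether any key currently occurs more than once. The sketch below groups by the three key columns from the original ON CONFLICT clause. On a ReplacingMergeTree table it counts physical, not-yet-merged rows, so it also shows duplicates that are only waiting for a merge:

```sql
-- keys that currently occur more than once in ulog
SELECT id, us_id, st_change_date, count() AS copies
FROM ulog
GROUP BY id, us_id, st_change_date
HAVING copies > 1
ORDER BY copies DESC
LIMIT 100;
```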
CREATE TABLE IF NOT EXISTS ulog
(
st_change_date DateTime,
id Int32,
us_id Int32,
inst_date DateTime,
prev_st Int32,
st Int32,
update_date DateTime,
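version UInt32 MATERIALIZED toUInt32(now() - toDateTime('2105-12-31 23:59:59')) -- computed at INSERT time; not returned by SELECT *
)
Engine = ReplacingMergeTree(version) -- merges keep, per ORDER BY key, the row with the largest version
PARTITION BY toYYYYMM(st_change_date)
ORDER BY (st_change_date, id, us_id);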
INSERT INTO ulog VALUES
('2020-09-01 10:00:00', 111, 345, now(), 11, 12, now()),
('2020-09-01 10:00:00', 222, 345, now(), 11, 12, now()),
('2020-09-01 10:00:01', 222, 345, now(), 11, 12, now()),
('2020-09-01 10:00:00', 333, 345, now(), 11, 12, now()),
('2020-09-01 10:00:01', 333, 345, now(), 11, 12, now());
INSERT INTO ulog VALUES
('2020-09-01 10:00:00', 111, 345, now(), 22, 33, now());
INSERT INTO ulog VALUES
('2020-09-01 10:00:00', 111, 345, now(), 33, 44, now());
SELECT *, version FROM ulog
/*
┌──────st_change_date─┬──id─┬─us_id─┬───────────inst_date─┬─prev_st─┬─st─┬─────────update_date─┬────version─┐
│ 2020-09-01 10:00:00 │ 111 │ 345 │ 2020-09-14 18:43:55 │ 11 │ 12 │ 2020-09-14 18:43:55 │ 1603329132 │
│ 2020-09-01 10:00:00 │ 222 │ 345 │ 2020-09-14 18:43:55 │ 11 │ 12 │ 2020-09-14 18:43:55 │ 1603329132 │
│ 2020-09-01 10:00:00 │ 333 │ 345 │ 2020-09-14 18:43:55 │ 11 │ 12 │ 2020-09-14 18:43:55 │ 1603329132 │
│ 2020-09-01 10:00:01 │ 222 │ 345 │ 2020-09-14 18:43:55 │ 11 │ 12 │ 2020-09-14 18:43:55 │ 1603329132 │
│ 2020-09-01 10:00:01 │ 333 │ 345 │ 2020-09-14 18:43:55 │ 11 │ 12 │ 2020-09-14 18:43:55 │ 1603329132 │
└─────────────────────┴─────┴───────┴─────────────────────┴─────────┴────┴─────────────────────┴────────────┘
*/
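ReplacingMergeTree(version) keeps the row with the largest version for each sorting key. With the expression above, toUInt32 wraps the negative difference to 4294967296 + (now() - toDateTime('2105-12-31 23:59:59')). That value grows with now(), so a later insert gets a larger version and would win the merge. If the goal is to keep the first row per key, which is the ON CONFLICT DO NOTHING behaviour, the version has to shrink over time instead. The sketch below makes that assumption and uses a hypothetical table name, ulog_first. It also shows two standard ways to see the collapsed result before a background merge has run:

```sql
-- Hypothetical variant: the version decreases over time, so after a merge
-- the earliest inserted row for each key survives.
CREATE TABLE IF NOT EXISTS ulog_first
(
    st_change_date DateTime,
    id Int32,
    us_id Int32,
    inst_date DateTime,
    prev_st Int32,
    st Int32,
    update_date DateTime,
    -- 4294967295 minus the insert time as a Unix timestamp: smaller for later inserts
    version UInt32 MATERIALIZED toUInt32(4294967295 - toUInt32(now()))
)
ENGINE = ReplacingMergeTree(version)
PARTITION BY toYYYYMM(st_change_date)
ORDER BY (st_change_date, id, us_id);

-- Read the table as if all merges had already happened (slower than a plain SELECT).
SELECT *, version FROM ulog_first FINAL;

-- Or force an unscheduled merge of one partition (expensive on large tables).
OPTIMIZE TABLE ulog_first PARTITION 202009 FINAL;
```

Rows inserted within the same second still get equal versions. For those rows ReplacingMergeTree falls back to keeping the last inserted one, so this remains best-effort deduplication.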