PostgreSQL: create table with unique timestamp for all rows
Posted: 2020-08-08 16:13:40
Question:
I have users' trips recorded in a table like the one below, including start/end locations and time:
CREATE TABLE trips(id integer, start_timestamp timestamp with time zone,
session_id integer, start_lat double precision,
start_lon double precision, end_lat double precision,
end_lon double precision, mode integer);
INSERT INTO trips (id, start_timestamp, session_id, start_lat,start_lon,end_lat,end_lon,mode)
VALUES (563097015,'2017-05-20 17:47:12+01', 128618, 41.1783308,-8.5949878, 41.1784478, -8.5948463, 0),
(563097013, '2017-05-20 17:45:29+01', 128618, 41.1781344, -8.5951169, 41.1782919, -8.5950689, 0),
(563097011, '2017-05-20 17:43:41+01', 128618, 41.1781196, -8.5954075, 41.1782139, -8.5950689, 0),
(563097009, '2017-05-20 17:41:48+01', 128618, 41.1782497, -8.595197, 41.1781101, -8.5954124, 0),
(563097003, '2017-05-20 17:10:29+01', 128618, 41.1832512, -8.6081606, 41.1782561, -8.5950259, 0);
The second table holds the raw GPS traces for all trips, similar to:
CREATE TABLE gps_traces (session_id integer, seconds integer, lat double precision,
lon double precision, speed double precision);
INSERT INTO gps_traces (session_id, seconds , lat , lon , speed )
VALUES (128618,1495296443,41.1844471,-8.6065158,1.35148),
(128618,1495296444,41.1844482,-8.6065303,1.28004),
(128618,1495296445,41.1844572,-8.6065503,1.46086),
(128618,1495296446,41.1844541,-8.6065691,1.23),
(128618,1495296446,41.1844589,-8.6065861, 1.22919),
(128618,1495296447,41.1844587, -8.6066043, 1.30188),
(128618, 1495296448, 41.1844604, -8.6066261, 1.43126),
(128618, 1495296449, 41.184471, -8.6066412, 1.55003),
(128618,1495296450, 41.1844715, -8.6066572, 1.29062),
(128618,1495296450, 41.1844707, -8.6066736, 1.3618);
From these, I would like to create a new table, mytable, that contains the GPS traces, by joining the two tables on session_id as follows:
CREATE TABLE mytable AS SELECT id, seconds, lat, lon, speed, mode
FROM trips t
JOIN gps_traces g
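ON t.session_id=g.session_id;
However, in the new table I want to make sure that, for rows recorded twice with the same Unix timestamp within the same trip, only one of them is selected into my new table. For example, in this case: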
SELECT * FROM mytable WHERE id = 563097003;
+-----------+------------+------------+------------+---------+------+
| id | seconds | lat | lon | speed | mode |
+-----------+------------+------------+------------+---------+------+
| 563097003 | 1495296443 | 41.1844471 | -8.6065158 | 1.35148 | 0 |
| 563097003 | 1495296444 | 41.1844482 | -8.6065303 | 1.28004 | 0 |
| 563097003 | 1495296445 | 41.1844572 | -8.6065503 | 1.46086 | 0 |
| 563097003 | 1495296446 | 41.1844541 | -8.6065691 | 1.23 | 0 |
| 563097003 | 1495296446 | 41.1844589 | -8.6065861 | 1.22919 | 0 |
| 563097003 | 1495296447 | 41.1844587 | -8.6066043 | 1.30188 | 0 |
| 563097003 | 1495296448 | 41.1844604 | -8.6066261 | 1.43126 | 0 |
| 563097003 | 1495296449 | 41.184471 | -8.6066412 | 1.55003 | 0 |
| 563097003 | 1495296450 | 41.1844715 | -8.6066572 | 1.29062 | 0 |
| 563097003 | 1495296450 | 41.1844707 | -8.6066736 | 1.3618 | 0 |
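+-----------+------------+------------+------------+---------+------+
(10 rows)
The seconds column is a Unix timestamp. As shown, the timestamps 1495296446 and 1495296450 each appear in more than one row. I want to make sure that, for each trip, records are selected into the new table with unique timestamps (so in the case above, only one record per duplicated timestamp should be selected into the new table). I have illustrated this in this db<>fiddle.
Edit
Expected output: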
+-----------+------------+------------+------------+---------+------+
| id | seconds | lat | lon | speed | mode |
+-----------+------------+------------+------------+---------+------+
| 563097003 | 1495296443 | 41.1844471 | -8.6065158 | 1.35148 | 0 |
| 563097003 | 1495296444 | 41.1844482 | -8.6065303 | 1.28004 | 0 |
| 563097003 | 1495296445 | 41.1844572 | -8.6065503 | 1.46086 | 0 |
| 563097003 | 1495296446 | 41.1844541 | -8.6065691 | 1.23 | 0 |
| 563097003 | 1495296447 | 41.1844587 | -8.6066043 | 1.30188 | 0 |
| 563097003 | 1495296448 | 41.1844604 | -8.6066261 | 1.43126 | 0 |
| 563097003 | 1495296449 | 41.184471 | -8.6066412 | 1.55003 | 0 |
| 563097003 | 1495296450 | 41.1844715 | -8.6066572 | 1.29062 | 0 |
+-----------+------------+------------+------------+---------+------+
(8 rows)
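Before deduplicating, it can help to list which timestamps actually repeat. The following is a minimal sketch against the gps_traces table defined above; the n_rows alias is my own naming and not part of the original question.

```sql
-- List every (session_id, seconds) pair that occurs more than once
-- in the raw GPS traces, together with how many rows share it.
SELECT session_id,
       seconds,
       count(*) AS n_rows   -- n_rows is an illustrative alias
FROM gps_traces
GROUP BY session_id, seconds
HAVING count(*) > 1
ORDER BY session_id, seconds;
```

With the sample data from the question, this returns the two timestamps 1495296446 and 1495296450, each with two rows.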
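Comments:
"Only one record should be selected": the main question here is, which one exactly?
@Abelisto For example, in the expected-output table, the first record was selected for the timestamps 1495296446 and 1495296450. In the new table, id = 563097003 now has 8 rows instead of 10.
What I mean is: the rows with duplicated seconds values differ only in lat/lon/speed. How do you decide which one should be included in the final data?
Here is a fixed version of Gordon's answer: dbfiddle.uk/…
Good luck! PS: note that without any meaningful ordering, "the first row encountered" means "random". So you may want to add g.ctid desc to the order by clause to get the last inserted row.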
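The tie-break suggested in the last comment could look roughly like this. It is a sketch under assumptions, not the commenter's exact query: ctid is PostgreSQL's physical row locator, so ordering by it only approximates insertion order for a freshly loaded table that has not been updated or rewritten since.

```sql
-- Keep one GPS row per (trip id, second), choosing deterministically.
-- Assumption: g.ctid reflects insertion order, which holds only for a
-- freshly loaded table with no UPDATEs or rewrites (e.g. VACUUM FULL).
SELECT DISTINCT ON (t.id, g.seconds)
       t.id, g.seconds, g.lat, g.lon, g.speed, t.mode
FROM trips t
JOIN gps_traces g
  ON t.session_id = g.session_id
ORDER BY t.id, g.seconds, g.ctid;   -- use g.ctid DESC to keep the last-stored row
```

Ascending ctid keeps the first-stored row of each duplicate pair, which is what the expected output in the question shows; the DESC variant from the comment keeps the last one instead. If the choice matters in the long run, an explicit sequence or timestamp column in gps_traces would be a more robust ordering key than ctid. Prefixing the query with CREATE TABLE mytable AS materializes the result.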
Answer 1:
Use DISTINCT ON:
CREATE TABLE mytable AS
SELECT DISTINCT ON (t.session_id, seconds) id, seconds, lat, lon, speed, mode
FROM trips t JOIN
gps_traces g
ON t.session_id = g.session_id
ORDER BY t.session_id, seconds;
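Note: I would expect you to include session_id in the new table as well.
Thanks to @Abelisto, it turns out that the following modification of this answer works as expected.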
CREATE TABLE mytable AS
SELECT DISTINCT ON (id, seconds) id, seconds, lat, lon, speed, mode
FROM trips t
JOIN gps_traces g
ON t.session_id = g.session_id
ORDER BY id, seconds;
Here is a dbfiddle.
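To confirm that the resulting table really has one row per trip and second, a quick check along these lines may help. It is a sketch that assumes mytable was created by the DISTINCT ON (id, seconds) query above; the n_rows and n_seconds aliases are illustrative.

```sql
-- For each trip id, compare the number of rows with the number of
-- distinct timestamps; the two counts are equal when no (id, seconds)
-- pair is duplicated in mytable.
SELECT id,
       count(*)                AS n_rows,
       count(DISTINCT seconds) AS n_seconds
FROM mytable
GROUP BY id
ORDER BY id;
```

With the sample data from the question, each of the five trip ids should show 8 rows and 8 distinct seconds, because all five trips share session_id 128618 and the join therefore attaches every GPS row of that session to every trip.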
Discussion:
ERROR: column reference "session_id" is ambiguous LINE 2: SELECT DISTINCT ON (session_id, seconds) id, seconds, la..
@super_ask . . . That should be qualified.
But this gives only two rows, rather than all the rows in the question that have no duplicates in seconds.
Strange, it works with SELECT DISTINCT ON (..), but not with CREATE TABLE AS SELECT DISTINCT ON (..)
So, for example, id=563097003 returns only two rows. This id should have eight rows.