How to compute permutations with postgresql?
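【Title】: How to compute permutations with postgresql?
【Posted】: 2016-04-14 11:23:38
【Question】: I have a large database containing connections between cities. Each connection has a start town and a destination town, a start date, and a price for that connection.
I want to compute every combination of an outgoing and a return connection, for every route, where the return connection departs between 1 and 20 days after the outgoing one. Then, for each combination of dates, I want to select the best price.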
Example:
Table:
city_start  city_end  date_start  price
Hamburg     Berlin    01.01.2016  100.00
Berlin      Hamburg   10.01.2016  112.00
Berlin      Hamburg   10.01.2016   70.00
Berlin      Hamburg   12.01.2016   50.00
Berlin      Hamburg   30.02.2016   20.00
Paris Madrid ...
Madrid Paris
London Paris
Desired result:
Hamburg-Berlin-Hamburg, 01.01.2016, 10.01.2016, 170.00 (100+70)
Hamburg-Berlin-Hamburg, 01.01.2016, 12.01.2016, 150.00 (100+50)
...
(not Berlin-Hamburg on 30.02.2016, because it is more than 20 days after the outgoing departure)
(not London-Paris, as there is no return Paris-London)
I can get the possible combinations with:
SELECT DISTINCT city_start, city_end, city_end, city_start from table
But how do I now compute their permutations?
【Answer 1】: The query to get all pairs uses a self-join:
-- The table has no date_end column; the return leg's date_start is the return date.
select tto.city_start, tto.city_end, tto.date_start, tfrom.date_start as date_end,
       (tto.price + tfrom.price) as price
from t tto join
     t tfrom
     on tto.city_end = tfrom.city_start and
        tto.city_start = tfrom.city_end and
        tfrom.date_start >= tto.date_start + interval '1 day' and
        tfrom.date_start <= tto.date_start + interval '20 day';
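To try the pair query on the sample rows from the question, a minimal sketch follows. It assumes a table named t, as in the query above, with date_start stored as a date. The temp table definition, the London-Paris date and price, and the date 29.02.2016 (standing in for 30.02.2016, which is not a valid calendar date) are illustrative assumptions, not part of the original data.

```sql
-- Hypothetical temp table holding the sample rows from the question.
CREATE TEMP TABLE t (
    city_start text,
    city_end   text,
    date_start date,
    price      numeric(10,2)
);

INSERT INTO t (city_start, city_end, date_start, price) VALUES
    ('Hamburg', 'Berlin',  DATE '2016-01-01', 100.00),
    ('Berlin',  'Hamburg', DATE '2016-01-10', 112.00),
    ('Berlin',  'Hamburg', DATE '2016-01-10',  70.00),
    ('Berlin',  'Hamburg', DATE '2016-01-12',  50.00),
    ('Berlin',  'Hamburg', DATE '2016-02-29',  20.00),  -- assumed stand-in for 30.02.2016
    ('London',  'Paris',   DATE '2016-01-05',  80.00);  -- assumed date and price

-- Pair every outgoing leg with each return leg that departs 1 to 20 days later.
-- In PostgreSQL, date + integer adds that many days and yields a date.
SELECT tto.city_start,
       tto.city_end,
       tto.date_start          AS date_out,
       tfrom.date_start        AS date_back,
       tto.price + tfrom.price AS price
FROM t AS tto
JOIN t AS tfrom
  ON tfrom.city_start = tto.city_end
 AND tfrom.city_end   = tto.city_start
 AND tfrom.date_start BETWEEN tto.date_start + 1 AND tto.date_start + 20
ORDER BY date_out, date_back, price;
```

With these rows the join should return three Hamburg-Berlin-Hamburg pairs (totals 170.00 and 212.00 for 10.01.2016, and 150.00 for 12.01.2016). The late February return is outside the 20-day window, and London-Paris has no return leg, so neither appears.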
To get the cheapest price, use a window function:
select tt.*
from (select tto.city_start, tto.city_end, tto.date_start, tfrom.date_start as date_end,
             (tto.price + tfrom.price) as price,
             -- partition by route and both dates, so there is one cheapest row per date combination
             row_number() over (partition by tto.city_start, tto.city_end, tto.date_start, tfrom.date_start
                                order by (tto.price + tfrom.price) asc) as seqnum
      from t tto join
           t tfrom
           on tto.city_end = tfrom.city_start and
              tto.city_start = tfrom.city_end and
              tfrom.date_start >= tto.date_start + interval '1 day' and
              tfrom.date_start <= tto.date_start + interval '20 day'
     ) tt
where seqnum = 1;
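【Comments】:
Great, this seems to work. Is there an alternative to the row_number partition part (because the performance of the WindowAgg step is very poor)?
【Answer 2】: Here is a solution without the row_number partition part: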
SELECT
a.city_start, a.city_end, b.city_end, a.date_start, b.date_start,
min(a.price + b.price)
FROM
flight AS a
JOIN
flight AS b ON a.city_start = b.city_end AND a.city_end = b.city_start
WHERE b.date_start BETWEEN a.date_start + 1 AND a.date_start + 20
GROUP BY a.city_start, a.city_end, b.city_end, a.date_start, b.date_start;
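【Comments】:
Thanks a lot for this one too. One question: what if I want to select attributes that are not in the group by, for example a.carName?
I'm not sure, because I don't see carName in the data. I would need to see it in the context of the rest of the data to answer this...
Treat the question as just an example. Suppose every row has an additional column carName. The question is: how can I select this value if it is not in the group by?
Do you want the car name for the outbound trip or for the return trip?
Both.
【Answer 3】: If you want to include other columns, try the following: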
SELECT
a.city_start, a.city_end, b.city_end, a.date_start, b.date_start,
a.price + b.price, a.car_name, b.car_name
FROM
flight AS a
JOIN
flight AS b ON a.city_start = b.city_end AND a.city_end = b.city_start
LEFT JOIN
flight AS c ON
a.city_start = c.city_start
AND
a.city_end = c.city_end
AND
a.date_start = c.date_start
AND (
a.price > c.price
OR (
a.price = c.price
AND
a.id > c.id))
LEFT JOIN
flight AS d ON
b.city_start = d.city_start
AND
b.city_end = d.city_end
AND
b.date_start = d.date_start
AND (
b.price > d.price
OR (
b.price = d.price
AND
b.id > d.id))
WHERE
b.date_start BETWEEN a.date_start + 1 AND a.date_start + 20
AND
c.id IS NULL
AND
d.id IS NULL;
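Another option for carrying extra columns along with the cheapest pair is PostgreSQL's DISTINCT ON, which keeps the first row of each group according to the ORDER BY. This is a different technique from the anti-join above, shown only as a sketch. The flight table layout, the id and car_name columns, and the sample car names are assumptions made for illustration.

```sql
-- Hypothetical table matching the columns used in the query above.
CREATE TEMP TABLE flight (
    id         serial PRIMARY KEY,
    city_start text,
    city_end   text,
    date_start date,
    price      numeric(10,2),
    car_name   text
);

INSERT INTO flight (city_start, city_end, date_start, price, car_name) VALUES
    ('Hamburg', 'Berlin',  DATE '2016-01-01', 100.00, 'car A'),  -- assumed car names
    ('Berlin',  'Hamburg', DATE '2016-01-10', 112.00, 'car B'),
    ('Berlin',  'Hamburg', DATE '2016-01-10',  70.00, 'car C'),
    ('Berlin',  'Hamburg', DATE '2016-01-12',  50.00, 'car D');

-- One row per route and date combination: the row with the lowest total price.
-- The DISTINCT ON expressions must match the leftmost ORDER BY expressions.
SELECT DISTINCT ON (a.city_start, a.city_end, a.date_start, b.date_start)
       a.city_start,
       a.city_end,
       a.date_start      AS date_out,
       b.date_start      AS date_back,
       a.price + b.price AS price,
       a.car_name        AS car_out,
       b.car_name        AS car_back
FROM flight AS a
JOIN flight AS b
  ON b.city_start = a.city_end
 AND b.city_end   = a.city_start
WHERE b.date_start BETWEEN a.date_start + 1 AND a.date_start + 20
ORDER BY a.city_start, a.city_end, a.date_start, b.date_start, price;
```

For the rows above, the 10.01.2016 return should come back once, at 170.00 with 'car C' as the return car. If two pairs tie on price, which one is kept is not determined by this ORDER BY.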