How to compute permutations with PostgreSQL?

【Posted】: 2016-04-14 11:23:38 【Question】:

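I have a large database of connections between cities. Each connection has a start town and a destination town, a start date, and a price.

I want to compute every combination of an outgoing connection plus a return connection, for any route, where the return leaves between 1 and 20 days after the outgoing connection. Then, for each combination of dates, I want to select the cheapest total price.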

Example:

Table:

city_start      city_end    date_start      price
Hamburg         Berlin      01.01.2016      100.00
Berlin          Hamburg     10.01.2016      112.00
Berlin          Hamburg     10.01.2016      70.00
Berlin          Hamburg     12.01.2016      50.00
Berlin          Hamburg     30.02.2016      20.00
Paris           Madrid      ...
Madrid          Paris
London          Paris

Desired result:

Hamburg-Berlin-Hamburg, 01.01.2016, 10.01.2016, 170.00 (100+70)
Hamburg-Berlin-Hamburg, 01.01.2016, 12.01.2016, 150.00 (100+50)
...
(not Berlin-Hamburg on 30.02.2016 because it's >20 days from departure drive)
(not London-Paris, as there is no return Paris-London)

I can get the possible combinations with:

SELECT DISTINCT city_start, city_end, city_end, city_start from table

But how do I now compute their permutations?

【Comments】:

【Answer 1】:

The query that returns all pairs uses a self-join:

select tto.city_start, tto.city_end, tto.date_start,
       tfrom.date_start as date_return,  -- the table has no date_end column
       (tto.price + tfrom.price) as price
from t tto join
     t tfrom
     on tto.city_end = tfrom.city_start and    -- return leg starts where the outgoing leg ends
        tto.city_start = tfrom.city_end and    -- and ends where the outgoing leg started
        tfrom.date_start >= tto.date_start + interval '1 day' and
        tfrom.date_start <= tto.date_start + interval '20 day';
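To see what the pair join returns, here is a minimal runnable sketch on the question's sample rows. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, so the date arithmetic is written with SQLite's date() function instead of interval. The ISO date strings and the helper name round_trips are assumptions made for this illustration, and the question's 30.02.2016 row (not a real calendar date) is replaced by 2016-03-01.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO strings.
# Assumption: "30.02.2016" is not a valid date, so 2016-03-01 stands in for it.
ROWS = [
    ("Hamburg", "Berlin", "2016-01-01", 100.00),
    ("Berlin", "Hamburg", "2016-01-10", 112.00),
    ("Berlin", "Hamburg", "2016-01-10", 70.00),
    ("Berlin", "Hamburg", "2016-01-12", 50.00),
    ("Berlin", "Hamburg", "2016-03-01", 20.00),
]

# The pair join, reading the return date from tfrom.date_start.
# SQLite's date(x, '+N days') replaces PostgreSQL's "+ interval 'N day'".
PAIRS_SQL = """
select tto.city_start, tto.city_end, tto.date_start,
       tfrom.date_start as date_return,
       (tto.price + tfrom.price) as total
from t tto join
     t tfrom
     on tto.city_end = tfrom.city_start and
        tto.city_start = tfrom.city_end and
        tfrom.date_start >= date(tto.date_start, '+1 day') and
        tfrom.date_start <= date(tto.date_start, '+20 days')
order by tfrom.date_start, tto.price + tfrom.price
"""


def round_trips(rows):
    """Load the rows into an in-memory table t and return every valid round trip."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (city_start text, city_end text, date_start text, price real)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?)", rows)
    result = conn.execute(PAIRS_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in round_trips(ROWS):
        print(row)
```

On these rows the join yields three Hamburg-Berlin-Hamburg pairs: two for a return on 2016-01-10 and one for 2016-01-12. The late return is dropped because it falls outside the 20-day window. Picking the cheapest pair per date combination is a separate step.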

To get the cheapest price, use a window function:

select tt.*
from (select tto.city_start, tto.city_end, tto.date_start,
             tfrom.date_start as date_return,  -- the table has no date_end column
             (tto.price + tfrom.price) as price,
             -- rank the pairs within each route and date combination, cheapest first
             row_number() over (partition by tto.city_start, tto.city_end,
                                             tto.date_start, tfrom.date_start
                                order by (tto.price + tfrom.price) asc) as seqnum
      from t tto join
           t tfrom
           on tto.city_end = tfrom.city_start and
              tto.city_start = tfrom.city_end and
              tfrom.date_start >= tto.date_start + interval '1 day' and
              tfrom.date_start <= tto.date_start + interval '20 day'
      ) tt
where seqnum = 1;

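【Comments】:

Great, this seems to work. Is there an alternative to the row_number partition part (because the windowAggr function performs very poorly)?

【Answer 2】:

Here is a solution without the row_number partition part: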

SELECT
    a.city_start, a.city_end, b.city_end, a.date_start, b.date_start,
    min(a.price + b.price)
FROM
    flight AS a
    JOIN
    flight AS b ON a.city_start = b.city_end AND a.city_end = b.city_start
WHERE b.date_start BETWEEN a.date_start + 1 AND a.date_start + 20
GROUP BY a.city_start, a.city_end, b.city_end, a.date_start, b.date_start;
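As a quick check of the GROUP BY approach against the question's desired result, the sketch below runs the same query shape on the sample rows. It again uses Python's sqlite3 module as a stand-in for PostgreSQL, so date(a.date_start, '+1 day') replaces a.date_start + 1. The helper name cheapest_round_trips and the ISO date strings are assumptions for this illustration, and 2016-03-01 stands in for the question's invalid 30.02.2016.

```python
import sqlite3

# Sample rows from the question as ISO date strings.
# Assumption: "30.02.2016" is not a valid date, so 2016-03-01 stands in for it.
FLIGHTS = [
    ("Hamburg", "Berlin", "2016-01-01", 100.00),
    ("Berlin", "Hamburg", "2016-01-10", 112.00),
    ("Berlin", "Hamburg", "2016-01-10", 70.00),
    ("Berlin", "Hamburg", "2016-01-12", 50.00),
    ("Berlin", "Hamburg", "2016-03-01", 20.00),
]

# The GROUP BY query, with SQLite's date() in place of PostgreSQL's "date + integer".
CHEAPEST_SQL = """
SELECT
    a.city_start, a.city_end, b.city_end, a.date_start, b.date_start,
    min(a.price + b.price)
FROM
    flight AS a
    JOIN
    flight AS b ON a.city_start = b.city_end AND a.city_end = b.city_start
WHERE b.date_start BETWEEN date(a.date_start, '+1 day') AND date(a.date_start, '+20 days')
GROUP BY a.city_start, a.city_end, b.city_end, a.date_start, b.date_start
ORDER BY a.date_start, b.date_start
"""


def cheapest_round_trips(rows):
    """Return the cheapest total price for each route and date combination."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flight (city_start TEXT, city_end TEXT, date_start TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO flight VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(CHEAPEST_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in cheapest_round_trips(FLIGHTS):
        print(row)
```

Grouping on both dates is what collapses the two 2016-01-10 returns into a single row, so the output has one line per date combination, as in the question's example.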

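【Comments】:

Thanks a lot for this as well. One question: what if I want to select attributes that are not in the group by? For example a.carName?

I'm not sure, because I don't see carName in the data. I would need to see it in the context of the rest of the data to answer that...

The data in the question is just an example. Suppose every row has an extra column carName. The question is: how can I select this value if it is not in the group by?

Do you want the car name for the outgoing trip or for the return trip?

Both would be fine.

【Answer 3】:

If you want to include other columns, try the following: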

SELECT
    a.city_start, a.city_end, b.city_end, a.date_start, b.date_start,
    a.price + b.price, a.car_name, b.car_name
FROM
    flight AS a
    JOIN
    flight AS b ON a.city_start = b.city_end AND a.city_end = b.city_start
    LEFT JOIN
    flight AS c ON
         a.city_start = c.city_start
         AND
         a.city_end = c.city_end
         AND
         a.date_start = c.date_start
         AND (
             a.price > c.price
             OR (
                 a.price = c.price
                 AND
                 a.id > c.id))
    LEFT JOIN
    flight AS d ON
         b.city_start = d.city_start
         AND
         b.city_end = d.city_end
         AND
         b.date_start = d.date_start
         AND (
             b.price > d.price
             OR (
                 b.price = d.price
                 AND
                 b.id > d.id))
WHERE
    b.date_start BETWEEN a.date_start + 1 AND a.date_start + 20
    AND
    c.id IS NULL
    AND
    d.id IS NULL;
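The anti-join pattern above (LEFT JOIN to a cheaper row, then keep only rows where no such row exists) can be tried out with the sketch below. It uses Python's sqlite3 module as a stand-in for PostgreSQL, with SQLite's date() replacing a.date_start + 1. The id values, the car names, and the helper name cheapest_with_cars are hypothetical, added only so the query has something to select, and 2016-03-01 stands in for the question's invalid 30.02.2016.

```python
import sqlite3

# (id, city_start, city_end, date_start, price, car_name)
# Assumptions: ids and car names are hypothetical; 2016-03-01 replaces "30.02.2016".
FLIGHTS = [
    (1, "Hamburg", "Berlin", "2016-01-01", 100.00, "car A"),
    (2, "Berlin", "Hamburg", "2016-01-10", 112.00, "car B"),
    (3, "Berlin", "Hamburg", "2016-01-10", 70.00, "car C"),
    (4, "Berlin", "Hamburg", "2016-01-12", 50.00, "car D"),
    (5, "Berlin", "Hamburg", "2016-03-01", 20.00, "car E"),
]

# c matches any cheaper outgoing row for the same route and date, d does the same
# for the return row; the id comparison breaks ties between equally priced rows.
# Keeping only rows where c and d are both missing leaves the cheapest leg each way.
ANTI_JOIN_SQL = """
SELECT
    a.city_start, a.date_start, b.date_start,
    a.price + b.price, a.car_name, b.car_name
FROM flight AS a
JOIN flight AS b ON a.city_start = b.city_end AND a.city_end = b.city_start
LEFT JOIN flight AS c ON
    a.city_start = c.city_start AND a.city_end = c.city_end
    AND a.date_start = c.date_start
    AND (a.price > c.price OR (a.price = c.price AND a.id > c.id))
LEFT JOIN flight AS d ON
    b.city_start = d.city_start AND b.city_end = d.city_end
    AND b.date_start = d.date_start
    AND (b.price > d.price OR (b.price = d.price AND b.id > d.id))
WHERE
    b.date_start BETWEEN date(a.date_start, '+1 day') AND date(a.date_start, '+20 days')
    AND c.id IS NULL
    AND d.id IS NULL
ORDER BY b.date_start
"""


def cheapest_with_cars(rows):
    """Return the cheapest pair per date combination, including both car names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flight (id INTEGER PRIMARY KEY, city_start TEXT, city_end TEXT, "
        "date_start TEXT, price REAL, car_name TEXT)"
    )
    conn.executemany("INSERT INTO flight VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(ANTI_JOIN_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in cheapest_with_cars(FLIGHTS):
        print(row)
```

Because no GROUP BY is involved, any column of the winning rows (here car_name for both legs) can be selected directly.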

【Comments】:
