greatest-n-per-group with condition and joining to large table (or query for getting amount in national currency when holes in currency rates table)

Posted: 2014-08-06 20:06:56

【Question】:

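I read the most relevant Q&A under the https://***.com/questions/tagged/greatest-n-per-group tag, but did not find a solution for my task because the details differ.

I have a table with amount/currency/date, and my task is to convert each amount to its national-currency equivalent on that date.

One problem is that the currency rates table has holes, so a direct join on currency/date gives null for some rows. The business rule for that case is to take the last available rate for the given currency on or before the payment date.

My naive solution: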

select p.AMOUNT * cr.RATE from PAYMENT p
  join CURRENCY_RATE cr on cr.CURRENCY = p.CURRENCY
    and cr.DATE = (select max(subcr.DATE) from CURRENCY_RATE subcr
                     where subcr.CURRENCY = cr.CURRENCY and subcr.DATE <= p.DATE)

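It gives a very bad execution plan (this is a simplified query; the original has many full table scans and hash joins because of extra business logic).

PAYMENT is a large table and is accessed via a full scan.

CURRENCY_RATE is queried for many CURRENCY/DATE pairs. I am not sure that an index on that pair, used as the first step in an index range scan, would be a good strategy for retrieving the pairs...

I use Oracle, but I don't understand whether window functions apply to the case where max(...) over (partition by ...) must also carry an additional condition...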

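UPDATE: I plan to use the query for data migration and import tasks, so there really is no filter on PAYMENT. I am starting to think that I can import with a plain p.AMOUNT * cr.RATE and, where the result is null, update the incomplete records with the query above. That looks promising, because holes in CURRENCY_RATE are rare.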
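
The two-step idea above could be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: the target column AMOUNT_NAT is hypothetical, the date columns are renamed PAY_DATE and RATE_DATE to avoid the reserved word DATE, and (CURRENCY, RATE_DATE) is assumed to be unique in CURRENCY_RATE (otherwise the scalar subqueries raise ORA-01427).

```sql
-- Step 1: cheap pass, exact-date match only.
-- Rows that fall into a hole in CURRENCY_RATE are left with AMOUNT_NAT = NULL.
update PAYMENT p
   set p.AMOUNT_NAT = p.AMOUNT * (
         select cr.RATE
           from CURRENCY_RATE cr
          where cr.CURRENCY  = p.CURRENCY
            and cr.RATE_DATE = p.PAY_DATE);

-- Step 2: expensive pass, restricted to the (rare) rows left incomplete.
-- Uses the last available rate on or before the payment date.
update PAYMENT p
   set p.AMOUNT_NAT = p.AMOUNT * (
         select cr.RATE
           from CURRENCY_RATE cr
          where cr.CURRENCY  = p.CURRENCY
            and cr.RATE_DATE = (select max(subcr.RATE_DATE)
                                  from CURRENCY_RATE subcr
                                 where subcr.CURRENCY  = p.CURRENCY
                                   and subcr.RATE_DATE <= p.PAY_DATE))
 where p.AMOUNT_NAT is null;
```

Rows that have no earlier rate at all remain NULL after step 2, which makes them easy to find and report.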

Another solution I see is to use a materialized view or another table that has no holes.
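
A table without holes might be built along these lines. This is a sketch under assumptions, not a tested migration script: CURRENCY_RATE_DENSE, CAL_DATE, RATE_DATE and PAY_DATE are hypothetical names, the calendar range (365 days from 2014-01-01) is purely illustrative, and the date columns are assumed to carry no time component.

```sql
-- Build one row per currency per calendar day; holes are filled with the
-- last known rate via LAST_VALUE ... IGNORE NULLS.
create table CURRENCY_RATE_DENSE as
select c.CURRENCY,
       d.CAL_DATE as RATE_DATE,
       last_value(cr.RATE ignore nulls)
         over (partition by c.CURRENCY order by d.CAL_DATE) as RATE
  from (select distinct CURRENCY from CURRENCY_RATE) c
 cross join (select date '2014-01-01' + level - 1 as CAL_DATE
               from dual
            connect by level <= 365) d
  left join CURRENCY_RATE cr
    on cr.CURRENCY  = c.CURRENCY
   and cr.RATE_DATE = d.CAL_DATE;

-- The conversion then becomes a plain equi-join.
select p.AMOUNT * crd.RATE
  from PAYMENT p
  join CURRENCY_RATE_DENSE crd
    on crd.CURRENCY  = p.CURRENCY
   and crd.RATE_DATE = p.PAY_DATE;
```

Days before the first known rate of a currency inside the chosen range still get a NULL rate, so the range should start early enough to cover them.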

【Comments】:

【Answer 1】:

You can try a query like this:

SELECT
    A.AMOUNT * A.RATE
FROM
    (
        SELECT
            P.AMOUNT,
            CR.RATE,
            ROW_NUMBER() OVER (PARTITION BY P.ROWID ORDER BY CR.DATE DESC) AS RN
        FROM
            PAYMENT P
        INNER JOIN CURRENCY_RATE CR
        ON
            P.CURRENCY = CR.CURRENCY
            AND
            P.DATE >= CR.DATE
    ) A
WHERE
    A.RN = 1

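A few points to note:

    - Using keywords such as DATE and CURRENCY as column names may cause name-resolution conflicts (DATE in particular is a reserved word in Oracle).
    - The query excludes rows from PAYMENT that have no matching row in CURRENCY_RATE. If you want to include such rows, use LEFT JOIN instead of INNER JOIN.
    - If the combination of CURRENCY and DATE is not unique in CURRENCY_RATE, the query picks one of the rows arbitrarily. If you want a specific row in that case, add expressions to the ORDER BY clause as needed so that the row you want sorts first.
    - If PAYMENT has a unique non-null key, you can use it in the PARTITION BY clause instead of P.ROWID.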
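
Taken together, the notes above might lead to a variant like the one below. It is only a sketch: PAYMENT_ID (a unique non-null key), PAY_DATE and RATE_DATE are hypothetical column names introduced here to sidestep the reserved word DATE.

```sql
select a.PAYMENT_ID,
       a.AMOUNT * a.RATE as AMOUNT_NAT   -- NULL when no rate exists on or before the payment date
  from (select p.PAYMENT_ID,
               p.AMOUNT,
               cr.RATE,
               -- Partition by the key instead of ROWID; latest rate sorts first.
               row_number() over (partition by p.PAYMENT_ID
                                  order by cr.RATE_DATE desc) as RN
          from PAYMENT p
          -- LEFT JOIN keeps payments that have no matching rate at all.
          left join CURRENCY_RATE cr
            on cr.CURRENCY  = p.CURRENCY
           and cr.RATE_DATE <= p.PAY_DATE) a
 where a.RN = 1;
```

If (CURRENCY, RATE_DATE) is not unique, a further tie-breaking expression would have to be appended to the ORDER BY.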

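【Comments】:

【Answer 2】:

From your SQL, I understand that you are trying to display all records of the PAYMENT table, since it has no filter. Try using the currency as a filter. If you really need to display all records without a filter, then you have to update the rate.payment column at INSERT time whenever it is NULL. After that, for display you can simply use "Select p.AMOUNT * cr.RATE from PAYMENT p"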

【Comments】:

【Answer 3】:

Latest currency with "max" and "group by":

select cr.COUNTRY, cr.CURRENCY, cr.RATE from CURRENCY_RATE cr
  where cr.dt = (select max(subcr.DT) from CURRENCY_RATE subcr where subcr.COUNTRY = cr.COUNTRY and subcr.CURRENCY = cr.CURRENCY group by subcr.CURRENCY, subcr.COUNTRY);

-- too long
select count (*) from (
  select cr.COUNTRY, cr.CURRENCY, cr.RATE from CURRENCY_RATE cr
    where cr.dt = (select max(subcr.DT) from CURRENCY_RATE subcr where subcr.COUNTRY = cr.COUNTRY and subcr.CURRENCY = cr.CURRENCY group by subcr.CURRENCY, subcr.COUNTRY));

Latest currency with "not exists":

select cr.COUNTRY, cr.CURRENCY, cr.RATE from CURRENCY_RATE cr
  where not exists (select 1 from CURRENCY_RATE subcr where subcr.COUNTRY = cr.COUNTRY and subcr.CURRENCY = cr.CURRENCY and subcr.dt > cr.dt);

-- too long...
select count (*) from (
  select cr.COUNTRY, cr.CURRENCY, cr.RATE from CURRENCY_RATE cr
    where not exists (select 1 from CURRENCY_RATE subcr where subcr.COUNTRY = cr.COUNTRY and cr.CURRENCY = subcr.CURRENCY and subcr.dt > cr.dt));

Latest currency with "join" and "is null":

-- Too long...
select cr1.* from CURRENCY_RATE cr1
  left join CURRENCY_RATE cr2
    on (cr1.COUNTRY = cr2.COUNTRY and cr1.CURRENCY = cr2.CURRENCY and cr2.DT > cr1.DT)
  where cr2.DT is null;

Latest currency with "row_number() over (partition by ... order by ...)":

with maxcr as (
  select cr.COUNTRY, cr.CURRENCY, cr.RATE, row_number() over (partition by cr.COUNTRY, cr.CURRENCY order by cr.DT desc) as rown
    from CURRENCY_RATE cr
) select * from  maxcr
  where maxcr.rown = 1;

select maxcr.COUNTRY, maxcr.CURRENCY, maxcr.RATE from (
  select cr.COUNTRY, cr.CURRENCY, cr.RATE, row_number() over (partition by cr.COUNTRY, cr.CURRENCY order by cr.DT desc) as rown
    from CURRENCY_RATE cr) maxcr
  where maxcr.rown = 1;

-- 2.5 sec
select count(*) from (
  select maxcr.COUNTRY, maxcr.CURRENCY, maxcr.RATE from (
    select cr.COUNTRY, cr.CURRENCY, cr.RATE, row_number() over (partition by cr.COUNTRY, cr.CURRENCY order by cr.DT desc) as rown
      from CURRENCY_RATE cr) maxcr
    where maxcr.rown = 1);

Latest currency with "max" and "in":

select cr.COUNTRY, cr.CURRENCY, cr.RATE from CURRENCY_RATE cr
  where (cr.COUNTRY, cr.CURRENCY, cr.dt) in (select subcr.COUNTRY, subcr.CURRENCY, max(subcr.DT) from CURRENCY_RATE subcr group by subcr.COUNTRY, subcr.CURRENCY);

-- .250 sec
select count(*) from (
  select cr.COUNTRY, cr.CURRENCY, cr.RATE from CURRENCY_RATE cr
    where (cr.COUNTRY, cr.CURRENCY, cr.dt) in (select subcr.COUNTRY, subcr.CURRENCY, max(subcr.DT) from CURRENCY_RATE subcr group by subcr.COUNTRY, subcr.CURRENCY));

-- 2.3 sec
update DATA_2 inc
  set inc.MONEY_CNV = inc.MONEY_V * (
    select cr1.RATE from (
        select comp.COMPANY COMPANY, cr.CURRENCY, cr.RATE from COMPANY comp
          join CURRENCY_RATE cr on (cr.COUNTRY = comp.COUNTRY)
          where (cr.COUNTRY, cr.CURRENCY, cr.dt) in (select subcr.COUNTRY, subcr.CURRENCY, max(subcr.DT) from CURRENCY_RATE subcr group by subcr.COUNTRY, subcr.CURRENCY)) cr1
      where cr1.COMPANY = inc.COMPANY and cr1.CURRENCY = inc.CODE_V)
  where inc.INC_DATE > DATE '2014-01-01';

Latest currency with "max" and "=":

select cr.COUNTRY, cr.CURRENCY, cr.RATE from CURRENCY_RATE cr
  where cr.dt = (select max(subcr.DT) from CURRENCY_RATE subcr where subcr.COUNTRY = cr.COUNTRY and subcr.CURRENCY = cr.CURRENCY);

-- .250 sec
select count (*) from (
  select cr.COUNTRY, cr.CURRENCY, cr.RATE from CURRENCY_RATE cr
    where cr.dt = (select max(subcr.DT) from CURRENCY_RATE subcr where subcr.COUNTRY = cr.COUNTRY and subcr.CURRENCY = cr.CURRENCY));

with cr1 as (select cr.COUNTRY, cr.CURRENCY, cr.RATE from CURRENCY_RATE cr
  where cr.dt = (select max(subcr.DT) from CURRENCY_RATE subcr where subcr.COUNTRY = cr.COUNTRY and cr.CURRENCY = subcr.CURRENCY)
) select comp.COMPANY, cr1.CURRENCY, cr1.RATE from cr1
  join COMPANY comp on cr1.COUNTRY = comp.COUNTRY;

-- .250 sec
with cr1 as (select cr.COUNTRY, cr.CURRENCY, cr.RATE from CURRENCY_RATE cr
  where cr.dt = (select max(subcr.DT) from CURRENCY_RATE subcr where subcr.COUNTRY = cr.COUNTRY and cr.CURRENCY = subcr.CURRENCY)
) select count(*) from cr1
  join COMPANY comp on cr1.COUNTRY = comp.COUNTRY;

with cr1 as (
  select comp.COMPANY, cr.CURRENCY, cr.RATE from COMPANY comp
    join CURRENCY_RATE cr on cr.COUNTRY = comp.COUNTRY
    where cr.dt = (select max(subcr.DT) from CURRENCY_RATE subcr where subcr.COUNTRY = comp.COUNTRY and cr.CURRENCY = subcr.CURRENCY)
) select count(*) from cr1;

-- 3 sec
update DATA_2 inc
  set inc.MONEY_CNV = inc.MONEY_V * (
    select cr1.RATE from (
        select comp.COMPANY COMPANY, cr.CURRENCY, cr.RATE from COMPANY comp
          join CURRENCY_RATE cr on (cr.COUNTRY = comp.COUNTRY)
          where cr.dt = (select max(subcr.DT) from CURRENCY_RATE subcr
                           where subcr.COUNTRY = comp.COUNTRY and cr.CURRENCY = subcr.CURRENCY)) cr1
      where cr1.COMPANY = inc.COMPANY and cr1.CURRENCY = inc.CODE_V)
  where inc.INC_DATE > DATE '2014-01-01';

Latest currency with "max", "group by" and "join":

select cr.COUNTRY, cr.CURRENCY, cr.RATE from CURRENCY_RATE cr
  join (select subcr.CURRENCY, subcr.COUNTRY, max(subcr.DT) dt from CURRENCY_RATE subcr group by subcr.CURRENCY, subcr.COUNTRY) maxcr
    on maxcr.COUNTRY = cr.COUNTRY and maxcr.CURRENCY = cr.CURRENCY and maxcr.dt = cr.DT;

-- .250 sec
select count (*) from (
  select cr.COUNTRY, cr.CURRENCY, cr.RATE from CURRENCY_RATE cr
    join (select subcr.CURRENCY, subcr.COUNTRY, max(subcr.DT) dt from CURRENCY_RATE subcr group by subcr.CURRENCY, subcr.COUNTRY) maxcr
      on maxcr.COUNTRY = cr.COUNTRY and maxcr.CURRENCY = cr.CURRENCY and maxcr.dt = cr.DT);

with cr1 as (
  select cr.COUNTRY, cr.CURRENCY, cr.RATE from CURRENCY_RATE cr
    join (select subcr.CURRENCY, subcr.COUNTRY, max(subcr.DT) dt from CURRENCY_RATE subcr group by subcr.CURRENCY, subcr.COUNTRY) maxcr
      on maxcr.COUNTRY = cr.COUNTRY and maxcr.CURRENCY = cr.CURRENCY and maxcr.dt = cr.DT
) select comp.COMPANY, cr1.CURRENCY, cr1.RATE from cr1
  join COMPANY comp on cr1.COUNTRY = comp.COUNTRY;

-- .300 sec
with cr1 as (
  select cr.COUNTRY, cr.CURRENCY, cr.RATE from CURRENCY_RATE cr
    join (select subcr.CURRENCY, subcr.COUNTRY, max(subcr.DT) dt from CURRENCY_RATE subcr group by subcr.CURRENCY, subcr.COUNTRY) maxcr
      on maxcr.COUNTRY = cr.COUNTRY and maxcr.CURRENCY = cr.CURRENCY and maxcr.dt = cr.DT
) select count(*) from cr1
  join COMPANY comp on cr1.COUNTRY = comp.COUNTRY;
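
One more Oracle-specific form that could be compared against the variants above is the FIRST/LAST aggregate (KEEP ... DENSE_RANK LAST). It is not part of the measurements in this answer; the sketch below only assumes the same CURRENCY_RATE columns (COUNTRY, CURRENCY, DT, RATE) used throughout.

```sql
-- One pass over CURRENCY_RATE, no self-join and no correlated subquery.
select cr.COUNTRY,
       cr.CURRENCY,
       max(cr.DT) as DT,
       -- RATE taken from the row(s) with the greatest DT in each group;
       -- max() only resolves ties if DT is not unique within a group.
       max(cr.RATE) keep (dense_rank last order by cr.DT) as RATE
  from CURRENCY_RATE cr
 group by cr.COUNTRY, cr.CURRENCY;
```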

Latest N currency rates:

-- Vendor independent but too slow...
select cr.COUNTRY, cr.CURRENCY, cr.RATE from CURRENCY_RATE cr
  left outer join CURRENCY_RATE cr2
    on (cr2.COUNTRY = cr.COUNTRY and cr2.CURRENCY = cr.CURRENCY and cr2.DT >= cr.DT)
  -- cr.DT in the grouping keeps equal rates on different dates as separate rows.
  group by cr.COUNTRY, cr.CURRENCY, cr.DT, cr.RATE
  having count(*) <= 3
  order by cr.COUNTRY, cr.CURRENCY, cr.RATE;

-- Very fast (full table scan).
select cr.COUNTRY, cr.CURRENCY, cr.RATE, cr.DT from (
    -- Order by DT descending so that rown 1..3 are the latest rows, not the earliest.
    select subcr.*, row_number() over (partition by subcr.COUNTRY, subcr.CURRENCY order by subcr.DT desc) rown from CURRENCY_RATE subcr) cr
  where cr.rown <= 3;

【Comments】:
