Select latest and 2nd latest date rows per user
【Posted】: 2019-03-08 18:20:26
【Question】:
I have the following query, which selects rows whose LAST_UPDATE_DATE value falls within the past 7 days. It works well.
SELECT 'NEW ROW' AS 'ROW_TYPE', A.EMPLID, B.FIRST_NAME, B.LAST_NAME,
       A.BANK_CD, A.ACCOUNT_NUM, ACCOUNT_TYPE, PRIORITY, A.LAST_UPDATE_DATE
FROM PS_DIRECT_DEPOSIT D
INNER JOIN PS_DIR_DEP_DISTRIB A ON A.EMPLID = D.EMPLID AND A.EFFDT = D.EFFDT
INNER JOIN PS_EMPLOYEES B ON B.EMPLID = A.EMPLID
WHERE
    B.EMPL_STATUS NOT IN ('T','R','D')
    AND ((A.DEPOSIT_TYPE = 'P' AND A.AMOUNT_PCT = 100)
         OR A.PRIORITY = 999
         OR A.DEPOSIT_TYPE = 'B')
    AND A.EFFDT = (SELECT MAX(A1.EFFDT)
                   FROM PS_DIR_DEP_DISTRIB A1
                   WHERE A1.EMPLID = A.EMPLID
                   AND A1.EFFDT <= GETDATE())
    AND D.EFF_STATUS = 'A'
    AND D.EFFDT = (SELECT MAX(D1.EFFDT)
                   FROM PS_DIRECT_DEPOSIT D1
                   WHERE D1.EMPLID = D.EMPLID
                   AND D1.EFFDT <= GETDATE())
    AND A.LAST_UPDATE_DATE >= GETDATE() - 7
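What I would like to add is the previous (2nd MAX) row for each EMPLID, so that I can output the "old" row (that is, the most recent row meeting the criteria above before the last update) alongside the new row I already output in my query.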
ROW_TYPE  EMPLID  FIRST_NAME  LAST_NAME  BANK_CD    ACCOUNT_NUM  ACCOUNT_TYPE  PRIORITY  LAST_UPDATE_DATE
NEW ROW   12345   JOHN        SMITH      123548999  45234879     C             999       2019-03-06 00:00:00.000
OLD ROW   12345   JOHN        SMITH      214080046  92178616     C             999       2018-10-24 00:00:00.000
NEW ROW   56399   CHARLES     MASTER     785816167  84314314     C             999       2019-03-07 00:00:00.000
OLD ROW   56399   CHARLES     MASTER     345761227  547352       C             999       2017-05-16 00:00:00.000
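So each EMPLID would be ordered with its NEW ROW first, followed by its OLD ROW, as shown above. In this example, the "NEW ROW" is the record from within the past 7 days, as indicated by LAST_UPDATE_DATE.
I would like feedback on how to modify the query so that I can also retrieve the "old" row (the max row that is less than the "new" row retrieved above).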
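【Comments】:
Any ideas on how to do this?
Your query doesn't do what you say it does. It pulls all records updated within the last 7 days, then decorates them with additional data from the other tables. If any records on PS_DIRECT_DEPOSIT have multiple updates within the week, you'll get all of them, and the hard-coded value in your query will label every one of them as NEW ROW. So, are you looking for the two most recent updates per EMPLID? Or the previous update, by EMPLID, for any employee who happens to have had an update in the last week? Very different questions.
By the way, you should always use meaningful table aliases.
I am looking for the MAX LAST_UPDATE_DATE within the past 7 days, and for that record I also want to show the second-latest row (the one right after the MAX row), which would show what the previous values were. Hope that makes sense, thanks!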
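【Answer 1】:
It was a slow day for crime in Gotham, so I gave this a try. It might work.
It's unlikely to work straight out of the box, but it should get you started.
Your LAST_UPDATE_DATE column is on the table PS_DIR_DEP_DISTRIB, so we'll start there. First, you want to identify all records updated in the last 7 days, since those are the only ones you're interested in. Throughout, I'm assuming (and I may be wrong) that the natural key of the table consists of EMPLID, BANK_CD, and ACCOUNT_NUM. You'll need to substitute the actual natural key for those columns in a few places. That said, the date limiter looks like this: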
SELECT
    EMPLID
   ,BANK_CD
   ,ACCOUNT_NUM
FROM
    PS_DIR_DEP_DISTRIB AS limit
WHERE
    limit.LAST_UPDATE_DATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE))
    AND
    limit.LAST_UPDATE_DATE <= CAST(GETDATE() AS DATE)
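Now we'll use that as a correlated subquery in a WHERE EXISTS clause, correlating back to the base table so that we only keep records whose natural key values were updated in the last week. I changed the SELECT list to SELECT 1, the usual idiom for a correlated subquery under EXISTS: EXISTS only tests whether a matching row exists, so it can stop looking at the first match, and the select list never actually returns a value.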
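To show the EXISTS idiom in isolation, here is a minimal sketch. The table variable @dist and its rows are made up for the example (loosely modelled on the question's sample output, plus one hypothetical employee), and a fixed @today is assumed in place of GETDATE() so the result does not depend on when it is run. It correlates on EMPLID alone, which is an assumption made for the illustration; in the question's sample output the old row has a different ACCOUNT_NUM from the new one, so the choice of correlation columns decides whether that old row survives the filter.

```sql
-- Hypothetical stand-in for PS_DIR_DEP_DISTRIB; the rows are illustrative only.
DECLARE @dist TABLE
(
    EMPLID           VARCHAR(11)
   ,ACCOUNT_NUM      VARCHAR(17)
   ,LAST_UPDATE_DATE DATETIME
);

INSERT INTO @dist (EMPLID, ACCOUNT_NUM, LAST_UPDATE_DATE)
VALUES
    ('12345', '45234879', '20190306')
   ,('12345', '92178616', '20181024')
   ,('77777', '11111111', '20180115');  -- hypothetical employee with no recent update

-- Assumed "current" date, fixed so the example is repeatable.
DECLARE @today DATE = '20190308';

SELECT
    d.EMPLID
   ,d.ACCOUNT_NUM
   ,d.LAST_UPDATE_DATE
FROM
    @dist AS d
WHERE
    EXISTS
    (
        -- EXISTS ignores the select list; only the presence of a row matters.
        SELECT
            1
        FROM
            @dist AS recent
        WHERE
            recent.EMPLID = d.EMPLID
            AND recent.LAST_UPDATE_DATE >= DATEADD(DAY, -7, @today)
    );
```

With these rows, both records for employee 12345 come back (the older one included, because the employee has at least one update inside the window), while employee 77777 is filtered out.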
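Also, since we're filtering this record set anyway, I moved all of the other WHERE clause filters for this table into this (soon-to-be) subquery.
Finally, in the SELECT section, I added a DENSE_RANK to impose an ordering on the records. We'll use the DENSE_RANK value later to keep only the top (N) records of interest.
So that leaves us with this: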
SELECT
    EMPLID
   ,BANK_CD
   ,ACCOUNT_NUM
   --,ACCOUNT_TYPE --Might belong here. Can't tell without table alias in original SELECT
   ,PRIORITY
   ,EFFDT
   ,LAST_UPDATE_DATE
   ,DEPOSIT_TYPE
   ,AMOUNT_PCT
   ,DENSE_RANK() OVER (PARTITION BY --Add actual natural key columns here...
                           EMPLID
                       ORDER BY
                           LAST_UPDATE_DATE DESC
                      ) AS RowNum
FROM
    PS_DIR_DEP_DISTRIB AS sdist
WHERE
    EXISTS
    (
        -- Get the set of records that were last updated in the last 7 days.
        -- Correlate to the outer query so it only returns records related to this subset.
        -- This uses a correlated subquery. A JOIN will work, too. Try both, pick the faster one.
        -- Something like this, using the actual natural key columns in the WHERE
        SELECT
            1
        FROM
            PS_DIR_DEP_DISTRIB AS limit
        WHERE
            --The first two define the date range.
            --Note: the upper bound is midnight today, so rows stamped later today are excluded.
            limit.LAST_UPDATE_DATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE))
            AND limit.LAST_UPDATE_DATE <= CAST(GETDATE() AS DATE)
            AND
            --And these are the correlations to the outer query.
            limit.EMPLID = sdist.EMPLID
            AND limit.BANK_CD = sdist.BANK_CD
            AND limit.ACCOUNT_NUM = sdist.ACCOUNT_NUM
    )
    AND
    (
        --Outer parentheses (as in the original query) keep the OR branches from bypassing the EXISTS filter.
        (
            sdist.DEPOSIT_TYPE = 'P'
            AND sdist.AMOUNT_PCT = 100
        )
        OR sdist.PRIORITY = 999
        OR sdist.DEPOSIT_TYPE = 'B'
    )
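As an aside on the ranking step: DENSE_RANK gives tied rows the same rank, so a later filter of RowNum <= 2 can return more than two rows per employee when two updates share a LAST_UPDATE_DATE. The sketch below puts DENSE_RANK next to ROW_NUMBER on a few made-up rows to show the difference. The account numbers, dates, and the ACCOUNT_NUM tie-breaker are assumptions for illustration, not values from the question.

```sql
-- Illustrative rows only: two updates for the same employee share a date.
WITH updates AS
(
    SELECT
        *
    FROM
        (VALUES
            ('12345', 'ACCT-A', CAST('20190306' AS DATETIME))
           ,('12345', 'ACCT-B', CAST('20190306' AS DATETIME))  -- tie with ACCT-A
           ,('12345', 'ACCT-C', CAST('20181024' AS DATETIME))
        ) AS v (EMPLID, ACCOUNT_NUM, LAST_UPDATE_DATE)
)
SELECT
    EMPLID
   ,ACCOUNT_NUM
   ,LAST_UPDATE_DATE
    -- Ties share a rank, and the next distinct date gets the next rank.
   ,DENSE_RANK() OVER (PARTITION BY EMPLID
                       ORDER BY LAST_UPDATE_DATE DESC) AS DenseRankNum
    -- Every row gets its own number; ACCOUNT_NUM is an assumed tie-breaker.
   ,ROW_NUMBER() OVER (PARTITION BY EMPLID
                       ORDER BY LAST_UPDATE_DATE DESC, ACCOUNT_NUM) AS RowNumberNum
FROM
    updates
ORDER BY
    EMPLID
   ,RowNumberNum;
```

Here the three rows get DenseRankNum 1, 1, 2 and RowNumberNum 1, 2, 3. If exactly one "new" and one "old" row per employee is required, ROW_NUMBER with a deterministic tie-breaker may be the better fit; if tied updates should all be shown, DENSE_RANK does that.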
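Replace the original INNER JOIN to PS_DIR_DEP_DISTRIB with the ranked PS_DIR_DEP_DISTRIB subquery built in the previous step. In the SELECT list, the first hard-coded value now depends on the RowNum value, so it becomes a CASE expression. In the WHERE clause, the date filters are all driven by the subquery, so they are gone; several other filters were folded into the subquery; and we add WHERE dist.RowNum <= 2 to bring back the top 2 records.
(I also replaced all of the table aliases so I could keep track of what I was looking at.)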
SELECT
    CASE dist.RowNum
        WHEN 1 THEN 'NEW ROW'
        ELSE 'OLD ROW'
    END AS ROW_TYPE
   ,dist.EMPLID
   ,emp.FIRST_NAME
   ,emp.LAST_NAME
   ,dist.BANK_CD
   ,dist.ACCOUNT_NUM
   ,ACCOUNT_TYPE
   ,dist.PRIORITY
   ,dist.LAST_UPDATE_DATE
FROM
    PS_DIRECT_DEPOSIT AS dd
INNER JOIN
    (
        SELECT
            EMPLID
           ,BANK_CD
           ,ACCOUNT_NUM
           --,ACCOUNT_TYPE --Might belong here. Can't tell without table alias in original SELECT
           ,PRIORITY
           ,EFFDT
           ,LAST_UPDATE_DATE
           ,DEPOSIT_TYPE
           ,AMOUNT_PCT
           ,DENSE_RANK() OVER (PARTITION BY --Add actual natural key columns here...
                                   EMPLID
                               ORDER BY
                                   LAST_UPDATE_DATE DESC
                              ) AS RowNum
        FROM
            PS_DIR_DEP_DISTRIB AS sdist
        WHERE
            EXISTS
            (
                -- Get the set of records that were last updated in the last 7 days.
                -- Correlate to the outer query so it only returns records related to this subset.
                -- This uses a correlated subquery. A JOIN will work, too. Try both, pick the faster one.
                -- Something like this, using the actual natural key columns in the WHERE
                SELECT
                    1
                FROM
                    PS_DIR_DEP_DISTRIB AS limit
                WHERE
                    --The first two define the date range.
                    --Note: the upper bound is midnight today, so rows stamped later today are excluded.
                    limit.LAST_UPDATE_DATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE))
                    AND limit.LAST_UPDATE_DATE <= CAST(GETDATE() AS DATE)
                    AND
                    --And these are the correlations to the outer query.
                    limit.EMPLID = sdist.EMPLID
                    AND limit.BANK_CD = sdist.BANK_CD
                    AND limit.ACCOUNT_NUM = sdist.ACCOUNT_NUM
            )
            AND
            (
                --Outer parentheses (as in the original query) keep the OR branches from bypassing the EXISTS filter.
                (
                    sdist.DEPOSIT_TYPE = 'P'
                    AND sdist.AMOUNT_PCT = 100
                )
                OR sdist.PRIORITY = 999
                OR sdist.DEPOSIT_TYPE = 'B'
            )
    ) AS dist
    ON
        dist.EMPLID = dd.EMPLID
        AND dist.EFFDT = dd.EFFDT
INNER JOIN
    PS_EMPLOYEES AS emp
    ON
        emp.EMPLID = dist.EMPLID
WHERE
    dist.RowNum <= 2
    AND
    emp.EMPL_STATUS NOT IN ('T', 'R', 'D')
    AND
    dd.EFF_STATUS = 'A';
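For trying out the ranking and labelling logic without the PeopleSoft tables, the following self-contained sketch may help. It is not the query above: it is a simplified variant that partitions by EMPLID only, uses a windowed MAX instead of EXISTS to find employees with a recent update, and skips the joins and status filters. The first four rows are taken from the question's sample output; the fifth row (EMPLID 77777) and the fixed @today are hypothetical, added so the 7-day filter has something to exclude.

```sql
-- Assumed run date, fixed so the example is repeatable (stands in for GETDATE()).
DECLARE @today DATE = '20190308';

WITH updates AS
(
    SELECT
        *
    FROM
        (VALUES
            ('12345', '123548999', '45234879', CAST('20190306' AS DATETIME))
           ,('12345', '214080046', '92178616', CAST('20181024' AS DATETIME))
           ,('56399', '785816167', '84314314', CAST('20190307' AS DATETIME))
           ,('56399', '345761227', '547352',   CAST('20170516' AS DATETIME))
           ,('77777', '999999999', '11111111', CAST('20180115' AS DATETIME))  -- hypothetical
        ) AS v (EMPLID, BANK_CD, ACCOUNT_NUM, LAST_UPDATE_DATE)
),
ranked AS
(
    SELECT
        EMPLID
       ,BANK_CD
       ,ACCOUNT_NUM
       ,LAST_UPDATE_DATE
        -- 1 = most recent update for the employee, 2 = the one before it.
       ,DENSE_RANK() OVER (PARTITION BY EMPLID
                           ORDER BY LAST_UPDATE_DATE DESC) AS RowNum
        -- Most recent update per employee, repeated on each of that employee's rows.
       ,MAX(LAST_UPDATE_DATE) OVER (PARTITION BY EMPLID) AS LatestUpdate
    FROM
        updates
)
SELECT
    CASE RowNum
        WHEN 1 THEN 'NEW ROW'
        ELSE 'OLD ROW'
    END AS ROW_TYPE
   ,EMPLID
   ,BANK_CD
   ,ACCOUNT_NUM
   ,LAST_UPDATE_DATE
FROM
    ranked
WHERE
    RowNum <= 2
    -- Keep only employees whose latest update falls inside the 7-day window.
    AND LatestUpdate >= DATEADD(DAY, -7, @today)
ORDER BY
    EMPLID
   ,RowNum;
```

On this data the query returns a NEW ROW followed by an OLD ROW for employees 12345 and 56399, matching the ordering the question asks for, and omits employee 77777 because that employee's latest update is outside the window.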