Return duplicates by comparing each record of the same table in SQL Server
【Posted】: 2016-09-12 06:52:48
【Question】: I have the table below and want to fetch the duplicate records. The condition is: subscribers with status = 1 (i.e. active) who have more than one record in the current year, determined by comparing start_date and end_date. The database holds more than 5,000 records; a few samples are shown here.
id pkg_id start_date end_date status subscriber_id
2857206 9128 8/31/2014 8/31/2015 2 3031103
2857207 9128 12/22/2015 12/22/2016 1 3031103
3066285 10308 8/5/2016 8/4/2018 1 3031103
2857206 9128 8/31/2013 8/31/2015 2 3031104
2857207 9128 10/20/2015 11/22/2016 1 3031104
3066285 10308 7/5/2016 7/4/2018 1 3031104
3066285 10308 8/5/2016 8/4/2018 2 3031105
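To experiment with the sample rows above, the table could be recreated roughly as follows. This is only a sketch: the post does not show the schema, so the column types and the temp-table name #consumer_subsc are assumptions.

```sql
-- Assumption: column types are guessed from the sample values;
-- #consumer_subsc is a hypothetical temp table standing in for dbo.consumer_subsc.
CREATE TABLE #consumer_subsc (
    id            int  NOT NULL,
    pkg_id        int  NOT NULL,
    start_date    date NOT NULL,
    end_date      date NOT NULL,
    status        int  NOT NULL,
    subscriber_id int  NOT NULL
);

-- Dates are written as 'YYYYMMDD' so they do not depend on the session language.
INSERT INTO #consumer_subsc (id, pkg_id, start_date, end_date, status, subscriber_id)
VALUES
    (2857206,  9128, '20140831', '20150831', 2, 3031103),
    (2857207,  9128, '20151222', '20161222', 1, 3031103),
    (3066285, 10308, '20160805', '20180804', 1, 3031103),
    (2857206,  9128, '20130831', '20150831', 2, 3031104),
    (2857207,  9128, '20151020', '20161122', 1, 3031104),
    (3066285, 10308, '20160705', '20180704', 1, 3031104),
    (3066285, 10308, '20160805', '20180804', 2, 3031105);
```

When running a query against this table, replace dbo.consumer_subsc with #consumer_subsc and compare against a fixed year such as 2016 instead of YEAR(GETDATE()), since the sample dates were chosen around the posting date.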
I tried the query below, but it does not work for all records:
SELECT *
FROM dbo.consumer_subsc
WHERE status = 1
  AND YEAR(GETDATE()) >= YEAR(start_date)
  AND YEAR(GETDATE()) <= YEAR(end_date)
  AND subscriber_id IN (
        SELECT T.subscriber_id
        FROM ( SELECT subscriber_id,
                      COUNT(subscriber_id) AS cnt
               FROM dbo.consumer_subsc
               WHERE status = 1
               GROUP BY subscriber_id
               HAVING COUNT(subscriber_id) > 1
             ) T )
ORDER BY subscriber_id DESC
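The problem is that I cannot find a way to compare each row against the date condition above. I should get the following duplicates as the result: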
id pkg_id start_date end_date status subscriber_id
2857207 9128 12/22/2015 12/22/2016 1 3031103
3066285 10308 8/5/2016 8/4/2018 1 3031103
2857207 9128 10/20/2015 11/22/2016 1 3031104
3066285 10308 7/5/2016 7/4/2018 1 3031104
【Answer 1】: Just remove the hard-coded user ID filter from the WHERE clause. The following query returns the expected output.
SELECT *
FROM dbo.consumer_subsc
WHERE STATUS = 1
  AND year(getdate()) >= year(start_date)
  AND year(getdate()) <= year(end_date)
  AND subscriber_id IN (
        SELECT T.subscriber_id
        FROM (
              SELECT subscriber_id
                    ,count(subscriber_id) AS cnt
              FROM dbo.consumer_subsc
              WHERE STATUS = 1
              GROUP BY subscriber_id
              HAVING count(subscriber_id) > 1
             ) T
      )
ORDER BY subscriber_id, start_date
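One likely reason a query of this shape can still return non-duplicate rows is that the inner COUNT looks at every active row of a subscriber, while the year condition is applied only in the outer query. A minimal sketch that repeats the year condition inside the subquery, so that only rows covering the current year are counted, might look like this; the variable @year is introduced here for illustration and is not part of the original answer.

```sql
-- Assumption: @year is a helper variable added for this sketch.
DECLARE @year int = YEAR(GETDATE());

SELECT *
FROM dbo.consumer_subsc
WHERE status = 1
  AND YEAR(start_date) <= @year
  AND YEAR(end_date)   >= @year
  AND subscriber_id IN (
        -- Count only active rows whose date range covers the current year.
        SELECT subscriber_id
        FROM dbo.consumer_subsc
        WHERE status = 1
          AND YEAR(start_date) <= @year
          AND YEAR(end_date)   >= @year
        GROUP BY subscriber_id
        HAVING COUNT(*) > 1
      )
ORDER BY subscriber_id, start_date;
```

With the sample rows from the question and @year set to 2016, this should keep the two active rows each for subscribers 3031103 and 3031104 and nothing for 3031105.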
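【Comments】:
Please select the answer that suits you best.
I know that, but only duplicate records are expected; it gives me other, non-duplicate data as well.
This is my own query with only the filter removed. It returns both duplicate and non-duplicate records, so it does not help.
【Answer 2】: You can use EXISTS: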
SELECT t.*
FROM dbo.consumer_subsc t
WHERE EXISTS (SELECT subscriber_id
              FROM dbo.consumer_subsc y
              WHERE y.status = t.status
                AND y.subscriber_id = t.subscriber_id
              GROUP BY subscriber_id
              HAVING COUNT(y.subscriber_id) > 1)
  AND STATUS = 1
  AND year(getdate()) >= year(start_date)
  AND year(getdate()) <= year(end_date)
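An EXISTS check can also compare each row directly with the other rows of the same subscriber, which is closer to what the question asks for. The sketch below assumes that id tells apart the rows of one subscriber (true for the sample data, but not confirmed as a key by the post); @year is again a hypothetical helper variable.

```sql
-- Assumptions: id differs between rows of the same subscriber;
-- @year is a helper variable added for this sketch.
DECLARE @year int = YEAR(GETDATE());

SELECT t.*
FROM dbo.consumer_subsc AS t
WHERE t.status = 1
  AND YEAR(t.start_date) <= @year
  AND YEAR(t.end_date)   >= @year
  AND EXISTS (
        -- Another active row of the same subscriber that also covers the year.
        SELECT 1
        FROM dbo.consumer_subsc AS o
        WHERE o.subscriber_id = t.subscriber_id
          AND o.id <> t.id
          AND o.status = 1
          AND YEAR(o.start_date) <= @year
          AND YEAR(o.end_date)   >= @year
      )
ORDER BY t.subscriber_id, t.start_date;
```

Because the subquery applies the same status and year conditions as the outer query, a row is returned only when it has at least one qualifying partner row.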
【Comments】:
You forgot that the current year must fall between start_date and end_date.
It gives me all the records, which does not help!
【Answer 3】:
WITH CTE (Code, DuplicateCount)
AS
(
    SELECT subscriber_id,
           ROW_NUMBER() OVER (PARTITION BY subscriber_id
                              ORDER BY subscriber_id) AS DuplicateCount
    FROM dbo.consumer_subsc
    WHERE subscriber_id IN (3031103)
      AND status = 1
      AND year(getdate()) >= year(start_date)
      AND year(getdate()) <= year(end_date)
)
SELECT * FROM CTE
【Answer 4】: The query below gives output close to the expected result:
SELECT A.*
FROM (SELECT t.*,
             Row_number() OVER (PARTITION BY t.subscriber_id
                                ORDER BY t.subscriber_id, t.start_date) rnk
      FROM dbo.consumer_subsc t
      WHERE EXISTS (SELECT subscriber_id
                    FROM dbo.consumer_subsc y
                    WHERE y.status = t.status
                      AND y.subscriber_id = t.subscriber_id
                    GROUP BY subscriber_id
                    HAVING COUNT(y.subscriber_id) > 1)
        AND STATUS = 1
        AND year(getdate()) >= year(start_date)
        AND year(getdate()) <= year(end_date)) A
WHERE A.rnk > 1
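ROW_NUMBER() numbers the rows within each subscriber, so filtering on rnk > 1 drops the first row of every group, which may be why the output is only close to the expected one. A sketch that swaps in COUNT(*) OVER, so that every row of a group with more than one qualifying row is kept, could look like this; the CTE name active_this_year, the column active_cnt, and the variable @year are illustrative names, not part of the original answer.

```sql
-- Assumption: @year, active_this_year and active_cnt are names invented for this sketch.
DECLARE @year int = YEAR(GETDATE());

WITH active_this_year AS (
    SELECT cs.*,
           -- Number of qualifying rows per subscriber; the window is evaluated
           -- after the WHERE clause, so only filtered rows are counted.
           COUNT(*) OVER (PARTITION BY cs.subscriber_id) AS active_cnt
    FROM dbo.consumer_subsc AS cs
    WHERE cs.status = 1
      AND YEAR(cs.start_date) <= @year
      AND YEAR(cs.end_date)   >= @year
)
SELECT id, pkg_id, start_date, end_date, status, subscriber_id
FROM active_this_year
WHERE active_cnt > 1
ORDER BY subscriber_id, start_date;
```

This reads the table once and needs no separate EXISTS or IN subquery, at the cost of carrying the extra active_cnt column through the CTE.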