SQL Server - How to make partially duplicate rows inherit values from original row

Posted 2023-03-08

[Asked] 2018-08-14 23:52:16

[Question]:

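To link records across datasets, I first de-duplicate them into distinct records based on the key linking variables (partition by name, date of birth, gender, etc., and delete where row_number > 1). Once linkage is complete, I am left with a new variable, "unique_id", but it is only attributed to the original record (because I deleted the partial duplicates). I now want to re-attach this "unique_id" to all of the partial duplicates. How can I do this? Is there a better method I could be using from the start?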

The data currently looks like this (-- marks a missing unique_id):

row_number unique_id id      first_name last_name activity_date
1          10        2       Davy       Jones     1726-11-25
2          --        12      Davy       Jones     1751-02-11
3          --        43      Davy       Jones     1811-06-15
1          100       12114   John       Smith     2018-06-01
2          --        123123  John       Smith     2022-07-05
1          90        2591    Mary       Sue       2013-05-18

I would like "unique_id" to be inherited from the original row, like this:

row_number unique_id id      first_name last_name activity_date
1          10        2       Davy       Jones     1726-11-25
2          10        12      Davy       Jones     1751-02-11
3          10        43      Davy       Jones     1811-06-15
1          100       12114   John       Smith     2018-06-01
2          100       123123  John       Smith     2022-07-05
1          90        2591    Mary       Sue       2013-05-18

The code to generate this table is below:

create table #test (
    unique_id int,
    id int,
    first_name varchar(255),
    last_name varchar(255),
    activity_date date
)

insert into #test 
values (100, 12114, 'John', 'Smith', '2018-06-01')

insert into #test (id, first_name, last_name, activity_date)
values (123123, 'John', 'Smith', '2022-07-05')

insert into #test
values (90, 2591, 'Mary', 'Sue', '2013-05-18')

insert into #test
values (10, 2, 'Davy', 'Jones', '1726-11-25')

insert into #test (id, first_name, last_name, activity_date)
values (12, 'Davy', 'Jones', '1751-02-11')

insert into #test (id, first_name, last_name, activity_date)
values (43, 'Davy', 'Jones', '1811-06-15')

select 
row_number() over (partition by first_name, last_name order by first_name, last_name) as row_number
,unique_id, id, first_name, last_name, activity_date
from #test

[Answer 1]:

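One simple method, assuming there is a single unique_id value per first_name/last_name pair, is to use a window function. Because max() ignores NULLs, each group's one non-NULL unique_id is returned on every row of that group: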

select t.*, max(unique_id) over (partition by first_name, last_name) as new_unique_id
from #test t;

This can be put into an update:

with toupdate as (
      select t.*, max(unique_id) over (partition by first_name, last_name) as new_unique_id
      from #test t
     )
update toupdate
    set unique_id = new_unique_id;

Here is a rextester demonstrating the syntax.
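The single-value assumption above can be checked before running the update. The following is a minimal sketch against the #test table from the question, not part of the original answer; the column aliases distinct_ids, min_id and max_id are made up for illustration. Any group it returns carries conflicting ids, in which case max() would silently keep the largest one.

```sql
-- Sketch: list name groups that hold more than one distinct non-NULL unique_id.
-- count(distinct ...) ignores NULLs, so unlinked duplicates do not count here.
select first_name,
       last_name,
       count(distinct unique_id) as distinct_ids,  -- hypothetical alias
       min(unique_id)            as min_id,        -- hypothetical alias
       max(unique_id)            as max_id         -- hypothetical alias
from #test
group by first_name, last_name
having count(distinct unique_id) > 1;
```

With the sample data from the question this returns no rows, since each name pair has exactly one linked record.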

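[Comments]:

What version of SQL is this?

@Alex . . . This is SQL Server syntax, i.e. the tag on the question.

@Alex . . . I was updating the wrong field, but the code works. I added a rextester so you can see the syntax working.

Just to help me understand: you don't need to specify FROM #test ... INNER JOIN Dups after the update statement, as in Alex's answer, because you have already specified FROM #test inside the WITH clause and you are updating directly from it (whereas Alex updates #test rather than the WITH clause created first)?

@Maharero . . . My answer uses an updatable CTE. It references a single table, so no join is needed.

[Answer 2]: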

Try this:

with Dups as(
    select 
    row_number() over (partition by first_name, last_name order by first_name, last_name) as dup_number,
    -- dense_rank() over (order by first_name, last_name) as DuplicateGroupNumber, -- this allows you to see groups
    max(unique_id) over (partition by first_name, last_name) as GroupUniqueID,
    unique_id, id, first_name, last_name, activity_date
    from #test
)
update a
set unique_id = GroupUniqueID
from #test as a
    inner join Dups as b on a.id = b.id

select * from #test

Result:

unique_id   id          first_name  
----------- ----------- ------------
100         12114       John        
100         123123      John        
90          2591        Mary        
10          2           Davy        
10          12          Davy        
10          43          Davy

[Answer 3]:

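It looks like you should join the subset of records that have a linked id to the subset that do not, on whatever fields you deem appropriate, and then update the id in the unlinked set from the id in the linked set.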
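The join described above might look like the sketch below. It assumes the #test table from the question and assumes that first_name plus last_name are the linking fields; the aliases u (unlinked) and l (linked) are chosen here for illustration, and the join condition should be replaced with whatever fields were used for the original de-duplication.

```sql
-- Sketch: copy unique_id from linked rows to unlinked rows of the same person.
-- Assumption: first_name + last_name identify a person; swap in the real
-- linking fields (date of birth, gender, ...) as appropriate.
update u
set unique_id = l.unique_id
from #test as u                      -- u: rows still missing a unique_id
    inner join #test as l            -- l: rows that already have one
        on  l.first_name = u.first_name
        and l.last_name  = u.last_name
where u.unique_id is null
  and l.unique_id is not null;

select * from #test;
```

If a group contains several linked rows with different ids, SQL Server does not guarantee which one is applied, so this form also relies on one linked record per group.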
