Power Query: merge consecutive duplicate rows, collapsing their values



Posted: 2021-09-12 14:59:00

Question:

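I have a bus timetable with Stop, Time_in and Time_out columns. Sometimes a stop is repeated in consecutive rows of my data, and I need to merge those rows, keeping only the first Time_in and the last Time_out.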

Here is an example:

Stop          Time_in   Time_out
23rd Street   15:23     15:27
42nd Street   15:35     15:40
42nd Street   15:42     15:48
47th Street   15:56     16:10
42nd Street   16:14     16:19

Desired result:

Stop          Time_in   Time_out
23rd Street   15:23     15:27
42nd Street   15:35     15:48
47th Street   15:56     16:10
42nd Street   16:14     16:19
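The intended behaviour, collapsing only adjacent repeats of a stop, can be sketched outside Power Query. This is a minimal Python illustration, not Power Query M; it assumes the sample rows above and times stored as zero-padded "HH:MM" strings, so plain string comparison orders them correctly. It relies on `itertools.groupby`, which groups only consecutive equal keys, so the later visit to 42nd Street starts a new run. For reference, M's `Table.Group` also accepts an optional `groupKind` argument, and `GroupKind.Local` groups only consecutive rows that share a key, which is the same idea.

```python
from itertools import groupby

# Sample rows from the question: (Stop, Time_in, Time_out).
# Assumption: times are zero-padded "HH:MM" strings, so string order == time order.
ROWS = [
    ("23rd Street", "15:23", "15:27"),
    ("42nd Street", "15:35", "15:40"),
    ("42nd Street", "15:42", "15:48"),
    ("47th Street", "15:56", "16:10"),
    ("42nd Street", "16:14", "16:19"),
]


def collapse_consecutive(rows):
    """Merge runs of adjacent rows that share the same Stop.

    groupby() only groups *adjacent* equal keys, so a stop that is
    visited again later starts a new run instead of being merged.
    """
    result = []
    for stop, run in groupby(rows, key=lambda r: r[0]):
        run = list(run)
        # Keep the first Time_in and the last Time_out of the run.
        result.append((stop, run[0][1], run[-1][2]))
    return result


for row in collapse_consecutive(ROWS):
    print(*row)
```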

Any help is appreciated. Thanks in advance.


Answer 1:

In Power Query, right-click the Stop column, then choose Group By...

Choose Add aggregation, so that there are two aggregation rows.

For the first row, pick the operation Min on the Time_in column.

For the second row, pick the operation Max on the Time_out column.

If needed, change the resulting type from number to time, either in the formula bar or via Home > Advanced Editor.

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Stop", type text}, {"Time_in", type time}, {"Time_out", type time}}),
    // One row per Stop: earliest Time_in and latest Time_out
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Stop"}, {{"Time_in", each List.Min([Time_in]), type time}, {"Time_out", each List.Max([Time_out]), type time}})
in
    #"Grouped Rows"
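To see why this first version did not meet the revised requirement, here is a minimal Python sketch (an illustration, not Power Query M) of grouping on Stop alone with the minimum Time_in and maximum Time_out. It assumes the question's sample rows, with times as zero-padded "HH:MM" strings so that string comparison orders them correctly. Both visits to 42nd Street collapse into a single row spanning 15:35 to 16:19.

```python
# Sample rows from the question: (Stop, Time_in, Time_out).
# Assumption: zero-padded "HH:MM" strings, so min()/max() compare them as times.
ROWS = [
    ("23rd Street", "15:23", "15:27"),
    ("42nd Street", "15:35", "15:40"),
    ("42nd Street", "15:42", "15:48"),
    ("47th Street", "15:56", "16:10"),
    ("42nd Street", "16:14", "16:19"),
]


def group_globally(rows):
    """Group on Stop only: min Time_in and max Time_out per distinct stop.

    Output order follows each stop's first appearance, because Python
    dicts preserve insertion order.
    """
    groups = {}
    for stop, t_in, t_out in rows:
        if stop in groups:
            lo, hi = groups[stop]
            groups[stop] = (min(lo, t_in), max(hi, t_out))
        else:
            groups[stop] = (t_in, t_out)
    return [(stop, lo, hi) for stop, (lo, hi) in groups.items()]


for row in group_globally(ROWS):
    print(*row)
```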

For the new requirement that Stops can repeat, we first create a group number so that only Stops in adjacent rows are merged together.

Add Column > Index Column

Add Column > Custom Column, with the formula

= try if #"Added Index"{[Index]}[Stop] = #"Added Index"{[Index]-1}[Stop] then null else [Index] otherwise [Index]

Right-click the new column and choose Fill > Down

Select the Stop and Custom columns and Group By on both of them

Choose Add aggregation

For the first row, pick the operation Min on the Time_in column.

For the second row, pick the operation Max on the Time_out column.

Sample code:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Stop", type text}, {"Time_in", type time}, {"Time_out", type time}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
    // null when Stop equals the previous row's Stop; the first row has no previous row, so try...otherwise keeps its Index
    #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each try if #"Added Index"{[Index]}[Stop] = #"Added Index"{[Index]-1}[Stop] then null else [Index] otherwise [Index]),
    // Every row in a consecutive run now shares the Index of the run's first row
    #"Filled Down" = Table.FillDown(#"Added Custom", {"Custom"}),
    #"Grouped Rows" = Table.Group(#"Filled Down", {"Stop", "Custom"}, {{"Time_in", each List.Min([Time_in]), type time}, {"Time_out", each List.Max([Time_out]), type time}}),
    #"Removed Columns" = Table.RemoveColumns(#"Grouped Rows", {"Custom"})
in
    #"Removed Columns"
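The helper-column approach can be traced step by step in a minimal Python sketch. It is an illustration of the logic, not Power Query M, and it assumes the question's sample rows with times as zero-padded "HH:MM" strings. Each commented step corresponds to one applied step of the query: the comparison with the previous row, Fill Down, Group By on Stop plus the helper column, and removing the helper column.

```python
# Sample rows from the question: (Stop, Time_in, Time_out).
# Assumption: zero-padded "HH:MM" strings, so min()/max() compare them as times.
ROWS = [
    ("23rd Street", "15:23", "15:27"),
    ("42nd Street", "15:35", "15:40"),
    ("42nd Street", "15:42", "15:48"),
    ("47th Street", "15:56", "16:10"),
    ("42nd Street", "16:14", "16:19"),
]


def collapse_with_group_ids(rows):
    # Step 1 (Index + Custom column): a row keeps its own index when its Stop
    # differs from the previous row's Stop, otherwise it gets None. The first
    # row has no predecessor, so it always keeps its index.
    custom = []
    for i, (stop, _, _) in enumerate(rows):
        if i > 0 and rows[i - 1][0] == stop:
            custom.append(None)
        else:
            custom.append(i)

    # Step 2 (Fill Down): replace each None with the last non-None value above it.
    filled = []
    last = None
    for value in custom:
        if value is not None:
            last = value
        filled.append(last)

    # Step 3 (Group By Stop + Custom): min Time_in and max Time_out per key.
    groups = {}
    for (stop, t_in, t_out), gid in zip(rows, filled):
        key = (stop, gid)
        if key in groups:
            lo, hi = groups[key]
            groups[key] = (min(lo, t_in), max(hi, t_out))
        else:
            groups[key] = (t_in, t_out)

    # Step 4 (Remove Columns): drop the helper group id from the output.
    return [(stop, lo, hi) for (stop, _), (lo, hi) in groups.items()]


for row in collapse_with_group_ids(ROWS):
    print(*row)
```

Because the group id is part of the key, the two separate visits to 42nd Street get different ids (1 and 4) and stay as separate rows.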

Comments:

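Hi horseyride, thank you very much! Could you suggest a solution where values in the Stop column can repeat? In that case we can't simply use Group By, or the later stops disappear. I only need to collapse Stops that are repeated consecutively. A bus can visit the same stop more than once, and I don't want to lose those visits. Sorry for not clarifying this right away; I have updated the original post.

This is exactly what I needed!! Very elegant, thank you very much!

Thanks. Then please toggle the check mark between the arrows to accept the answer.

Answer 2: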

Power Query

let
    Source = Web.BrowserContents("https://***.com/questions/68194967/power-query-merge-duplicated-rows-with-collapsing-values"),
    #"Extracted Table From Html" = Html.Table(Source, {{"Column1", "DIV.s-table-container:nth-child(3) > TABLE.s-table > * > TR > :nth-child(1)"}, {"Column2", "DIV.s-table-container:nth-child(3) > TABLE.s-table > * > TR > :nth-child(2)"}, {"Column3", "DIV.s-table-container:nth-child(3) > TABLE.s-table > * > TR > :nth-child(3)"}}, [RowSelector="DIV.s-table-container:nth-child(3) > TABLE.s-table > * > TR"]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Extracted Table From Html", [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers", {{"Stop", type text}, {"Time_in", type time}, {"Time_out", type time}}),
    // Branch 1: earliest Time_in per Stop
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type", {"Time_out"}),
    #"Grouped Rows" = Table.Group(#"Removed Columns", {"Stop"}, {{"ad_1", each _, type table [Stop=nullable text, Time_in=nullable time]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each
        let
            x = [ad_1],
            #"Removed Columns1" = Table.RemoveColumns(x, {"Stop"}),
            #"Sorted Rows" = Table.Sort(#"Removed Columns1", {{"Time_in", Order.Ascending}}),
            #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1, Int64.Type),
            #"Filtered Rows" = Table.SelectRows(#"Added Index", each ([Index] = 1)),
            #"Removed Columns2" = Table.RemoveColumns(#"Filtered Rows", {"Index"})
        in
            #"Removed Columns2"),
    #"Removed Columns1" = Table.RemoveColumns(#"Added Custom", {"ad_1"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns1", "Custom", {"Time_in"}, {"Time_in"}),
    // Branch 2: latest Time_out per Stop
    Custom1 = Table.RemoveColumns(#"Changed Type", {"Time_in"}),
    #"Grouped Rows1" = Table.Group(Custom1, {"Stop"}, {{"ad_2", each _, type table [Stop=nullable text, Time_out=nullable time]}}),
    Custom2 = Table.AddColumn(#"Grouped Rows1", "Custom", each
        let
            x = [ad_2],
            #"Removed Columns1" = Table.RemoveColumns(x, {"Stop"}),
            #"Sorted Rows" = Table.Sort(#"Removed Columns1", {{"Time_out", Order.Descending}}),
            #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1, Int64.Type),
            #"Filtered Rows" = Table.SelectRows(#"Added Index", each ([Index] = 1)),
            #"Removed Columns2" = Table.RemoveColumns(#"Filtered Rows", {"Index"})
        in
            #"Removed Columns2"),
    #"Removed Columns2" = Table.RemoveColumns(Custom2, {"ad_2"}),
    #"Expanded Custom1" = Table.ExpandTableColumn(#"Removed Columns2", "Custom", {"Time_out"}, {"Time_out"}),
    // Join the two branches on Stop
    #"Merged Queries" = Table.NestedJoin(#"Expanded Custom", {"Stop"}, #"Expanded Custom1", {"Stop"}, "Expanded Custom1", JoinKind.LeftOuter),
    #"Expanded Expanded Custom1" = Table.ExpandTableColumn(#"Merged Queries", "Expanded Custom1", {"Time_out"}, {"Time_out"})
in
    #"Expanded Expanded Custom1"
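This answer computes the earliest Time_in and the latest Time_out per Stop in two separate branches (sort, keep the first row) and then left-joins the branches on Stop. A minimal Python sketch of that shape, an illustration rather than M, assuming the question's sample rows with zero-padded "HH:MM" string times, shows that it yields one entry per distinct stop. That is why the repeated visit to 42nd Street is lost, as the asker points out in the comment on this answer.

```python
# Sample rows from the question: (Stop, Time_in, Time_out).
# Assumption: zero-padded "HH:MM" strings, so sorting them sorts by time.
ROWS = [
    ("23rd Street", "15:23", "15:27"),
    ("42nd Street", "15:35", "15:40"),
    ("42nd Street", "15:42", "15:48"),
    ("47th Street", "15:56", "16:10"),
    ("42nd Street", "16:14", "16:19"),
]


def earliest_and_latest_by_stop(rows):
    # Branch 1: sort by Time_in ascending and keep the first value seen per Stop.
    first_in = {}
    for stop, t_in, _ in sorted(rows, key=lambda r: r[1]):
        first_in.setdefault(stop, t_in)

    # Branch 2: sort by Time_out descending and keep the first value seen per Stop.
    last_out = {}
    for stop, _, t_out in sorted(rows, key=lambda r: r[2], reverse=True):
        last_out.setdefault(stop, t_out)

    # Left join branch 1 to branch 2 on Stop.
    return {stop: (t_in, last_out.get(stop)) for stop, t_in in first_in.items()}


print(earliest_and_latest_by_stop(ROWS))
```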

DAX

min:= MIN('Table 1'[Time_in])
max:= MAX('Table 1'[Time_out])

DAX result

Comments:

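Hi smpa01, thank you very much! I tried your solution and realized that it loses all repeated stops, but I only need to collapse values for Stops that repeat consecutively. A bus can visit the same stop more than once; I don't want to lose those visits, only to collapse some of them. Sorry for not clarifying this right away.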
