Power Query. Merge duplicated lines in a row with collapsing values
Posted: 2021-09-12 14:59:00
Question: I have a bus timetable with Stop, Time_in and Time_out columns. Sometimes Stops in my data are duplicated (in consecutive rows), and I need to merge them, keeping only the first Time_in and the last Time_out.
Here is an example:
Stop | Time_in | Time_out |
---|---|---|
23rd Street | 15:23 | 15:27 |
42nd Street | 15:35 | 15:40 |
42nd Street | 15:42 | 15:48 |
47th Street | 15:56 | 16:10 |
42nd Street | 16:14 | 16:19 |
Desired result:
Stop | Time_in | Time_out |
---|---|---|
23rd Street | 15:23 | 15:27 |
42nd Street | 15:35 | 15:48 |
47th Street | 15:56 | 16:10 |
42nd Street | 16:14 | 16:19 |
Any help is appreciated, thanks in advance.
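Answer 1: In Power Query, right-click the Stop column, then choose Group By...
Select Advanced, then click Add aggregation so there are two aggregation rows.
For the first row, pick the operation Min on the Time_in column.
For the second row, pick the operation Max on the Time_out column.
If needed, change type number to type time, either in the formula bar or in Home > Advanced Editor.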
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Stop", type text}, {"Time_in", type time}, {"Time_out", type time}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Stop"}, {{"Time_in", each List.Min([Time_in]), type time}, {"Time_out", each List.Max([Time_out]), type time}})
in
    #"Grouped Rows"
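The query above groups every row that shares a Stop value, wherever it occurs in the table. The following sketch applies the same grouping to the sample timetable from the question, typed in as a `#table` literal (an assumption made so the snippet runs on its own, in place of reading `Table1` from the workbook). It shows the limitation raised in the comments: both visits to 42nd Street collapse into one row spanning 15:35 to 16:19.

```powerquery
let
    // Sample input from the question, typed in as a literal table
    // (hypothetical stand-in for Excel.CurrentWorkbook(){[Name="Table1"]}[Content])
    Source = #table(
        type table [Stop = text, Time_in = time, Time_out = time],
        {
            {"23rd Street", #time(15, 23, 0), #time(15, 27, 0)},
            {"42nd Street", #time(15, 35, 0), #time(15, 40, 0)},
            {"42nd Street", #time(15, 42, 0), #time(15, 48, 0)},
            {"47th Street", #time(15, 56, 0), #time(16, 10, 0)},
            {"42nd Street", #time(16, 14, 0), #time(16, 19, 0)}
        }
    ),
    // Default (global) grouping: all rows with the same Stop form one group,
    // even when they are not adjacent, so the later 42nd Street visit is lost
    Grouped = Table.Group(
        Source,
        {"Stop"},
        {
            {"Time_in", each List.Min([Time_in]), type time},
            {"Time_out", each List.Max([Time_out]), type time}
        }
    )
in
    Grouped
```

The result has three rows instead of the four in the desired output, which is why a per-run group key is needed when a bus can return to the same stop.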
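For the new requirement, where Stops can repeat, we first create a group number so that only Stops in adjacent rows are merged together.
Add Column > Index Column.
Add Column > Custom Column, with the formula below. It compares each row's Stop with the previous row's Stop: the first row of each run keeps its Index, later rows of the same run get null, and try ... otherwise covers the first row of the table, which has no previous row.
= try if #"Added Index"{[Index]}[Stop] = #"Added Index"{[Index]-1}[Stop] then null else [Index] otherwise [Index]
Right-click the new column and choose Fill Down.
Select the Stop and Custom columns and Group By on them.
Select Advanced, then click Add aggregation.
For the first row, pick the operation Min on the Time_in column.
For the second row, pick the operation Max on the Time_out column.
Sample code: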
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Stop", type text}, {"Time_in", type time}, {"Time_out", type time}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
    #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each try if #"Added Index"{[Index]}[Stop] = #"Added Index"{[Index]-1}[Stop] then null else [Index] otherwise [Index]),
    #"Filled Down" = Table.FillDown(#"Added Custom", {"Custom"}),
    #"Grouped Rows" = Table.Group(#"Filled Down", {"Stop", "Custom"}, {{"Time_in", each List.Min([Time_in]), type time}, {"Time_out", each List.Max([Time_out]), type time}}),
    #"Removed Columns" = Table.RemoveColumns(#"Grouped Rows", {"Custom"})
in
    #"Removed Columns"
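An alternative sketch, not part of the original answer: `Table.Group` accepts an optional fourth argument, `GroupKind.Local`, which starts a new group whenever the key value changes from one row to the next. That gives the same "consecutive runs only" behaviour as the Index / Custom / Fill Down steps in a single step. The sample data is typed in as a `#table` literal (an assumption, in place of `Table1`), and the rows must already be in timetable order, since local grouping depends on row order.

```powerquery
let
    // Sample input from the question (hypothetical literal stand-in for Table1)
    Source = #table(
        type table [Stop = text, Time_in = time, Time_out = time],
        {
            {"23rd Street", #time(15, 23, 0), #time(15, 27, 0)},
            {"42nd Street", #time(15, 35, 0), #time(15, 40, 0)},
            {"42nd Street", #time(15, 42, 0), #time(15, 48, 0)},
            {"47th Street", #time(15, 56, 0), #time(16, 10, 0)},
            {"42nd Street", #time(16, 14, 0), #time(16, 19, 0)}
        }
    ),
    // GroupKind.Local: only adjacent rows with the same Stop are grouped,
    // so a later return to the same stop becomes its own group
    Grouped = Table.Group(
        Source,
        {"Stop"},
        {
            {"Time_in", each List.Min([Time_in]), type time},
            {"Time_out", each List.Max([Time_out]), type time}
        },
        GroupKind.Local
    )
in
    Grouped
```

On this input the result has four rows, with the first 42nd Street visit collapsed to 15:35 to 15:48 and the later visit kept separately as 16:14 to 16:19, matching the desired result in the question.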
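Comments:
Hi horseyride, thank you very much! Could you suggest a solution where values in the Stop column can repeat (so we can't simply use Group By, otherwise later stops disappear)? I only need to collapse values for Stops that repeat consecutively. The bus can visit the same stop more than once and I don't want to lose those visits. Sorry for not clarifying this right away; I've fixed the original post.
This is exactly what I needed!! Very elegant, thank you very much!
Thanks. Then please click the checkmark between the arrows to accept the answer.
Answer 2: Power Query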
let
    Source = Web.BrowserContents("https://***.com/questions/68194967/power-query-merge-duplicated-rows-with-collapsing-values"),
    #"Extracted Table From Html" = Html.Table(Source, {{"Column1", "DIV.s-table-container:nth-child(3) > TABLE.s-table > * > TR > :nth-child(1)"}, {"Column2", "DIV.s-table-container:nth-child(3) > TABLE.s-table > * > TR > :nth-child(2)"}, {"Column3", "DIV.s-table-container:nth-child(3) > TABLE.s-table > * > TR > :nth-child(3)"}}, [RowSelector="DIV.s-table-container:nth-child(3) > TABLE.s-table > * > TR"]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Extracted Table From Html", [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers", {{"Stop", type text}, {"Time_in", type time}, {"Time_out", type time}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type", {"Time_out"}),
    #"Grouped Rows" = Table.Group(#"Removed Columns", {"Stop"}, {{"ad_1", each _, type table [Stop=nullable text, Time_in=nullable time]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each let x = [ad_1],
        #"Removed Columns1" = Table.RemoveColumns(x, {"Stop"}),
        #"Sorted Rows" = Table.Sort(#"Removed Columns1", {{"Time_in", Order.Ascending}}),
        #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1, Int64.Type),
        #"Filtered Rows" = Table.SelectRows(#"Added Index", each ([Index] = 1)),
        #"Removed Columns2" = Table.RemoveColumns(#"Filtered Rows", {"Index"})
    in
        #"Removed Columns2"),
    #"Removed Columns1" = Table.RemoveColumns(#"Added Custom", {"ad_1"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns1", "Custom", {"Time_in"}, {"Time_in"}),
    Custom1 = Table.RemoveColumns(#"Changed Type", {"Time_in"}),
    #"Grouped Rows1" = Table.Group(Custom1, {"Stop"}, {{"ad_2", each _, type table [Stop=nullable text, Time_out=nullable time]}}),
    Custom2 = Table.AddColumn(#"Grouped Rows1", "Custom", each let x = [ad_2],
        #"Removed Columns1" = Table.RemoveColumns(x, {"Stop"}),
        #"Sorted Rows" = Table.Sort(#"Removed Columns1", {{"Time_out", Order.Descending}}),
        #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1, Int64.Type),
        #"Filtered Rows" = Table.SelectRows(#"Added Index", each ([Index] = 1)),
        #"Removed Columns2" = Table.RemoveColumns(#"Filtered Rows", {"Index"})
    in
        #"Removed Columns2"),
    #"Removed Columns2" = Table.RemoveColumns(Custom2, {"ad_2"}),
    #"Expanded Custom1" = Table.ExpandTableColumn(#"Removed Columns2", "Custom", {"Time_out"}, {"Time_out"}),
    #"Merged Queries" = Table.NestedJoin(#"Expanded Custom", {"Stop"}, #"Expanded Custom1", {"Stop"}, "Expanded Custom1", JoinKind.LeftOuter),
    #"Expanded Expanded Custom1" = Table.ExpandTableColumn(#"Merged Queries", "Expanded Custom1", {"Time_out"}, {"Time_out"})
in
    #"Expanded Expanded Custom1"
DAX
min:= MIN('Table 1'[Time_in])
max:= MAX('Table 1'[Time_out])
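DAX result
Comments:
Hi smpa01, thank you very much! I tried your solution and realized that it merges all repeated Stops, but I need to collapse values only for Stops that repeat consecutively. The bus can visit the same stop more than once and I don't want to lose those visits; only some of them need to be collapsed. Sorry for not clarifying this right away.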