Excel PowerQuery: How to unpivot or transpose a huge table into a readable format for analysis
Posted: 2018-11-09 01:32:05
Question:
I have a table that looks similar to this:
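I want to turn it into this:
The idea is to unpivot (or transpose) the table so that it can be fed into other BI tools and read for analysis.
I have about 20 tables like this, each with more than 100 columns, so doing it by hand is practically impossible.
How can I do this with PowerQuery? I tried the unpivot feature, but I got stuck because it shows NYC1, NYC2, and so on. VBA and macros did not work either. Any other suggestions are appreciated; I am out of ideas at this point. Help!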
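Comments on the question:
What VBA have you tried?
I tried recreating a macro from here and adapting it: ***.com/questions/53203225/… but decided that PowerQuery is the better approach here.
Answer 1:
Before loading into PowerQuery, concatenate the headers with a delimiter (a space) into the empty row after [Program Name]. If you use Office 365, you can do this with the TEXTJOIN function. The result looks like this (I did not copy all of your data):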
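Import this range into PowerQuery (do not check the "My table has headers" checkbox) and perform the following steps:
- Remove the top 2 rows (Home tab > Remove Rows)
- Use the first row as headers (Home tab > Use First Row as Headers)
- Select the first column
- Unpivot the other columns (Transform tab > Unpivot Columns dropdown > Unpivot Other Columns)
- Split the [Attribute] column by delimiter (space) (Home tab > Split Column)
- Rename the columns
- Move the City column to the left (right-click the column > Move > Left)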
The script looks like this:
let
    // Load the range named "table" from the current workbook
    Source = Excel.CurrentWorkbook(){[Name="table"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type any}, {"Column3", type any}, {"Column4", type any}, {"Column5", type any}, {"Column6", type any}, {"Column7", type any}, {"Column8", type any}, {"Column9", type any}}),
    // Drop the two original header rows; the joined header row becomes row 1
    #"Removed Top Rows" = Table.Skip(#"Changed Type",2),
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true]),
    #"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"Program Name", type text}, {"NY Budget", Int64.Type}, {"NY Revenue", Int64.Type}, {"NY Cost", Int64.Type}, {"NY Margin", Int64.Type}, {"LA Budget", Int64.Type}, {"LA Revenue", Int64.Type}, {"LA Cost", Int64.Type}, {"LA Margin", Int64.Type}}),
    // Keep "Program Name" and turn every other column into Attribute/Value pairs
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type1", {"Program Name"}, "Attribute", "Value"),
    // Split "NY Budget" into "NY" and "Budget"
    #"Split Column by Delimiter" = Table.SplitColumn(#"Unpivoted Other Columns", "Attribute", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Attribute.1", "Attribute.2"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Attribute.1", type text}, {"Attribute.2", type text}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type2",{{"Attribute.1", "City"}, {"Attribute.2", "Description"}}),
    #"Reordered Columns" = Table.ReorderColumns(#"Renamed Columns",{"City", "Program Name", "Description", "Value"})
in
    #"Reordered Columns"
This is the result (in the Power Query editor):
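The header-joining step that the answer does in the worksheet with TEXTJOIN can probably also be done inside Power Query, which may help with about 20 tables of 100+ columns each. The following is a minimal sketch, not the answerer's method. It assumes that the range named "table" is loaded without promoted headers, that row 1 holds the city (filled only above the first column of each group), that row 2 holds the measure name, and that the first column holds the program name. All step names are hypothetical.

```powerquery
let
    // Assumption: "table" is the raw range, loaded without promoting headers
    Source = Excel.CurrentWorkbook(){[Name="table"]}[Content],
    // Assumption: row 1 = city (blank except at the start of each group), row 2 = measure
    HeaderRows = Table.FirstN(Source, 2),
    DataRows = Table.Skip(Source, 2),
    // One row per original column: Column1 = city, Column2 = measure
    Transposed = Table.Transpose(HeaderRows),
    // Copy each city down over the blank (null) cells of its group
    FilledCity = Table.FillDown(Transposed, {"Column1"}),
    // Join the non-null header parts with a space, e.g. "NY Budget"
    NewNames = List.Transform(
        Table.ToRows(FilledCity),
        each Text.Combine(List.Transform(List.RemoveNulls(_), Text.From), " ")),
    Renamed = Table.RenameColumns(DataRows, List.Zip({Table.ColumnNames(DataRows), NewNames})),
    // Keep the first column (the program name) and unpivot everything else
    Unpivoted = Table.UnpivotOtherColumns(Renamed, {List.First(NewNames)}, "Attribute", "Value"),
    Split = Table.SplitColumn(Unpivoted, "Attribute",
        Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"City", "Description"})
in
    Split
```

Because no column names are hard-coded, the same query should work for any number of columns with this layout. If a city or measure name itself contains a space, a different delimiter (for example "|") would be needed in both Text.Combine and the splitter.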
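Answer 2:
Here is a fairly generic depivot approach that can handle multiple row/column headers.
Select a cell in the source table before running it (note: this uses CurrentRegion, so it will fail if your table has completely blank rows or columns).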
Sub UnpivotIt()
    Dim numRowHeaders As Long, numColHeaders As Long
    Dim numRows As Long, numCols As Long
    Dim rngOut As Range, r As Long, c As Long, i As Long, n As Long
    Dim arrIn, arrOut, outRow As Long

    arrIn = Selection.CurrentRegion.Value
    numRowHeaders = Application.InputBox("How many header rows?", Type:=1)
    numColHeaders = Application.InputBox("How many header columns?", Type:=1)
    Set rngOut = Application.InputBox("Select output (top-left cell)", Type:=8)
    Set rngOut = rngOut.Cells(1) 'in case >1 cells selected

    numRows = UBound(arrIn, 1)
    numCols = UBound(arrIn, 2)
    ReDim arrOut(1 To ((numRows - numRowHeaders) * (numCols - numColHeaders)), _
                 1 To (numRowHeaders + numColHeaders + 1))

    outRow = 0
    For r = (numRowHeaders + 1) To numRows
        For c = (numColHeaders + 1) To numCols
            'only copy if there's a value
            If Len(arrIn(r, c)) > 0 Then
                outRow = outRow + 1
                i = 1
                For n = 1 To numColHeaders 'copy the labels from the header columns
                    arrOut(outRow, i) = arrIn(r, n)
                    i = i + 1
                Next n
                For n = 1 To numRowHeaders '...the labels from the header rows
                    arrOut(outRow, i) = arrIn(n, c)
                    i = i + 1
                Next n
                arrOut(outRow, i) = arrIn(r, c) '...and the value
            End If
        Next c
    Next r

    'Resize(0, ...) raises an error, so stop if nothing was copied
    If outRow = 0 Then
        MsgBox "No values found to unpivot."
        Exit Sub
    End If
    rngOut.Resize(outRow, UBound(arrOut, 2)).Value = arrOut
End Sub
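For readers who prefer to stay in Power Query, roughly the same generic idea can be written as an M function. This is a sketch under assumptions, not a translation of the macro: the function and parameter names are hypothetical, the input table is assumed to be loaded without promoted headers (so its columns are named Column1, Column2, ...), and header cells are used as-is (blank cells left by merged headers are not filled).

```powerquery
// Hypothetical helper; save it as a query named, for example, fnUnpivot
(raw as table, numRowHeaders as number, numColHeaders as number) as table =>
let
    Cols = Table.ColumnNames(raw),
    KeyCols = List.FirstN(Cols, numColHeaders),
    ValueCols = List.Skip(Cols, numColHeaders),
    HeaderPart = Table.FirstN(raw, numRowHeaders),
    Body = Table.Skip(raw, numRowHeaders),
    // Unpivot the data area; null cells are dropped, similar to the Len(...) > 0 check
    Unpivoted = Table.UnpivotOtherColumns(Body, KeyCols, "SourceColumn", "Value"),
    HeaderNames = List.Transform({1..numRowHeaders}, each "Header" & Text.From(_)),
    // One lookup row per value column: its name plus its header cells, top to bottom
    Lookup = Table.FromRows(
        List.Transform(ValueCols, (c) => {c} & Table.Column(HeaderPart, c)),
        {"SourceColumn"} & HeaderNames),
    Joined = Table.NestedJoin(Unpivoted, {"SourceColumn"}, Lookup, {"SourceColumn"}, "H", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Joined, "H", HeaderNames),
    Cleaned = Table.RemoveColumns(Expanded, {"SourceColumn"}),
    // Same column order as the macro output: key columns, header labels, value
    Result = Table.ReorderColumns(Cleaned, KeyCols & HeaderNames & {"Value"})
in
    Result
```

Assuming the function is saved as fnUnpivot and the range is named "table", a call for two header rows and one header column would be fnUnpivot(Excel.CurrentWorkbook(){[Name="table"]}[Content], 2, 1). The join does not guarantee row order, so a sort step may be needed if order matters.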