Unpivot issue with multiple like columns

【Title】: unpivot issue with multiple like columns 【Posted】: 2020-09-19 22:35:02 【Question】:

I have a CSV import file where I have no control over the column names or format. The header consists of id,[temperature range 1],[temperature range 2],[temperature range 3],[temperature range 4],[temperature range 5],[temperature range 6],[temperature range 7],[temperature range 8],[temperature range 9],[temperature range 10],[temperature range 11],[temperature range 12],[Pressure Range 1],[Pressure Range 2],[Pressure Range 3],[Pressure Range 4],[Pressure Range 5],[Pressure Range 6],[Pressure Range 7],[Pressure Range 8],[Pressure Range 9],[Pressure Range 10],[Pressure Range 11],[Pressure Range 12],[Calcite Saturation 1],[Calcite Saturation 2],[Calcite Saturation 3],[Calcite Saturation 4],[Calcite Saturation 5],[Calcite Saturation 6],[Calcite Saturation 7],[Calcite Saturation 8],[Calcite Saturation 9],[Calcite Saturation 10],[Calcite Saturation 11],[Calcite Saturation 12].

I have been able to unpivot the table using this code:

Select id,Temp,Psia,CalciteX
from (
    Select id,[temperature range 1],[temperature range 2],[temperature range 3],[temperature range 4],[temperature range 5],[temperature range 6],[temperature range 7],[temperature range 8], [temperature range 9],[temperature range 10],[temperature range 11],[temperature range 12],[Pressure Range 1],[Pressure Range 2],[Pressure Range 3],[Pressure Range 4],[Pressure Range 5], [Pressure Range 6],[Pressure Range 7],[Pressure Range 8],[Pressure Range 9],[Pressure Range 10],[Pressure Range 11],[Pressure Range 12],[Calcite Saturation 1],[Calcite Saturation 2], [Calcite Saturation 3],[Calcite Saturation 4],[Calcite Saturation 5],[Calcite Saturation 6],[Calcite Saturation 7],[Calcite Saturation 8],[Calcite Saturation 9],[Calcite Saturation 10], [Calcite Saturation 11],[Calcite Saturation 12]
    from frenchcreekscale where id ='10009-2019111114.1/P' ) as src

UNPivot ( Temp for Temps in([temperature range 1],[temperature range 2],[temperature range 3],[temperature range 4],[temperature range 5],[temperature range 6],[temperature range 7],[temperature range 8], [temperature range 9],[temperature range 10],[temperature range 11],[temperature range 12]) ) AS Temps
 
UNPivot ( Psia for Pressures in([Pressure Range 1],[Pressure Range 2],[Pressure Range 3],[Pressure Range 4],[Pressure Range 5],[Pressure Range 6],[Pressure Range 7],[Pressure Range 8],[Pressure Range 9], [Pressure Range 10],[Pressure Range 11],[Pressure Range 12]) ) AS Pressures

UNPivot ( CalciteX for CalciteXs in([Calcite Saturation 1],[Calcite Saturation 2], [Calcite Saturation 3],[Calcite Saturation 4],[Calcite Saturation 5],[Calcite Saturation 6],[Calcite Saturation 7],[Calcite Saturation 8],[Calcite Saturation 9],[Calcite Saturation 10], [Calcite Saturation 11],[Calcite Saturation 12]) ) AS CalX

Here is part of the output I get:

id                   Temp Psia CalciteX
-------------------- ---- ---- --------
10009-2019111114.1/P 70   0    0.165885
10009-2019111114.1/P 70   0    0.180097
10009-2019111114.1/P 70   0    0.195601
10009-2019111114.1/P 70   0    0.211319
10009-2019111114.1/P 70   0    0.226902
10009-2019111114.1/P 70   0    0.241826
10009-2019111114.1/P 70   0    0.25538
10009-2019111114.1/P 70   0    0.267159
10009-2019111114.1/P 70   0    0.276571
10009-2019111114.1/P 70   0    0.283237
10009-2019111114.1/P 70   0    0.286532
10009-2019111114.1/P 70   0    0.286462
10009-2019111114.1/P 70   147  0.165885
10009-2019111114.1/P 70   147  0.180097
10009-2019111114.1/P 70   147  0.195601
10009-2019111114.1/P 70   147  0.211319
10009-2019111114.1/P 70   147  0.226902
10009-2019111114.1/P 70   147  0.241826
10009-2019111114.1/P 70   147  0.25538
10009-2019111114.1/P 70   147  0.267159
10009-2019111114.1/P 70   147  0.276571
10009-2019111114.1/P 70   147  0.283237
10009-2019111114.1/P 70   147  0.286532
10009-2019111114.1/P 70   147  0.286462
10009-2019111114.1/P 70   278  0.165885
10009-2019111114.1/P 70   278  0.180097
etc

There should be only 12 records:

id                   Temp 
-------------------- ----
10009-2019111114.1/P 70
10009-2019111114.1/P 80
10009-2019111114.1/P 90
10009-2019111114.1/P 100
10009-2019111114.1/P 110
10009-2019111114.1/P 120
10009-2019111114.1/P 130
10009-2019111114.1/P 140
10009-2019111114.1/P 150
10009-2019111114.1/P 160
10009-2019111114.1/P 170
10009-2019111114.1/P 180

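with the other fields (Psia and CalciteX) tied to the same id and range number.

I don't understand how to unpivot the next column without generating multiple records.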

【Answer 1】:

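PIVOT/UNPIVOT works best when you supply only the columns needed for that one pivot. Your src subquery contains the columns for three unpivots. If you split them into separate common table expressions (CTEs), you can join the results back together on the id column.

Note that joining on id alone is not enough; that would still give you multiple rows. You want the value of [temperature range 1] on the same row as the value of [Pressure Range 1], so the range number must be part of the join criteria as well.

In my simplified example I use the single rightmost character of the column name to build this extra join criterion (giving me 1 and 2). Your full solution should use the last two characters of the column name (giving you 1 through 12).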
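As a quick check of what that two-character suffix looks like, here is a minimal T-SQL sketch. The column names are taken from the question and passed as string literals; the aliases (OneDigitSuffix, TwoDigitSuffix, RangeNo) are made up for illustration.

```sql
-- RIGHT(name, 2) on the column names from the question.
-- For single-digit ranges the suffix is a space plus the digit, which is
-- still the same string for all three column groups, so the comparison works.
select right('temperature range 1', 2)   as OneDigitSuffix,  -- ' 1'
       right('Pressure Range 1', 2)      as SameSuffix,      -- ' 1'
       right('Calcite Saturation 12', 2) as TwoDigitSuffix,  -- '12'
       -- Casting to int gives a numeric range number that sorts as 1..12
       -- rather than as text.
       cast(right('temperature range 1', 2) as int) as RangeNo;  -- 1
```

Comparing the raw suffixes is enough for the join; the cast is only useful if you also want to order the 12 rows numerically.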

Simplified sample data

create table MyTable
(
  id int,
  A1 nvarchar(4),
  A2 nvarchar(4),
  B1 nvarchar(4),
  B2 nvarchar(4)
);

insert into MyTable (id, A1, A2, B1, B2) values
(1, '1A11', '1A21', '1B11', '1B21'),
(2, '2A11', '2A21', '2B11', '2B12');

Simplified solution

with cte_A as
(
  select upA.id, upA.AValue, upA.AType
  from (select id, A1, A2 from MyTable) a
  unpivot (AValue for AType in ([A1], [A2])) upA
),
cte_B as
(
  select upB.id, upB.BValue, upB.BType
  from (select id, B1, B2 from MyTable) b
  unpivot (BValue for BType in ([B1], [B2])) upB
)
select ca.id, right(ca.AType, 1) as Num, ca.AValue, ca.AType, cb.BValue, cb.BType
from cte_A ca
join cte_B cb
  on  cb.id = ca.id                            -- match on id
  and right(cb.BType, 1) = right(ca.AType, 1); -- match on type (bring A1 and B1 to same line)

Sample output

id Num AValue AType BValue BType
-- --- ------ ----- ------ -----
1  1   1A11   A1    1B11   B1
1  2   1A21   A2    1B21   B2
2  1   2A11   A1    2B11   B1
2  2   2A21   A2    2B12   B2

Fiddle
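For comparison, a different technique (not the one used in this answer) is to skip UNPIVOT and pair the columns explicitly with CROSS APPLY over a VALUES list. This is only a sketch against the simplified MyTable above; the alias v and the Num column are introduced here for illustration.

```sql
-- One VALUES row per range number, pairing the A and B columns by hand.
-- Each source row produces exactly as many output rows as VALUES rows,
-- so no extra join criterion is needed.
select t.id, v.Num, v.AValue, v.BValue
from MyTable t
cross apply (values (1, t.A1, t.B1),
                    (2, t.A2, t.B2)) v (Num, AValue, BValue)
order by t.id, v.Num;
```

One behavioral difference to keep in mind: UNPIVOT drops rows whose value is NULL, while this form keeps them, so the two approaches can return different row counts on data with NULLs.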


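Existing query

Looking back at my solution: you can also apply the "extra join criterion" directly to your existing query. Add something like this: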

... 
where right(CalX.Temps, 2) = right(CalX.Pressures, 2)
  and right(CalX.Temps, 2) = right(CalX.CalciteXs, 2);

Fiddle with this solution applied to a simplified version of your query.
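A sketch of that idea on the simplified MyTable (with one-character suffixes instead of two) might look like the following. The aliases src, upA and upB are chosen here for illustration.

```sql
-- Chained UNPIVOTs produce every A/B combination per id
-- (2 x 2 = 4 rows per id for this sample); the WHERE clause
-- keeps only the rows where the range numbers line up.
select upB.id, upB.AValue, upB.AType, upB.BValue, upB.BType
from (select id, A1, A2, B1, B2 from MyTable) src
unpivot (AValue for AType in ([A1], [A2])) upA
unpivot (BValue for BType in ([B1], [B2])) upB
where right(upB.AType, 1) = right(upB.BType, 1)  -- A1 with B1, A2 with B2
order by upB.id, upB.AType;
```

This still builds the full cross product before filtering, which is fine for 12 ranges on a single id but does more work than the CTE join as the number of column groups grows.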
