SQL: transposing columns into rows and adding a column
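I have a table with the following structure (headers):
ProjID,Cost2001,Cost2002,Cost2003
For example, the first two rows look like this:
projectA,10,32,30
projectB,42,22,122
I want to transform this table into the following structure (headers):
ProjID,CostYear,Value
So, going back to the sample data, after the transformation it would look like this:
projectA,Cost2001,10
projectA,Cost2002,32
projectA,Cost2003,30
projectB,Cost2001,42
projectB,Cost2002,22
projectB,Cost2003,122
How can I do this? I am using Google BigQuery, which supports Standard SQL. I only need to do this once to fix the table, so I don't mind importing the data into another RDBMS in order to use its pivot functionality.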
Answer
IMHO, the simplest way is to use UNION ALL.
CREATE TABLE projects (ProjID VARCHAR(20), Cost2001 int, Cost2002 int, Cost2003 int);
INSERT INTO projects VALUES
    ('projectA', 10, 32, 30),
    ('projectB', 42, 22, 122);
CREATE TABLE new_projects (ProjID VARCHAR(20), ProjYear INT, Cost int);
GO
2 rows affected
INSERT INTO new_projects
SELECT ProjID, 2001, Cost2001 FROM projects
UNION ALL
SELECT ProjID, 2002, Cost2002 FROM projects
UNION ALL
SELECT ProjID, 2003, Cost2003 FROM projects;
SELECT * FROM new_projects;
GO
ProjID   | ProjYear | Cost
:------- | -------: | ---:
projectA |     2001 |   10
projectB |     2001 |   42
projectA |     2002 |   32
projectB |     2002 |   22
projectA |     2003 |   30
projectB |     2003 |  122
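Since the question allows moving the data into another RDBMS for a one-off fix, the UNION ALL approach can be tried locally. The following is a minimal sketch that assumes SQLite (through Python's built-in sqlite3 module) as that other database; the function name is hypothetical, and the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3


def unpivot_with_union_all():
    """Run the UNION ALL unpivot against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE projects "
        "(ProjID TEXT, Cost2001 INT, Cost2002 INT, Cost2003 INT)"
    )
    conn.executemany(
        "INSERT INTO projects VALUES (?, ?, ?, ?)",
        [("projectA", 10, 32, 30), ("projectB", 42, 22, 122)],
    )
    # One SELECT per cost column; each contributes one output row per project.
    rows = conn.execute(
        """
        SELECT ProjID, 2001 AS ProjYear, Cost2001 AS Cost FROM projects
        UNION ALL
        SELECT ProjID, 2002, Cost2002 FROM projects
        UNION ALL
        SELECT ProjID, 2003, Cost2003 FROM projects
        ORDER BY ProjID, ProjYear
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in unpivot_with_union_all():
        print(row)
```

Each additional cost column needs one more SELECT branch, which is the main maintenance cost of this approach.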
Another answer
Below is the true BigQuery style :o)
Both versions work in BigQuery Standard SQL.
#standardSQL
SELECT
  projID,
  ([2001, 2002, 2003])[SAFE_OFFSET(pos)] year,
  cost
FROM `project.dataset.table`,
  UNNEST([Cost2001, Cost2002, Cost2003]) cost WITH OFFSET pos
You can test and play with the query above using the dummy data from your question, as below:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'projectA' projID, 10 Cost2001, 32 Cost2002, 30 Cost2003 UNION ALL
  SELECT 'projectB', 42, 22, 122
)
SELECT
  projID,
  ([2001, 2002, 2003])[SAFE_OFFSET(pos)] year,
  cost
FROM `project.dataset.table`,
  UNNEST([Cost2001, Cost2002, Cost2003]) cost WITH OFFSET pos
The result is:
Row projID year cost
1 projectA 2001 10
2 projectA 2002 32
3 projectA 2003 30
4 projectB 2001 42
5 projectB 2002 22
6 projectB 2003 122
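For readers less familiar with UNNEST ... WITH OFFSET, here is a rough Python analogy of what the query does per row. It is only an illustration of the logic, not BigQuery code; the helper names safe_offset and unpivot_by_position are made up for this sketch.

```python
def safe_offset(values, pos):
    """Return values[pos], or None when pos is out of range (like SAFE_OFFSET)."""
    return values[pos] if 0 <= pos < len(values) else None


def unpivot_by_position(rows, years=(2001, 2002, 2003)):
    """Emit one (projID, year, cost) tuple per cost column of every input row."""
    out = []
    for proj_id, *costs in rows:
        # enumerate plays the role of UNNEST([...]) cost WITH OFFSET pos:
        # it yields each array element together with its zero-based position.
        for pos, cost in enumerate(costs):
            out.append((proj_id, safe_offset(years, pos), cost))
    return out


if __name__ == "__main__":
    table = [("projectA", 10, 32, 30), ("projectB", 42, 22, 122)]
    for row in unpivot_by_position(table):
        print(row)
```

The year is looked up purely by position, so the list of years has to be kept in the same order as the list of cost columns.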
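As you can see in the query above, you have to hard-code the years in the line below:
([2001, 2002, 2003])[SAFE_OFFSET(pos)] year
If for some reason you want something more generic that derives these values from the names of the original columns, you can use the following approach: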
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'projectA' projID, 10 Cost2001, 32 Cost2002, 30 Cost2003 UNION ALL
  SELECT 'projectB', 42, 22, 122
)
SELECT
  projID,
  SPLIT(x, ':')[SAFE_OFFSET(0)] year,
  SPLIT(x, ':')[SAFE_OFFSET(1)] cost
FROM `project.dataset.table` t,
  UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(t), r'[{}"]', ''))) x
WHERE SPLIT(x, ':')[OFFSET(0)] != 'projID'
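The result has the same shape, except that year now holds the original column name, and both year and cost come back as strings because they are produced by SPLIT: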
Row projID year cost
1 projectA Cost2001 10
2 projectA Cost2002 32
3 projectA Cost2003 30
4 projectB Cost2001 42
5 projectB Cost2002 22
6 projectB Cost2003 122
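The string manipulation in the generic version can be followed step by step in Python. This is a sketch of the same idea (serialize the row to JSON, strip braces and quotes, split on commas, then split each pair on the colon), not a translation that BigQuery would run; unpivot_via_json and its key_column parameter are hypothetical names.

```python
import json
import re


def unpivot_via_json(row, key_column="projID"):
    """Unpivot one row (a dict) by parsing its compact JSON text."""
    # Compact separators give a string with no spaces after "," and ":".
    text = json.dumps(row, separators=(",", ":"))
    # Same character class as the regular expression in the query: drop { } and ".
    text = re.sub(r'[{}"]', "", text)
    out = []
    for pair in text.split(","):
        name, value = pair.split(":")
        if name != key_column:  # mirrors the WHERE ... != 'projID' filter
            out.append((row[key_column], name, value))
    return out


if __name__ == "__main__":
    sample = {"projID": "projectA", "Cost2001": 10, "Cost2002": 32, "Cost2003": 30}
    for item in unpivot_via_json(sample):
        print(item)
```

Because everything passes through a string, the values come out as text, and a value that itself contains a comma, colon, brace or double quote would be split or altered incorrectly. For numeric cost columns like the ones in the question this is not an issue.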
Another answer
As everyone has mentioned, you need UNPIVOT, which looks like this:
DECLARE @projects TABLE (projid nvarchar(max), cost2001 int, cost2002 int, cost2003 int);
INSERT @projects VALUES ('projectA', 10, 32, 30)
, ('projectB', 42, 22, 122);
SELECT PROJID, PROJECT_ATTRIBUTE, PROJECT_COST
FROM @projects
UNPIVOT (PROJECT_COST FOR PROJECT_ATTRIBUTE in (cost2001, cost2002, cost2003) ) AS UNPVT
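For performance reasons, I would not go with the UNION ALL version. It basically scans the table 3 times, or once for every "CostYear" column you have, and you have to add a whole new query for each additional column.
With UNPIVOT, by contrast, the table is scanned once.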
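The scan-count argument can be illustrated with a toy model. This is not a benchmark of any database, and a real query optimizer may plan either query differently; CountingTable and the two functions are hypothetical names used only to count passes over the data.

```python
class CountingTable:
    """A toy table that counts how many full scans are made over it."""

    def __init__(self, rows):
        self.rows = rows
        self.scans = 0

    def scan(self):
        self.scans += 1
        return iter(self.rows)


COST_COLUMNS = ("Cost2001", "Cost2002", "Cost2003")


def union_all_style(table):
    """One full pass over the table per cost column, like one SELECT per branch."""
    out = []
    for col in COST_COLUMNS:
        for row in table.scan():
            out.append((row["ProjID"], col, row[col]))
    return out


def unpivot_style(table):
    """A single pass that emits every cost column of a row before moving on."""
    out = []
    for row in table.scan():
        for col in COST_COLUMNS:
            out.append((row["ProjID"], col, row[col]))
    return out


if __name__ == "__main__":
    data = [
        {"ProjID": "projectA", "Cost2001": 10, "Cost2002": 32, "Cost2003": 30},
        {"ProjID": "projectB", "Cost2001": 42, "Cost2002": 22, "Cost2003": 122},
    ]
    t1, t2 = CountingTable(data), CountingTable(data)
    union_all_style(t1)
    unpivot_style(t2)
    print(t1.scans, t2.scans)
```

Both functions return the same set of rows; only the order of the rows and the number of passes differ.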
Another answer
Unpivot your data:
DECLARE @ProjectTbl TABLE (ProjectID VARCHAR(225),Cost2001 INT, Cost2002 INT,Cost2003 INT)
INSERT INTO @ProjectTbl VALUES
('projectA',10,32,30),
('projectB',42,22,122);
;WITH Unpivots
AS
(
    SELECT *
    FROM @ProjectTbl
    UNPIVOT
    (
        Value FOR CostYear IN (Cost2001, Cost2002, Cost2003)
    ) AS up
)
SELECT
    ProjectID,
    CostYear,
    Value
FROM Unpivots
Output:
ProjectID CostYear Value
projectA Cost2001 10
projectA Cost2002 32
projectA Cost2003 30
projectB Cost2001 42
projectB Cost2002 22
projectB Cost2003 122