PIVOT VIEW using PostgreSQL
Posted: 2015-01-04 23:08:55

Question: I am new to PostgreSQL and on version 9.4. I have a table that collects measurements as strings, which I need to turn into a kind of PIVOT table using something that always stays up to date, such as a VIEW. Additionally, some values need to be converted, e.g. multiplied by 1000, as you can see for "sensor3" in the example below.
Source table:
CREATE TABLE source (
    id bigint NOT NULL,
    name character varying(255),
    "timestamp" timestamp without time zone,
    value character varying(32672),
    CONSTRAINT source_pkey PRIMARY KEY (id)
);
INSERT INTO source VALUES
(15,'sensor2','2015-01-03 22:02:05.872','88.4')
, (16,'foo27' ,'2015-01-03 22:02:10.887','-3.755')
, (17,'sensor1','2015-01-03 22:02:10.887','1.1704')
, (18,'foo27' ,'2015-01-03 22:02:50.825','-1.4')
, (19,'bar_18' ,'2015-01-03 22:02:50.833','545.43')
, (20,'foo27' ,'2015-01-03 22:02:50.935','-2.87')
, (21,'sensor3','2015-01-03 22:02:51.044','6.56');
Source table result:
| id | name | timestamp | value |
|----+-----------+---------------------------+----------|
| 15 | "sensor2" | "2015-01-03 22:02:05.872" | "88.4" |
| 16 | "foo27" | "2015-01-03 22:02:10.887" | "-3.755" |
| 17 | "sensor1" | "2015-01-03 22:02:10.887" | "1.1704" |
| 18 | "foo27" | "2015-01-03 22:02:50.825" | "-1.4" |
| 19 | "bar_18" | "2015-01-03 22:02:50.833" | "545.43" |
| 20 | "foo27" | "2015-01-03 22:02:50.935" | "-2.87" |
| 21 | "sensor3" | "2015-01-03 22:02:51.044" | "6.56" |
Desired end result:
| timestamp | sensor1 | sensor2 | sensor3 | foo27 | bar_18 |
|---------------------------+---------+---------+---------+---------+---------|
| "2015-01-03 22:02:05.872" | | 88.4 | | | |
| "2015-01-03 22:02:10.887" | 1.1704 | | | -3.755 | |
| "2015-01-03 22:02:50.825" | | | | -1.4 | |
| "2015-01-03 22:02:50.833" | | | | | 545.43 |
| "2015-01-03 22:02:50.935" | | | | -2.87 | |
| "2015-01-03 22:02:51.044" | | | 6560.00 | | |
Using this:
-- CREATE EXTENSION tablefunc;
SELECT *
FROM
crosstab(
'SELECT
source."timestamp",
source.name,
source.value
FROM
public.source
ORDER BY
1'
,
'SELECT
DISTINCT
source.name
FROM
public.source
ORDER BY
1'
)
AS
(
"timestamp" timestamp without time zone,
"sensor1" character varying(32672),
"sensor2" character varying(32672),
"sensor3" character varying(32672),
"foo27" character varying(32672),
"bar_18" character varying(32672)
)
;
I get this result:
| timestamp | sensor1 | sensor2 | sensor3 | foo27 | bar_18 |
|---------------------------+---------+---------+---------+---------+---------|
| "2015-01-03 22:02:05.872" | | | | 88.4 | |
| "2015-01-03 22:02:10.887" | | -3.755 | 1.1704 | | |
| "2015-01-03 22:02:50.825" | | -1.4 | | | |
| "2015-01-03 22:02:50.833" | 545.43 | | | | |
| "2015-01-03 22:02:50.935" | | -2.87 | | | |
| "2015-01-03 22:02:51.044" | | | | | 6.56 |
Unfortunately,

- the values are not assigned to the matching columns,
- the columns are not dynamic, meaning the query fails as soon as an additional entry such as "sensor4" shows up in the name column, and
- I do not know how to convert (multiply) the values of certain columns.
Comments:
Why varchar(32672)? Why not float or numeric?
You need another table whose names match those in the source table and which holds the scale factors and the desired column order; then drop the "AS" clause.
@Jasen: Not invented here!
Wow! That was state of the art once. Are you still on 9.4?
Answer 1:
Your query would work like this:
SELECT * FROM crosstab(
$$SELECT "timestamp", name
, CASE name
WHEN 'sensor3' THEN value::numeric * 1000
-- WHEN 'sensor9' THEN value::numeric * 9000 -- add more ...
ELSE value::numeric END AS value
FROM source
ORDER BY 1, 2$$
,$$SELECT unnest('bar_18,foo27,sensor1,sensor2,sensor3'::text[])$$
) AS (
"timestamp" timestamp
, bar_18 numeric
, foo27 numeric
, sensor1 numeric
, sensor2 numeric
, sensor3 numeric);
To multiply the value for selected columns, use a "simple" CASE statement. But you need to cast to a numeric type first; hence value::numeric in the example.
This begs the question: why not store the value as a numeric type to begin with?
You need to use the version of crosstab() with two parameters. Detailed explanation:
PostgreSQL Crosstab Query
A truly dynamic crosstab is next to impossible, because SQL demands to know the result type in advance, at call time at the latest. But you can get close with polymorphic types:
Dynamic alternative to pivot with CASE and GROUP BY
Answer 2:
@Erwin: The comment box said "too long by 7128 characters"! So, anyway:
Your post gave me the hints in the right direction, thank you very much. But in my particular case I need it to be truly dynamic. Currently I have 38886 rows with 49 different items (= columns to pivot).
First, to answer the pressing questions from you and @Jasen: the layout of the source table is not up to me; I am already glad to get this data into an RDBMS at all. If it were up to me, I would always store UTC timestamps! But there is another reason the data is kept as strings: it may contain all kinds of data types, such as boolean, integer, float, string, etc.
To avoid further confusion, I created a new demo dataset, prefixing the column names (I know some people hate this!) to avoid problems with keywords, and changing the timestamps (--> minutes) for a better overview:
-- --------------------------------------------------------------------------
-- Create demo table of given schema and insert arbitrary data
-- --------------------------------------------------------------------------
DROP TABLE IF EXISTS table_source;
CREATE TABLE table_source
(
column_id BIGINT NOT NULL,
column_name CHARACTER VARYING(255),
column_timestamp TIMESTAMP WITHOUT TIME ZONE,
column_value CHARACTER VARYING(32672),
CONSTRAINT table_source_pkey PRIMARY KEY (column_id)
);
INSERT INTO table_source VALUES ( 15,'sensor2','2015-01-03 22:01:05.872','88.4');
INSERT INTO table_source VALUES ( 16,'foo27' ,'2015-01-03 22:02:10.887','-3.755');
INSERT INTO table_source VALUES ( 17,'sensor1','2015-01-03 22:02:10.887','1.1704');
INSERT INTO table_source VALUES ( 18,'foo27' ,'2015-01-03 22:03:50.825','-1.4');
INSERT INTO table_source VALUES ( 19,'bar_18','2015-01-03 22:04:50.833','545.43');
INSERT INTO table_source VALUES ( 20,'foo27' ,'2015-01-03 22:05:50.935','-2.87');
INSERT INTO table_source VALUES ( 21,'seNSor3','2015-01-03 22:06:51.044','6.56');
SELECT * FROM table_source;
Furthermore, following @Erwin's advice, I created a view that already converts the data type. Besides being fast, it has the nice feature of applying the desired conversions to known items only, without affecting other (new) items.
-- --------------------------------------------------------------------------
-- Create view to process source data
-- --------------------------------------------------------------------------
DROP VIEW IF EXISTS view_source_processed;
CREATE VIEW
view_source_processed
AS
SELECT
column_timestamp,
column_name,
CASE LOWER( column_name)
WHEN LOWER( 'sensor3') THEN CAST( column_value AS DOUBLE PRECISION) * 1000.0
ELSE CAST( column_value AS DOUBLE PRECISION)
END AS column_value
FROM
table_source
;
SELECT * FROM view_source_processed ORDER BY column_timestamp DESC LIMIT 100;
This is the desired result of the whole question:
-- --------------------------------------------------------------------------
-- Desired result:
-- --------------------------------------------------------------------------
/*
| column_timestamp | bar_18 | foo27 | sensor1 | sensor2 | seNSor3 |
|---------------------------+---------+---------+---------+---------+---------|
| "2015-01-03 22:01:05.872" | | | | 88.4 | |
| "2015-01-03 22:02:10.887" | | -3.755 | 1.1704 | | |
| "2015-01-03 22:03:50.825" | | -1.4 | | | |
| "2015-01-03 22:04:50.833" | 545.43 | | | | |
| "2015-01-03 22:05:50.935" | | -2.87 | | | |
| "2015-01-03 22:06:51.044" | | | | | 6560 |
*/
This is @Erwin's solution, adapted to the new demo source data. It is perfect, as long as the items (= columns to pivot) do not change:
-- --------------------------------------------------------------------------
-- Solution by Erwin, modified for changed demo dataset:
-- http://***.com/a/27773730
-- --------------------------------------------------------------------------
SELECT *
FROM
crosstab(
$$
SELECT
column_timestamp,
column_name,
column_value
FROM
view_source_processed
ORDER BY
1, 2
$$
,
$$
SELECT
UNNEST( 'bar_18,foo27,sensor1,sensor2,seNSor3'::text[])
$$
)
AS
(
column_timestamp timestamp,
bar_18 DOUBLE PRECISION,
foo27 DOUBLE PRECISION,
sensor1 DOUBLE PRECISION,
sensor2 DOUBLE PRECISION,
seNSor3 DOUBLE PRECISION
)
;
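Since the original goal was something that always stays up to date, this static variant can also simply be wrapped in a view (just a sketch; the name view_pivot is my own pick, and the column list still has to be maintained by hand whenever the items change):

CREATE VIEW view_pivot AS
SELECT *
FROM
    crosstab(
        $$
        SELECT
            column_timestamp,
            column_name,
            column_value
        FROM
            view_source_processed
        ORDER BY
            1, 2
        $$
        ,
        $$
        SELECT
            UNNEST( 'bar_18,foo27,sensor1,sensor2,seNSor3'::text[])
        $$
    )
AS
(
    column_timestamp timestamp,
    bar_18 DOUBLE PRECISION,
    foo27 DOUBLE PRECISION,
    sensor1 DOUBLE PRECISION,
    sensor2 DOUBLE PRECISION,
    seNSor3 DOUBLE PRECISION
)
;

-- Each SELECT against the view re-runs the crosstab on the current data:
SELECT * FROM view_pivot ORDER BY column_timestamp DESC LIMIT 100;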
While reading the links @Erwin provided, I found a dynamic SQL example by @Clodoaldo Neto and remembered that I had already done something similar in Transact-SQL; this is my attempt:
-- --------------------------------------------------------------------------
-- Dynamic attempt based on:
-- http://***.com/a/12989297/131874
-- --------------------------------------------------------------------------
DO $DO$
DECLARE
list_columns TEXT;
BEGIN
DROP TABLE IF EXISTS temp_table_pivot;
list_columns := (
SELECT
string_agg( DISTINCT column_name, ' ' ORDER BY column_name)
FROM
view_source_processed
);
EXECUTE(
FORMAT(
$format_1$
CREATE TEMP TABLE
temp_table_pivot(
column_timestamp TIMESTAMP,
%1$s
)
$format_1$
,
(
REPLACE(
list_columns,
' ',
' DOUBLE PRECISION, '
) || ' DOUBLE PRECISION'
)
)
);
EXECUTE(
FORMAT(
$format_2$
INSERT INTO temp_table_pivot
SELECT
*
FROM crosstab(
$crosstab_1$
SELECT
column_timestamp,
column_name,
column_value
FROM
view_source_processed
ORDER BY
column_timestamp, column_name
$crosstab_1$
,
$crosstab_2$
SELECT DISTINCT
column_name
FROM
view_source_processed
ORDER BY
column_name
$crosstab_2$
)
AS
(
column_timestamp TIMESTAMP,
%1$s
);
$format_2$
,
REPLACE( list_columns, ' ', ' DOUBLE PRECISION, ')
||
' DOUBLE PRECISION'
)
);
END;
$DO$;
SELECT * FROM temp_table_pivot ORDER BY column_timestamp DESC LIMIT 100;
Apart from putting this into a stored procedure, for performance reasons I would also try to apply it to an intermediate table into which only new values get inserted. I will keep you posted!
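A minimal sketch of what that stored procedure could look like (refresh_pivot is just a name I made up; it merely wraps the DO block from above into a callable function and is untested against the full 49-item dataset):

CREATE OR REPLACE FUNCTION refresh_pivot()
RETURNS void
LANGUAGE plpgsql
AS $function$
DECLARE
    list_columns TEXT;
BEGIN
    -- Build the dynamic column list,
    -- e. g. 'bar_18 DOUBLE PRECISION, foo27 DOUBLE PRECISION, ...'
    SELECT
        string_agg( DISTINCT column_name, ' DOUBLE PRECISION, ' ORDER BY column_name)
        || ' DOUBLE PRECISION'
    INTO
        list_columns
    FROM
        view_source_processed;

    DROP TABLE IF EXISTS temp_table_pivot;

    EXECUTE FORMAT(
        $format_1$
        CREATE TEMP TABLE temp_table_pivot(
            column_timestamp TIMESTAMP,
            %1$s
        )
        $format_1$
        ,
        list_columns
    );

    EXECUTE FORMAT(
        $format_2$
        INSERT INTO temp_table_pivot
        SELECT
            *
        FROM crosstab(
            $crosstab_1$
            SELECT
                column_timestamp,
                column_name,
                column_value
            FROM
                view_source_processed
            ORDER BY
                column_timestamp, column_name
            $crosstab_1$
            ,
            $crosstab_2$
            SELECT DISTINCT
                column_name
            FROM
                view_source_processed
            ORDER BY
                column_name
            $crosstab_2$
        )
        AS
        (
            column_timestamp TIMESTAMP,
            %1$s
        )
        $format_2$
        ,
        list_columns
    );
END;
$function$;

-- Usage: rebuild the pivot table, then query it
SELECT refresh_pivot();
SELECT * FROM temp_table_pivot ORDER BY column_timestamp DESC LIMIT 100;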
Thanks!!!
L.
PS: No, I do not want to answer my own question, but the "comment" field is far too small!