Postgres- pgsql taking more time to retrieve data from table with more than 1.5 billion rows
Posted: 2017-07-18 19:03:28
Question: How can I optimize the table or the following pgsql query (it takes 34 minutes to return 770 records)? Indexes have already been added on a few columns. I am not sure what else can be done for this query.
Query:
SELECT
    min(p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles') as Date,
    'America/Los_Angeles' AS Timezone,
    sum(GREATEST(0, p.value)) as Value,
    p.uom as UnitOfMeasurement
FROM
    pv.bsa_vessel_vs p
WHERE
    p.start_timestamp AT TIME ZONE p.timezone >= '2017-01-01'
    and p.start_timestamp AT TIME ZONE p.timezone < '2017-02-01'
    and p.vessel_serial_number = 'U57625059'
GROUP BY
    date_trunc('hour', p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles'), p.uom
ORDER BY
    Date ;
Table:
CREATE TABLE pv.bsa_vessel_vs
(
    bsa_vessel_vs_id bigserial NOT NULL,
    data_source_id bigint NOT NULL,
    start_timestamp timestamp without time zone NOT NULL,
    end_timestamp timestamp without time zone NOT NULL,
    value numeric(12,4) NOT NULL,
    uom text NOT NULL,
    timezone text NOT NULL,
    created_timestamp timestamp without time zone DEFAULT now(),
    updated_timestamp timestamp without time zone DEFAULT now(),
    vessel_serial_number text NOT NULL,
    CONSTRAINT bsa_vessel_vs_pkey PRIMARY KEY (bsa_vessel_vs_id),
    CONSTRAINT bsa_vessel_vs_data_source_id_fkey FOREIGN KEY (data_source_id)
        REFERENCES pv.data_source (data_source_id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE RESTRICT
)
WITH (
    OIDS=FALSE
);
CREATE INDEX pm_start_timestamp_ndex
    ON pv.bsa_vessel_vs
    USING btree
    (start_timestamp DESC NULLS LAST);
CREATE INDEX bsa_vessel_vs_meter_ts_idx
    ON pv.bsa_vessel_vs
    USING btree
    (vessel_serial_number COLLATE pg_catalog."default", start_timestamp, end_timestamp);
CREATE UNIQUE INDEX bsa_vessel_vs_u_idx
    ON pv.bsa_vessel_vs
    USING btree
    (data_source_id, vessel_serial_number COLLATE pg_catalog."default", start_timestamp, end_timestamp DESC);
Thanks, 卡西
Answer 1: Change your index so that it contains the same expression you use in your WHERE clause, i.e.:
CREATE INDEX bsa_vessel_vs_meter_ts_2_idx
    ON bsa_vessel_vs
    USING btree
    ( vessel_serial_number COLLATE pg_catalog."default",
      (start_timestamp AT TIME ZONE timezone)
    );
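To confirm that the planner actually picks up the new index, the plan can be inspected with EXPLAIN. The following is a minimal sketch, assuming the unqualified table name bsa_vessel_vs used in this answer (in the question's database it would be pv.bsa_vessel_vs). Running ANALYZE first is a suggestion of this sketch, not something the answer states: it lets PostgreSQL gather statistics for the indexed expression.

```sql
-- Refresh planner statistics, including those for the index expression.
ANALYZE bsa_vessel_vs;

-- Plain EXPLAIN prints the estimated plan without executing the query.
EXPLAIN
SELECT
    min(p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles') AS Date,
    'America/Los_Angeles' AS Timezone,
    sum(GREATEST(0, p.value)) AS Value,
    p.uom AS UnitOfMeasurement
FROM
    bsa_vessel_vs p
WHERE
    p.start_timestamp AT TIME ZONE p.timezone >= '2017-01-01'
    AND p.start_timestamp AT TIME ZONE p.timezone < '2017-02-01'
    AND p.vessel_serial_number = 'U57625059'
GROUP BY
    date_trunc('hour', p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles'),
    p.uom
ORDER BY
    Date;
```

EXPLAIN (ANALYZE, BUFFERS) reports real timings and buffer usage as well, but it executes the query, so on the original 1.5-billion-row table it should only be run once the plan already looks reasonable.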
Once that index is defined, you get an execution plan that uses it:
| QUERY PLAN |
| :--- |
| Sort  (cost=69.60..69.70 rows=39 width=83) |
|   Sort Key: (min(timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp)))) |
|   ->  HashAggregate  (cost=67.79..68.57 rows=39 width=83) |
|         Group Key: date_trunc('hour'::text, timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp))), uom |
|         ->  Index Scan using bsa_vessel_vs_meter_ts_2_idx on bsa_vessel_vs p  (cost=0.28..67.20 rows=39 width=44) |
|               Index Cond: ((vessel_serial_number = 'U57625059'::text) AND (timezone(timezone, start_timestamp) >= '2017-01-01 00:00:00+00'::timestamp with time zone) AND (timezone(timezone, start_timestamp) … |

However, if the index is not there, PostgreSQL falls back to a full table scan:

| QUERY PLAN |
| :--- |
| Sort  (cost=298.84..298.94 rows=39 width=83) |
|   Sort Key: (min(timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp)))) |
|   ->  GroupAggregate  (cost=296.35..297.81 rows=39 width=83) |
|         Group Key: (date_trunc('hour'::text, timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp)))), uom |
|         ->  Sort  (cost=296.35..296.45 rows=39 width=44) |
|               Sort Key: (date_trunc('hour'::text, timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp)))), uom |
|               ->  Seq Scan on bsa_vessel_vs p  (cost=0.00..295.32 rows=39 width=44) |
|                     Filter: ((vessel_serial_number = 'U57625059'::text) AND (timezone(timezone, start_timestamp) >= '2017-01-01 00:00:00+00'::timestamp with time zone) AND (timezone(timezone, start_timestamp) … |

You can check the whole setup at dbfiddle here.

Comments:
Thank you very much, joanolo! I only have read access to the database. I will post an update soon. Thanks for the quick response!
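When only read access is available and no new index can be created, one option is to rewrite the query so that the existing index bsa_vessel_vs_meter_ts_idx (vessel_serial_number, start_timestamp, end_timestamp) can narrow the scan. The sketch below is not part of the answer above; it adds a redundant, widened range on the raw start_timestamp column and keeps the original exact filter. The 2-day margin is an assumption of this sketch: it relies on UTC offsets lying between -12 and +14 hours, so the stored local time and the session-local literal can differ by at most 26 hours.

```sql
SELECT
    min(p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles') AS Date,
    'America/Los_Angeles' AS Timezone,
    sum(GREATEST(0, p.value)) AS Value,
    p.uom AS UnitOfMeasurement
FROM
    pv.bsa_vessel_vs p
WHERE
    p.vessel_serial_number = 'U57625059'
    -- Redundant, index-friendly range on the raw column (assumed 2-day margin).
    -- It only pre-filters rows; it never excludes a row the exact filter accepts.
    AND p.start_timestamp >= timestamp '2016-12-30'
    AND p.start_timestamp <  timestamp '2017-02-03'
    -- Original exact filter, unchanged, so the result set stays the same.
    AND p.start_timestamp AT TIME ZONE p.timezone >= '2017-01-01'
    AND p.start_timestamp AT TIME ZONE p.timezone <  '2017-02-01'
GROUP BY
    date_trunc('hour', p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles'),
    p.uom
ORDER BY
    Date;
```

Whether the planner chooses the existing index for this form depends on the table statistics, so the plan is worth checking with EXPLAIN before relying on it.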