Postgres- pgsql taking more time to retrieve data from table with more than 1.5 billion rows

【Posted】: 2017-07-18 19:03:28 【Question】:

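How can I optimize the table or the following pgsql query (it takes 34 minutes to return 770 records)? Indexes have already been added on a few columns. I don't know what else can be done for this query.

Query: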

SELECT 
    min(p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles') as Date, 
    'America/Los_Angeles' AS Timezone, 
    sum(GREATEST(0, p.value)) as Value, 
    p.uom as UnitOfMeasurement
FROM
    pv.bsa_vessel_vs p                                 
WHERE
        p.start_timestamp AT TIME ZONE p.timezone >= '2017-01-01'
    and p.start_timestamp AT TIME ZONE p.timezone <  '2017-02-01'
    and p.vessel_serial_number ='U57625059'
GROUP BY
    date_trunc('hour', p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles'), p.uom   
ORDER BY
    Date ;

Table:

CREATE TABLE pv.bsa_vessel_vs
(
  bsa_vessel_vs_id bigserial NOT NULL,
  data_source_id bigint NOT NULL,
  start_timestamp timestamp without time zone NOT NULL,
  end_timestamp timestamp without time zone NOT NULL,
  value numeric(12,4) NOT NULL,
  uom text NOT NULL,
  timezone text NOT NULL,
  created_timestamp timestamp without time zone DEFAULT now(),
  updated_timestamp timestamp without time zone DEFAULT now(),
  vessel_serial_number text NOT NULL,
  CONSTRAINT bsa_vessel_vs_pkey PRIMARY KEY (bsa_vessel_vs_id),
  CONSTRAINT bsa_vessel_vs_data_source_id_fkey FOREIGN KEY (data_source_id)
      REFERENCES pv.data_source (data_source_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE RESTRICT
)
WITH (
  OIDS=FALSE
);

CREATE INDEX pm_start_timestamp_ndex
  ON pv.bsa_vessel_vs
  USING btree
  (start_timestamp DESC NULLS LAST);

CREATE INDEX bsa_vessel_vs_meter_ts_idx
  ON pv.bsa_vessel_vs
  USING btree
  (vessel_serial_number COLLATE pg_catalog."default", start_timestamp, end_timestamp);


CREATE UNIQUE INDEX bsa_vessel_vs_u_idx
  ON pv.bsa_vessel_vs
  USING btree
  (data_source_id, vessel_serial_number COLLATE pg_catalog."default", start_timestamp, end_timestamp DESC);

Thanks, Kasi

【Question comments】:

【Answer 1】:

Change your index so that it contains the same expression you use in the WHERE clause, i.e.:

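-- The indexed expression must match the WHERE clause expression exactly.
-- Listing it once is enough; on the real database, qualify the table as pv.bsa_vessel_vs.
CREATE INDEX bsa_vessel_vs_meter_ts_2_idx
  ON bsa_vessel_vs
  USING btree
  ( vessel_serial_number COLLATE pg_catalog."default", 
    (start_timestamp AT TIME ZONE timezone)
  );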

Once you define that index, you get an execution plan that uses it:

QUERY PLAN
Sort  (cost=69.60..69.70 rows=39 width=83)
  Sort Key: (min(timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp))))
  ->  HashAggregate  (cost=67.79..68.57 rows=39 width=83)
        Group Key: date_trunc('hour'::text, timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp))), uom
        ->  Index Scan using bsa_vessel_vs_meter_ts_2_idx on bsa_vessel_vs p  (cost=0.28..67.20 rows=39 width=44)
              Index Cond: ((vessel_serial_number = 'U57625059'::text) AND (timezone(timezone, start_timestamp) >= '2017-01-01 00:00:00+00'::timestamp with time zone) AND (timezone(timezone, start_timestamp)

Whereas, if the index is not there, PostgreSQL falls back to a sequential scan of the whole table:

QUERY PLAN
Sort  (cost=298.84..298.94 rows=39 width=83)
  Sort Key: (min(timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp))))
  ->  GroupAggregate  (cost=296.35..297.81 rows=39 width=83)
        Group Key: (date_trunc('hour'::text, timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp)))), uom
        ->  Sort  (cost=296.35..296.45 rows=39 width=44)
              Sort Key: (date_trunc('hour'::text, timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp)))), uom
              ->  Seq Scan on bsa_vessel_vs p  (cost=0.00..295.32 rows=39 width=44)
                    Filter: ((vessel_serial_number = 'U57625059'::text) AND (timezone(timezone, start_timestamp) >= '2017-01-01 00:00:00+00'::timestamp with time zone) AND (timezone(timezone, start_timestamp)
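To check the same thing on the real table, something along these lines should work. This is only a sketch under a few assumptions: the table is schema-qualified as pv.bsa_vessel_vs (as in the question's DDL), you have the privileges to create an index, and the time-zone expression is listed once in the index. CONCURRENTLY is my addition, meant to avoid blocking writes on a table this large while the index builds.

```sql
-- Assumption: schema pv, as in the question's CREATE TABLE.
-- CONCURRENTLY avoids blocking writes during the build;
-- it cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY bsa_vessel_vs_meter_ts_2_idx
  ON pv.bsa_vessel_vs
  USING btree
  (vessel_serial_number, (start_timestamp AT TIME ZONE timezone));

-- Gather statistics for the new index expression so the planner
-- can estimate the range condition sensibly.
ANALYZE pv.bsa_vessel_vs;

-- Plain EXPLAIN only shows the chosen plan.
-- With ANALYZE the query is actually executed and real timings are reported.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    min(p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles') AS Date,
    'America/Los_Angeles' AS Timezone,
    sum(GREATEST(0, p.value)) AS Value,
    p.uom AS UnitOfMeasurement
FROM
    pv.bsa_vessel_vs p
WHERE
        p.start_timestamp AT TIME ZONE p.timezone >= '2017-01-01'
    AND p.start_timestamp AT TIME ZONE p.timezone <  '2017-02-01'
    AND p.vessel_serial_number = 'U57625059'
GROUP BY
    date_trunc('hour', p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles'), p.uom
ORDER BY
    Date;
```

If the plan shows an Index Scan on bsa_vessel_vs_meter_ts_2_idx with both the serial number and the time range in the Index Cond, the index is being used as intended.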

You can check the whole setup at dbfiddle here
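If creating a new index is not an option (for example, with read-only access), one possible workaround is to add a coarse range condition on the raw start_timestamp column, which the existing bsa_vessel_vs_meter_ts_idx index (vessel_serial_number, start_timestamp, end_timestamp) can serve, and keep the exact time-zone-aware filter on top of it. This is a sketch, not something tested against the real data. It assumes every UTC offset involved (the row time zones and the session time zone) lies between -12:00 and +14:00, so the raw local value can differ from the compared instant by at most 26 hours. The 2-day margin on each side is my own choice to cover that.

```sql
SELECT
    min(p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles') AS Date,
    'America/Los_Angeles' AS Timezone,
    sum(GREATEST(0, p.value)) AS Value,
    p.uom AS UnitOfMeasurement
FROM
    pv.bsa_vessel_vs p
WHERE
        p.vessel_serial_number = 'U57625059'
    -- Coarse range on the raw column: usable by the existing btree index.
    -- Widened by 2 days on each side (assumed margin, see above).
    AND p.start_timestamp >= timestamp '2016-12-30'
    AND p.start_timestamp <  timestamp '2017-02-03'
    -- Exact filter from the original query, applied to the few rows left.
    AND p.start_timestamp AT TIME ZONE p.timezone >= '2017-01-01'
    AND p.start_timestamp AT TIME ZONE p.timezone <  '2017-02-01'
GROUP BY
    date_trunc('hour', p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles'), p.uom
ORDER BY
    Date;
```

The extra conditions do not change the result as long as the offset assumption holds, because every row that passes the exact filter also falls inside the widened raw range.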

【Comments】:

Thank you very much, joanolo! I only have read access to the database. I will update soon. Thanks for the quick reply!
