Slow insert speed in PostgreSQL memory tablespace

Posted 2023-02-24

Originally posted: 2011-02-25 00:03:15

[Question]:

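I have a requirement to store records in the database at a rate of 10,000 records/sec (with indexes on a few fields). Each record has 25 columns. I am doing a batch insert of 100,000 records in one transaction block. To improve the insert rate, I moved the tablespace from disk to RAM. Even so, I can only achieve 5,000 inserts per second.

I have also made the following adjustments in the postgres configuration:

Indexes: none
fsync: false
Logging: disabled

Other information:

Tablespace: RAM
Columns per row: 25 (mostly integers)
CPU: 4 cores, 2.5 GHz
RAM: 48 GB

I would like to know why a single insert query takes about 0.2 ms on average when the database is not writing anything to disk (since I am using a RAM-based tablespace). Am I doing something wrong?

Help appreciated.

Prashant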
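The 0.2 ms figure in the question follows directly from the measured rate. A quick check of the arithmetic, using only the numbers quoted in the question (the function and variable names are illustrative):

```python
def ms_per_insert(inserts_per_second: float) -> float:
    """Average time per insert in milliseconds for a given insert rate."""
    return 1000.0 / inserts_per_second


measured = ms_per_insert(5_000)    # rate achieved with the RAM tablespace
required = ms_per_insert(10_000)   # rate the requirement asks for

print(f"measured: {measured:.2f} ms per insert")
print(f"required: {required:.2f} ms per insert")
```

In other words, meeting the requirement means halving the average time spent per row.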

[Question comments]:

[Answer 1]:

Fast data loading

\COPY schema.temp_table FROM /tmp/data.csv WITH CSV
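The \COPY command above expects the data as a CSV file. A minimal sketch of producing such CSV text from application rows with Python's standard csv module; the sample rows and their column order are hypothetical, since the thread does not show the layout of schema.temp_table:

```python
import csv
import io


def rows_to_csv(rows) -> str:
    """Serialize an iterable of row tuples to CSV text without a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample rows: (taken, station_id, amount, flag, category_id)
sample = [
    ("2011-02-25", 1, "12.50", "A", 1),
    ("2011-02-25", 2, "7.25", "B", 1),
]
csv_text = rows_to_csv(sample)
print(csv_text, end="")
```

The resulting text could then be written to a file such as /tmp/data.csv and loaded with the \COPY line shown above.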

Further recommendations

For large volumes of data:

SELECT

CLUSTER

CREATE UNIQUE INDEX measure_001_stc_index
  ON climate.measurement_001
  USING btree (station_id, taken, category_id);
ALTER TABLE climate.measurement_001 CLUSTER ON measure_001_stc_index;

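Configuration settings

On a machine with 4GB of RAM, I did the following...

Kernel configuration

Tell the kernel that programs may use large blocks of shared memory: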

sysctl -w kernel.shmmax=536870912
sysctl -p /etc/sysctl.conf

PostgreSQL configuration

/etc/postgresql/8.4/main/postgresql.conf

shared_buffers = 1GB
temp_buffers = 32MB
work_mem = 32MB
maintenance_work_mem = 64MB
seq_page_cost = 1.0
random_page_cost = 2.0
cpu_index_tuple_cost = 0.001
effective_cache_size = 512MB
checkpoint_segments = 10

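Child tables

For example, suppose you have weather data divided into different categories. Rather than keeping one huge table, split it into several tables (one per category).

Master table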

CREATE TABLE climate.measurement
(
  id bigserial NOT NULL,
  taken date NOT NULL,
  station_id integer NOT NULL,
  amount numeric(8,2) NOT NULL,
  flag character varying(1) NOT NULL,
  category_id smallint NOT NULL,
  CONSTRAINT measurement_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);

Child table

CREATE TABLE climate.measurement_001
(
-- Inherited from table climate.measurement:  id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass),
-- Inherited from table climate.measurement:  taken date NOT NULL,
-- Inherited from table climate.measurement:  station_id integer NOT NULL,
-- Inherited from table climate.measurement:  amount numeric(8,2) NOT NULL,
-- Inherited from table climate.measurement:  flag character varying(1) NOT NULL,
-- Inherited from table climate.measurement:  category_id smallint NOT NULL,
  CONSTRAINT measurement_001_pkey PRIMARY KEY (id),
  CONSTRAINT measurement_001_category_id_ck CHECK (category_id = 1)
)
INHERITS (climate.measurement)
WITH (
  OIDS=FALSE
);
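With one child table per category, the loading code has to know which table each row belongs to, and the DDL has to be repeated for every category. A minimal sketch of both, assuming the three-digit suffix seen in climate.measurement_001 is the naming scheme for all categories (that scheme, and the helper names, are assumptions for illustration):

```python
def child_table_name(category_id: int, parent: str = "climate.measurement") -> str:
    """Name of the child table for a category, e.g. climate.measurement_001."""
    if not 0 < category_id < 1000:
        raise ValueError("category_id must fit the three-digit suffix")
    return f"{parent}_{category_id:03d}"


def child_table_ddl(category_id: int, parent: str = "climate.measurement") -> str:
    """CREATE TABLE statement for one child table, including its CHECK constraint."""
    name = child_table_name(category_id, parent)
    short = name.split(".")[-1]  # constraint names carry no schema prefix
    return (
        f"CREATE TABLE {name}\n"
        "(\n"
        f"  CONSTRAINT {short}_pkey PRIMARY KEY (id),\n"
        f"  CONSTRAINT {short}_category_id_ck CHECK (category_id = {category_id})\n"
        ")\n"
        f"INHERITS ({parent});"
    )


print(child_table_name(1))
print(child_table_ddl(2))
```

Inserting each row directly into the table returned by child_table_name avoids having to route rows through the master table.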

Table statistics

Increase the table statistics for the important columns:

ALTER TABLE climate.measurement_001 ALTER COLUMN taken SET STATISTICS 1000;
ALTER TABLE climate.measurement_001 ALTER COLUMN station_id SET STATISTICS 1000;

Don't forget to VACUUM and ANALYSE afterwards.

[Comments]:

[Answer 2]:

Are you doing your insert as a series of single-row statements:

INSERT INTO tablename (...) VALUES (...);
INSERT INTO tablename (...) VALUES (...);
...

or as one multi-row insert:

INSERT INTO tablename (...) VALUES (...),(...),(...);

The second will be significantly faster on 100k rows.
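A minimal sketch of building such a multi-row statement programmatically. It uses %s placeholders, the DB-API "format" parameter style accepted by drivers such as psycopg2; the table name, column names and helper names are hypothetical, and identifiers should come from trusted code rather than user input:

```python
def build_multirow_insert(table: str, columns: list[str], n_rows: int) -> str:
    """Build one INSERT with n_rows placeholder groups, e.g. (%s, %s), (%s, %s)."""
    if n_rows < 1:
        raise ValueError("n_rows must be at least 1")
    group = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values = ", ".join([group] * n_rows)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values};"


def flatten(rows):
    """Flatten row tuples into one parameter list matching the placeholders."""
    return [value for row in rows for value in row]


rows = [(1, "a"), (2, "b"), (3, "c")]
sql = build_multirow_insert("tablename", ["id", "label"], len(rows))
params = flatten(rows)
print(sql)
print(params)
```

The statement and the flattened parameters would then be passed together to the driver's execute call, typically in chunks of a few hundred or thousand rows rather than all 100k at once.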

Source: http://kaiv.wordpress.com/2007/07/19/faster-insert-for-multiple-rows/

[Comments]:

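I am using the first way: - BEGIN; - INSERT INTO tablename (...) VALUES (...); - INSERT INTO tablename (...) VALUES (...); - ... - COMMIT; I will try the second way now. Thanks.

The post also shows that COPY would be faster.

[Answer 3]: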

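Did you also put the xlog (the WAL segments) on your RAM drive? If not, you are still writing to disk. And what about the settings for wal_buffers, checkpoint_segments, and so on? You should try to fit all 100,000 records (your single transaction) into wal_buffers. Increasing this parameter may cause PostgreSQL to request more System V shared memory than your operating system's default configuration allows.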
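To get a feel for the sizes involved, the raw payload of one batch can be estimated from the row size. This sketch assumes the roughly 240-byte row and the 6,000 inserts per second that the asker reports in the follow-up comment; it counts row data only, so the actual WAL volume (record headers, index entries) would be larger:

```python
ROW_BYTES = 240        # approximate row size reported by the asker
BATCH_ROWS = 100_000   # rows per transaction, from the question


def batch_megabytes(row_bytes: int, rows: int) -> float:
    """Raw payload of one batch in MB (10**6 bytes), excluding WAL overhead."""
    return row_bytes * rows / 1_000_000


def throughput_mb_per_s(row_bytes: int, rows_per_second: int) -> float:
    """Raw data rate in MB/s for a given insert rate."""
    return row_bytes * rows_per_second / 1_000_000


batch_mb = batch_megabytes(ROW_BYTES, BATCH_ROWS)
rate_mb_s = throughput_mb_per_s(ROW_BYTES, 6_000)
print(f"one batch: about {batch_mb:.0f} MB of row data")
print(f"at 6,000 inserts/s: about {rate_mb_s:.2f} MB/s")
```

These are lower bounds, intended only as a starting point when choosing a wal_buffers value to experiment with.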

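[Comments]:

Yes, the xlog is mounted on the RAM drive. One row is about 240 bytes, so for a batch of 100,000 records I set wal_buffers to 250MB. With these settings I get about 6,000-7,000 inserts per second. Is there any way to profile postgres to see which operation is taking the time? Since no data is written to disk, in-memory transfers should be very fast. 6,000 inserts per second ~= 1.5 MB/s, which I think is very slow.

[Answer 4]: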

I suggest you use COPY instead of INSERT.

You should also fine-tune your postgresql.conf file.

Read more at http://wiki.postgresql.org/wiki/Performance_Optimization

[Comments]:
