PostgreSQL Crashes When Doing Bulk Inserts with Node.js/Sequelize

Posted 2023-03-07

Asked: 2020-05-04 21:37:31

Question:

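A Node.js application using the Sequelize.js ORM performs bulk inserts into a PostgreSQL 11.2 server running inside a Docker container on a Mac OSX host system. Each bulk insert typically consists of about 1000-4000 rows, and the bulk insert concurrency is 30, so at most 30 insert operations are active at any time.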

const bulkInsert = async (payload) => {
    try {
        await sequelizeModelInstance.bulkCreate(payload);
    } catch (e) {
        console.log(e);
    }
};

// Allow at most 30 bulk inserts to run at the same time
const pLimit = require('p-limit')(30);

(async () => {
    const promises = data.map(d => pLimit(() => bulkInsert(d))); // pLimit() controls Promise concurrency
    const result = await Promise.all(promises);
})();

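After some time, the PostgreSQL server starts reporting the error Connection terminated unexpectedly, followed by the database system is in recovery mode.

After repeating this several times and checking my logs, it appears that the error usually occurs while a batch of 30 bulk inserts is running in which several of the bulk inserts contain over 100,000 rows each. For example, one particular crash happened when attempting 3 bulk inserts of 190000, 650000 and 150000 rows together with 27 inserts of 1000-4000 rows each.

System memory is not full, CPU load is normal, and there is sufficient disk space.

Question: Is it normal for PostgreSQL to crash under these circumstances? If so, is there a PostgreSQL setting we can adjust to allow larger bulk inserts? If the large bulk inserts are the cause, does Sequelize.js have a function that splits the bulk inserts for us?

Running PostgreSQL 11.2 in a Docker container, TimescaleDB 1.5.1, Node v12.6.0, sequelize 5.21.3, on Mac Catalina 10.15.2.

PostgreSQL logs right after the problem occurs: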

2020-01-18 00:58:26.094 UTC [1] LOG:  server process (PID 199) was terminated by signal 9
2020-01-18 00:58:26.094 UTC [1] DETAIL:  Failed process was running: INSERT INTO "foo" ("id","opId","unix","side","price","amount","b","s","serverTimestamp") VALUES (89880,'5007564','1579219200961','front','0.0000784','35','undefined','undefined','2020-01-17 00:00:01.038 +00:00'),.........
2020-01-18 00:58:26.108 UTC [1] LOG:  terminating any other active server processes
2020-01-18 00:58:26.110 UTC [220] WARNING:  terminating connection because of crash of another server process
2020-01-18 00:58:26.110 UTC [220] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-01-18 00:58:26.110 UTC [220] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-01-18 00:58:26.148 UTC [214] WARNING:  terminating connection because of crash of another server process
2020-01-18 00:58:26.148 UTC [214] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-01-18 00:58:26.148 UTC [214] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-01-18 00:58:26.149 UTC [203] WARNING:  terminating connection because of crash of another server process
2020-01-18 00:58:26.149 UTC [203] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

...

2020-01-18 00:58:30.098 UTC [1] LOG:  all server processes terminated; reinitializing
2020-01-18 00:58:30.240 UTC [223] FATAL:  the database system is in recovery mode
2020-01-18 00:58:30.241 UTC [222] LOG:  database system was interrupted; last known up at 2020-01-18 00:50:13 UTC
2020-01-18 00:58:30.864 UTC [224] FATAL:  the database system is in recovery mode
2020-01-18 00:58:31.604 UTC [225] FATAL:  the database system is in recovery mode
2020-01-18 00:58:32.297 UTC [226] FATAL:  the database system is in recovery mode
2020-01-18 00:58:32.894 UTC [227] FATAL:  the database system is in recovery mode
2020-01-18 00:58:33.394 UTC [228] FATAL:  the database system is in recovery mode
2020-01-18 01:00:55.911 UTC [222] LOG:  database system was not properly shut down; automatic recovery in progress
2020-01-18 01:00:56.856 UTC [222] LOG:  redo starts at 0/197C610
2020-01-18 01:01:55.662 UTC [229] FATAL:  the database system is in recovery mode

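Comments:

PostgreSQL appears to have been killed by an external force, possibly your operating system's OOM Killer. Have you tried disabling it? (You mentioned Docker; it could also be the OOM killer inside the running image.)

@NickLeBlanc For the Docker OOM killer, should we use the --oom-kill-disable=true flag, or use --memory=4g to increase the Docker container memory limit from 2 GB to 4 GB?

@NickLeBlanc I can't seem to find any information about an OOM killer for Mac, only for Linux.

"System memory is not full": how do you know? What tool did you use? It could go from not full to full and back to not full very quickly, in less than a typical monitoring interval. Why is this tagged "mysql"?

@jjanes You're right, I cannot be sure whether a spike in the app's or database's memory usage exhausts system memory and leads to an OOM situation. I have replaced the mysql tag with docker.

Answer 1: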

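Your PostgreSQL server is probably being killed by the OOM Killer (Out of Memory Killer) of the operating system inside Docker.

You can:

- Increase the memory available to Postgres; 2 GB is a low value for the amount of operations you are running.
- Reduce the bulk insert size and limit their concurrency.
- Tune your Postgres installation to fit your hardware:
  - shared_buffers: the usual recommendation is 25% of system memory. This is only a recommendation; you should always benchmark and evaluate your scenario and choose values appropriate for your environment.
  - work_mem: note the following explanation: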

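"This size is applied to each and every sort done by each user, and complex queries can use multiple working memory sort buffers. Set it to 50MB, and have 30 users submitting queries, and you are soon using 1.5GB of real memory. Furthermore, if a query involves doing merge sorts of 8 tables, that requires 8 times work_mem. You need to consider what you set max_connections to in order to size this parameter correctly. This is a setting where data warehouse systems, where users are submitting very large queries, can readily make use of many gigabytes of memory."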
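The arithmetic in the quoted passage can be made explicit. The following is only a rough sketch, not a PostgreSQL formula: the function name worstCaseWorkMemMB and its parameters are made up for illustration, and it does nothing more than multiply the numbers from the quote.

```javascript
// Hypothetical helper: rough upper bound on the memory used by sort buffers
// alone, assuming every connection runs `sortsPerQuery` sorts at the same
// time and each sort uses the full work_mem.
function worstCaseWorkMemMB(workMemMB, connections, sortsPerQuery = 1) {
    return workMemMB * connections * sortsPerQuery;
}

console.log(worstCaseWorkMemMB(50, 30));    // 1500 MB, the "1.5GB" from the quote
console.log(worstCaseWorkMemMB(50, 30, 8)); // 12000 MB if every query needs 8 sort buffers
```

With a 2 GB container limit, even the first figure leaves little room for shared_buffers and everything else the server needs.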

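There are many settings you can change on Postgres to improve performance and stability. Postgres runs fine with the default settings, but in production or under heavy load you will inevitably have to tune it.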

Recommended reading:

- Tuning Your PostgreSQL System
- PostgreSQL Documentation: Resource Consumption
- Configuring Memory on PostgreSQL
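The suggestion to reduce the bulk insert size could look roughly like the sketch below. It is an illustration under assumptions, not something the answer or Sequelize prescribes: the helper names chunk and bulkInsertChunked and the default chunk size of 1000 are invented here, and model stands for any object with a bulkCreate(rows) method, such as the Sequelize model from the question.

```javascript
// Split an array into consecutive slices of at most `size` elements.
function chunk(rows, size) {
    const chunks = [];
    for (let i = 0; i < rows.length; i += size) {
        chunks.push(rows.slice(i, i + size));
    }
    return chunks;
}

// Insert `rows` slice by slice, waiting for each slice to finish, so that a
// 650000-row payload is never sent as one giant INSERT statement.
// `chunkSize` is an assumed default; benchmark to find a suitable value.
async function bulkInsertChunked(model, rows, chunkSize = 1000) {
    for (const part of chunk(rows, chunkSize)) {
        await model.bulkCreate(part);
    }
}
```

In the question's code, bulkInsert could call bulkInsertChunked(sequelizeModelInstance, payload) instead of passing the whole payload to bulkCreate, and the p-limit value could be lowered from 30 to limit concurrency further.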

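Answer 2:

I ran into a similar problem while running a migration, but the solution can be applied to this question as well.

The idea is to splice your payload into manageable chunks. In my case, 100 records at a time seemed to be manageable.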

const payload = require("./seeds/big-mama.json"); // around 715,000 records

module.exports = {
    up: (queryInterface) => {
        const records = payload.map(function (record) {
            record.createdAt = new Date();
            record.updatedAt = new Date();
            return record;
        });

        let lastQuery;
        while (records.length > 0) {
            lastQuery = queryInterface.bulkInsert(
                "Products",
                records.splice(0, 100),
                {}
            );
        }

        return lastQuery;
    },

    down: (queryInterface) => {
        return queryInterface.bulkDelete("Products", null, {});
    }
};

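Comments:

This worked for me! Except that I added an await lastQuery so that the queries execute sequentially, without timing out or overwhelming Postgres. It would also be better to do an "await all" or similar on all the queries, but that is not required.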
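The sequential variant described in this comment might be sketched as follows. This is an interpretation of the comment, not the commenter's actual code: the function name insertSequentially is hypothetical, queryInterface is assumed to be the object Sequelize passes to a migration, and the table name "Products" and batch size of 100 are taken from the answer.

```javascript
// Send the records in batches, waiting for each bulkInsert to finish
// before starting the next one, so only one query is in flight at a time.
async function insertSequentially(queryInterface, records, batchSize = 100) {
    for (let i = 0; i < records.length; i += batchSize) {
        const batch = records.slice(i, i + batchSize);
        await queryInterface.bulkInsert("Products", batch, {});
    }
}
```

Inside the migration, up could then be written as an async function that builds records as before and returns insertSequentially(queryInterface, records).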
