Error while bulk inserting in Rails

Posted: 2017-06-29 14:32:57

Question:

I am trying to bulk insert data from a separate MySQL database into a Rails ActiveRecord table. My Rails database is Postgres.

I am using the bulk_insert gem with the following code:

batch, batch_size = [], 1_000
records.each do |row|
    # params: the attributes hash for this row (how it is built is not shown here)
    batch << params
    if batch.size >= batch_size
        TableName.bulk_insert values: batch
        batch = []
    end
end

However, I get an error when I run this. The first 1,000 records are inserted fine. After that I get the following:

from /var/lib/gems/2.3.0/gems/activerecord-5.0.3/lib/active_record/connection_adapters/postgresql/database_statements.rb:98:in `async_exec'
from /var/lib/gems/2.3.0/gems/activerecord-5.0.3/lib/active_record/connection_adapters/postgresql/database_statements.rb:98:in `block in execute'
from /var/lib/gems/2.3.0/gems/activerecord-5.0.3/lib/active_record/connection_adapters/abstract_adapter.rb:590:in `block in log'
from /var/lib/gems/2.3.0/gems/activesupport-5.0.3/lib/active_support/notifications/instrumenter.rb:21:in `instrument'
from /var/lib/gems/2.3.0/gems/activerecord-5.0.3/lib/active_record/connection_adapters/abstract_adapter.rb:583:in `log'
from /var/lib/gems/2.3.0/gems/activerecord-5.0.3/lib/active_record/connection_adapters/postgresql/database_statements.rb:97:in `execute'
from /var/lib/gems/2.3.0/gems/bulk_insert-1.5.0/lib/bulk_insert/worker.rb:78:in `block in save!'
from /var/lib/gems/2.3.0/gems/bulk_insert-1.5.0/lib/bulk_insert/worker.rb:78:in `tap'
from /var/lib/gems/2.3.0/gems/bulk_insert-1.5.0/lib/bulk_insert/worker.rb:78:in `save!'
from /var/lib/gems/2.3.0/gems/bulk_insert-1.5.0/lib/bulk_insert/worker.rb:40:in `add'
from /var/lib/gems/2.3.0/gems/bulk_insert-1.5.0/lib/bulk_insert/worker.rb:63:in `block in add_all'
from /var/lib/gems/2.3.0/gems/bulk_insert-1.5.0/lib/bulk_insert/worker.rb:63:in `each'
from /var/lib/gems/2.3.0/gems/bulk_insert-1.5.0/lib/bulk_insert/worker.rb:63:in `add_all'
from /var/lib/gems/2.3.0/gems/bulk_insert-1.5.0/lib/bulk_insert.rb:13:in `block in bulk_insert'
from /var/lib/gems/2.3.0/gems/activerecord-5.0.3/lib/active_record/connection_adapters/abstract/database_statements.rb:232:in `block in transaction'
from /var/lib/gems/2.3.0/gems/activerecord-5.0.3/lib/active_record/connection_adapters/abstract/transaction.rb:189:in `within_new_transaction'
... 22 levels...
from /var/lib/gems/2.3.0/gems/railties-5.0.3/lib/rails/commands/console_helper.rb:9:in `start'
from /var/lib/gems/2.3.0/gems/railties-5.0.3/lib/rails/commands/commands_tasks.rb:78:in `console'
from /var/lib/gems/2.3.0/gems/railties-5.0.3/lib/rails/commands/commands_tasks.rb:49:in `run_command!'
from /var/lib/gems/2.3.0/gems/railties-5.0.3/lib/rails/commands.rb:18:in `<top (required)>'
from /var/lib/gems/2.3.0/gems/activesupport-5.0.3/lib/active_support/dependencies.rb:293:in `require'
from /var/lib/gems/2.3.0/gems/activesupport-5.0.3/lib/active_support/dependencies.rb:293:in `block in require'
from /var/lib/gems/2.3.0/gems/activesupport-5.0.3/lib/active_support/dependencies.rb:259:in `load_dependency'
from /var/lib/gems/2.3.0/gems/activesupport-5.0.3/lib/active_support/dependencies.rb:293:in `require'
from /home/user/rails-app/bin/rails:9:in `<top (required)>'
from /var/lib/gems/2.3.0/gems/activesupport-5.0.3/lib/active_support/dependencies.rb:287:in `load'
from /var/lib/gems/2.3.0/gems/activesupport-5.0.3/lib/active_support/dependencies.rb:287:in `block in load'
from /var/lib/gems/2.3.0/gems/activesupport-5.0.3/lib/active_support/dependencies.rb:259:in `load_dependency'
from /var/lib/gems/2.3.0/gems/activesupport-5.0.3/lib/active_support/dependencies.rb:287:in `load'
from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
from /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
from -e:1:in `<main>'

There are about 100,000 records. What is the best way to insert them into a Rails table?

Am I doing something wrong?
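One thing the loop as posted does not do is flush a final batch smaller than `batch_size`, so any remainder would be left uninserted. A minimal sketch of the same batching with `each_slice`, which yields the last partial batch as well, might look like the following. The helper name `insert_in_batches` and the sample rows are made up for illustration and are not part of bulk_insert; in the real code the block would call something like `TableName.bulk_insert values: batch`.

```ruby
# Sketch: hand rows to a block in fixed-size batches, including the final
# partial batch. `insert_in_batches` is a hypothetical helper name.
def insert_in_batches(rows, batch_size: 1_000)
  inserted = 0
  rows.each_slice(batch_size) do |batch|
    yield batch # e.g. TableName.bulk_insert values: batch
    inserted += batch.size
  end
  inserted
end

# Stand-in data and sink so the sketch runs without a database.
rows = (1..2_500).map { |i| { id: i, name: "row #{i}" } }
batch_sizes = []
total = insert_in_batches(rows, batch_size: 1_000) { |batch| batch_sizes << batch.size }
puts "inserted #{total} rows in batches of #{batch_sizes.inspect}"
```

This does not explain the error in the trace above, since the trace does not include the error message; it only addresses the dropped remainder.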

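Comments:

The best way to bulk insert in PostgreSQL is to use its COPY command, from a CSV file or even over the connection: kadrmasconcepts.com/blog/2013/12/15/…

I have to fetch the data from another database and then insert it into a Rails table. What is the best way to achieve that?

You should have started your question with that information. ***.com/questions/36476192/…

I have edited my question to add that information. However, my main motivation is to speed up inserts into the Postgres table.

OK, you can search Google for tools that migrate data from MySQL to PostgreSQL. As for simply speeding up bulk inserts, COPY is the big winner.

Answer 1: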

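I think you could use the activerecord-import library. Please read this

There is also bulk_insert

Comments:

I used bulk_insert, but what I need is something like the find_or_create_by method. Can that be done with bulk_insert? Currently I run bulk_insert and then delete the duplicate records. That is surely not the right way to do it.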
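One possible way to combine the COPY suggestion from the comments with the "skip rows that already exist" requirement is to COPY into a temporary staging table and then run a single `INSERT ... ON CONFLICT DO NOTHING` (available in PostgreSQL 9.5 and later). The sketch below is an assumption-laden illustration, not something from the thread: the helper names, the `users` table, and its columns are hypothetical, the target table is assumed to have a unique index on the conflict column, and `conn` is assumed to behave like a `PG::Connection` (for example `ActiveRecord::Base.connection.raw_connection`). A small recording stand-in replaces the real connection so the sketch runs without a database.

```ruby
# Encode one row as a line for COPY ... WITH (FORMAT csv).
# nil becomes an unquoted empty field, which COPY's csv format reads as NULL.
def csv_line(values)
  fields = values.map do |v|
    v.nil? ? "" : '"' + v.to_s.gsub('"', '""') + '"'
  end
  fields.join(",") + "\n"
end

# Hypothetical helper: stage rows with COPY, then insert only the rows that do
# not conflict. Table and column names are interpolated directly into SQL, so
# they must come from trusted code, never from user input.
def copy_ignoring_duplicates(conn, table:, columns:, conflict_column:, rows:)
  cols = columns.join(", ")
  staging = "#{table}_staging"
  conn.exec("CREATE TEMP TABLE #{staging} (LIKE #{table} INCLUDING DEFAULTS)")
  conn.copy_data("COPY #{staging} (#{cols}) FROM STDIN WITH (FORMAT csv)") do
    rows.each { |row| conn.put_copy_data(csv_line(row.values_at(*columns))) }
  end
  conn.exec("INSERT INTO #{table} (#{cols}) SELECT #{cols} FROM #{staging} " \
            "ON CONFLICT (#{conflict_column}) DO NOTHING")
  conn.exec("DROP TABLE #{staging}")
end

# Stand-in for PG::Connection that only records what it is asked to do.
class RecordingConnection
  attr_reader :log

  def initialize
    @log = []
  end

  def exec(sql)
    @log << sql
  end

  def copy_data(sql)
    @log << sql
    yield
  end

  def put_copy_data(line)
    @log << line
  end
end

conn = RecordingConnection.new
copy_ignoring_duplicates(
  conn,
  table: "users", columns: %i[email name], conflict_column: :email,
  rows: [{ email: "a@example.com", name: 'Ann "A"' },
         { email: "b@example.com", name: nil }]
)
puts conn.log
```

This has not been run against a real database here. Columns omitted from `columns` rely on the defaults copied by `INCLUDING DEFAULTS`, so required columns without defaults (for example Rails timestamps) would need to be included in the rows.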
