EclipseLink and PostgreSQL batch writing
【Posted】: 2016-02-26 14:40:47 【Question】: I have been developing a BulkSMS solution for one of my clients, and I decided to use JPA (EclipseLink) as the ORM. The underlying database is PostgreSQL 9.5.1.
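My problem is that whenever I send a request that needs to persist 65,000 records, the operation takes about 27 seconds to complete. I implemented a sequence connection pool, sequence preallocation = 1000, and batch writing, but this only cut 15 seconds from the operation.
After examining the database logs, I noticed that the same queries are issued before and after applying the optimizations.
Here is my optimized persistence.xml: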
<persistence version="2.1" xmlns="http://xmlns.jcp.org/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/persistence http://xmlns.jcp.org/xml/ns/persistence/persistence_2_1.xsd">
  <persistence-unit name="com.kw.ktt.sms.server" transaction-type="JTA">
    <jta-data-source>SMSDB</jta-data-source>
    <non-jta-data-source>sequence</non-jta-data-source>
    <class>com.kw.ktt.sms.server.core.TestClass</class>
    <class>com.kw.ktt.sms.server.jpa.Customer</class>
    <class>com.kw.ktt.sms.server.jpa.SMSAccount</class>
    <class>com.kw.ktt.sms.server.jpa.SMSTransaction</class>
    <class>com.kw.ktt.sms.server.jpa.ContactGroup</class>
    <class>com.kw.ktt.sms.server.jpa.PhoneNumber</class>
    <properties>
      <property name="eclipselink.application-location" value="/Users/mousaalsulaimi/Desktop"/>
      <property name="eclipselink.ddl-generation.output-mode" value="database"/>
      <property name="eclipselink.logging.connection" value="true"/>
      <property name="javax.persistence.schema-generation.database.action" value="drop-and-create"/>
      <property name="eclipselink.ddl-generation" value="drop-and-create-tables"/>
      <property name="eclipselink.jdbc.batch-writing" value="JDBC" />
      <property name="eclipselink.jdbc.batch-writing.size" value="1000"/>
      <property name="eclipselink.jdbc.sequence-connection-pool" value="true" />
      <property name="eclipselink.connection-pool.sequence.nonJtaDataSource" value="sequence"/>
      <property name="eclipselink.connection-pool.sequence.intial" value="1000" />
    </properties>
  </persistence-unit>
</persistence>
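As shown above, I use a JTA connection pool for persistence (named SMSDB) and a non-JTA connection for sequencing (named sequence). Each has a different database user, so that connections are easy to trace in the database logs.
The logs of the unoptimized connection are here - this is just a sample for 10 records.
The logs of the optimized connection are here - this is just a sample for 10 records.
Can someone explain what I am doing wrong, and why both persistence setups produce the same queries even though there is an actual improvement of 15 seconds?
One more thing: I set the sequence preallocation to 1000 in the entity's source code, and judging from the database logs, sequencing works as expected and fetches the correct increment values. My concern is batch writing; I am afraid it is not set up correctly in persistence.xml.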
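For orientation, the `JDBC` value of `eclipselink.jdbc.batch-writing` relies on the standard JDBC batching calls `addBatch()` and `executeBatch()`. The following is a hand-written sketch of that pattern, not EclipseLink's actual implementation. The class name, the `"UNKNOWN"` operator value, the column types, and the counting stand-in connection are all assumptions made so that the example runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class JdbcBatchSketch {

    /** Queues one row per number and flushes every batchSize rows. Returns the number of flushes. */
    static int insertPhoneNumbers(Connection con, List<String> numbers, int batchSize) throws SQLException {
        String sql = "INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)";
        int flushes = 0;
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            int pending = 0;
            long id = 1;
            for (String number : numbers) {
                ps.setLong(1, id++);
                ps.setString(2, "UNKNOWN"); // hypothetical operator value
                ps.setString(3, number);
                ps.addBatch();              // queue the row on the client side
                if (++pending == batchSize) {
                    ps.executeBatch();      // send the queued rows together
                    flushes++;
                    pending = 0;
                }
            }
            if (pending > 0) {              // flush the final partial batch
                ps.executeBatch();
                flushes++;
            }
        }
        return flushes;
    }

    /** Stand-in connection that only counts executeBatch() calls; no database is involved. */
    static Connection countingConnection(AtomicInteger executeBatchCalls) {
        PreparedStatement ps = (PreparedStatement) Proxy.newProxyInstance(
                JdbcBatchSketch.class.getClassLoader(),
                new Class<?>[] {PreparedStatement.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("executeBatch")) {
                        executeBatchCalls.incrementAndGet();
                        return new int[0];
                    }
                    return null; // setLong, setString, addBatch and close return void
                });
        return (Connection) Proxy.newProxyInstance(
                JdbcBatchSketch.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, args) -> method.getName().equals("prepareStatement") ? ps : null);
    }

    /** Runs the sketch against the stand-in connection and returns the executeBatch() count. */
    static int demo(int rows, int batchSize) {
        List<String> numbers = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            numbers.add("number-" + i);
        }
        AtomicInteger calls = new AtomicInteger();
        try {
            insertPhoneNumbers(countingConnection(calls), numbers, batchSize);
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return calls.get();
    }
}
```

With a real `Connection`, each `executeBatch()` call hands a whole group of rows to the driver at once. How the driver then transmits them, and how they appear in the server log, depends on the driver.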
Update
Following Chris's suggestion, I enabled logging in EclipseLink. This is the EclipseLink log produced with the optimized persistence.xml:
2016-02-27T23:59:28.307+0300|Fine: SELECT CUSTOMERID, CIVILIDNUMBER, CREATEDATE, CREATEDBY, EMAIL, FULLNAME, ISACTIVE, ISADMIN, MALE, PASSWORD, PERSONAL, PHONENUMBER, STATUS, USERNAME, ACCOUNT_SMSACCOUNTID FROM CUSTOMER WHERE (CUSTOMERID = ?)
bind => [1 parameter bound]
2016-02-27T23:59:28.310+0300|Fine: SELECT SMSACCOUNTID, OOOREDOOO_BALANCE, VIVA_BALANCE, ZAIN_BALANCE FROM SMSACCOUNT WHERE (SMSACCOUNTID = ?)
bind => [1 parameter bound]
2016-02-27T23:59:28.312+0300|Fine: select nextval('SEQ_GEN_SEQUENCE')
2016-02-27T23:59:28.327+0300|Fine: select nextval('number_seq')
2016-02-27T23:59:28.331+0300|Info: this is the id 1
2016-02-27T23:59:28.332+0300|Fine: INSERT INTO SMSACCOUNT (SMSACCOUNTID, OOOREDOOO_BALANCE, VIVA_BALANCE, ZAIN_BALANCE) VALUES (?, ?, ?, ?)
bind => [4 parameters bound]
2016-02-27T23:59:28.335+0300|Fine: INSERT INTO CUSTOMER (CUSTOMERID, CIVILIDNUMBER, CREATEDATE, CREATEDBY, EMAIL, FULLNAME, ISACTIVE, ISADMIN, MALE, PASSWORD, PERSONAL, PHONENUMBER, STATUS, USERNAME, ACCOUNT_SMSACCOUNTID) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
bind => [15 parameters bound]
2016-02-27T23:59:28.337+0300|Fine: INSERT INTO CONTACTGROUP (GROUPID, CREATEBY, CREATEDATE, GROUPDESCRIPTION, GROUPNAME) VALUES (?, ?, ?, ?, ?)
bind => [5 parameters bound]
2016-02-27T23:59:28.339+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
2016-02-27T23:59:28.339+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.339+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.339+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.339+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.339+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.339+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.339+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.340+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.340+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.340+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.342+0300|Fine: UPDATE CONTACTGROUP SET customerID = ? WHERE (GROUPID = ?)
bind => [2 parameters bound]
2016-02-27T23:59:28.343+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
And below is the EclipseLink log produced with the original persistence.xml:
2016-02-28T08:56:25.440+0300|Fine: SELECT CUSTOMERID, CIVILIDNUMBER, CREATEDATE, CREATEDBY, EMAIL, FULLNAME, ISACTIVE, ISADMIN, MALE, PASSWORD, PERSONAL, PHONENUMBER, STATUS, USERNAME, ACCOUNT_SMSACCOUNTID FROM CUSTOMER WHERE (CUSTOMERID = ?)
bind => [1 parameter bound]
2016-02-28T08:56:25.443+0300|Fine: SELECT SMSACCOUNTID, OOOREDOOO_BALANCE, VIVA_BALANCE, ZAIN_BALANCE FROM SMSACCOUNT WHERE (SMSACCOUNTID = ?)
bind => [1 parameter bound]
2016-02-28T08:56:25.445+0300|Fine: select nextval('SEQ_GEN_SEQUENCE')
2016-02-28T08:56:25.447+0300|Fine: select nextval('number_seq')
2016-02-28T08:56:25.449+0300|Info: this is the id 1
2016-02-28T08:56:25.450+0300|Fine: INSERT INTO SMSACCOUNT (SMSACCOUNTID, OOOREDOOO_BALANCE, VIVA_BALANCE, ZAIN_BALANCE) VALUES (?, ?, ?, ?)
bind => [4 parameters bound]
2016-02-28T08:56:25.451+0300|Fine: INSERT INTO CUSTOMER (CUSTOMERID, CIVILIDNUMBER, CREATEDATE, CREATEDBY, EMAIL, FULLNAME, ISACTIVE, ISADMIN, MALE, PASSWORD, PERSONAL, PHONENUMBER, STATUS, USERNAME, ACCOUNT_SMSACCOUNTID) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
bind => [15 parameters bound]
2016-02-28T08:56:25.452+0300|Fine: INSERT INTO CONTACTGROUP (GROUPID, CREATEBY, CREATEDATE, GROUPDESCRIPTION, GROUPNAME) VALUES (?, ?, ?, ?, ?)
bind => [5 parameters bound]
2016-02-28T08:56:25.452+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
bind => [3 parameters bound]
2016-02-28T08:56:25.453+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
bind => [3 parameters bound]
2016-02-28T08:56:25.453+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
bind => [3 parameters bound]
2016-02-28T08:56:25.454+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
bind => [3 parameters bound]
2016-02-28T08:56:25.454+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
bind => [3 parameters bound]
2016-02-28T08:56:25.454+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
bind => [3 parameters bound]
2016-02-28T08:56:25.455+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
bind => [3 parameters bound]
2016-02-28T08:56:25.455+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
bind => [3 parameters bound]
2016-02-28T08:56:25.455+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
bind => [3 parameters bound]
2016-02-28T08:56:25.456+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
bind => [3 parameters bound]
2016-02-28T08:56:25.456+0300|Fine: UPDATE CONTACTGROUP SET customerID = ? WHERE (GROUPID = ?)
bind => [2 parameters bound]
2016-02-28T08:56:25.457+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
bind => [2 parameters bound]
2016-02-28T08:56:25.457+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
bind => [2 parameters bound]
2016-02-28T08:56:25.458+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
bind => [2 parameters bound]
2016-02-28T08:56:25.458+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
bind => [2 parameters bound]
2016-02-28T08:56:25.459+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
bind => [2 parameters bound]
2016-02-28T08:56:25.459+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
bind => [2 parameters bound]
2016-02-28T08:56:25.460+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
bind => [2 parameters bound]
2016-02-28T08:56:25.460+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
bind => [2 parameters bound]
2016-02-28T08:56:25.460+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
bind => [2 parameters bound]
2016-02-28T08:56:25.461+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
bind => [2 parameters bound]
Clearly, the queries produced with the optimized persistence.xml differ considerably from those produced with the original persistence.xml.
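The difference between the two logs can be approximated with a small model: consecutive statements with identical SQL share one batch, and a statement with different SQL flushes whatever is pending. This is a simplified assumption about the batching behaviour, not EclipseLink code. The class name and the shortened statement labels are hypothetical, and the SELECT and nextval calls from the logs are left out.

```java
import java.util.ArrayList;
import java.util.List;

class BatchGroupingModel {

    /** Counts executions when consecutive identical SQL strings are grouped into batches. */
    static int roundTrips(List<String> statements, int maxBatchSize) {
        int trips = 0;
        String current = null;
        int pending = 0;
        for (String sql : statements) {
            boolean sameAsPending = sql.equals(current);
            // Flush when the SQL changes or the pending batch is full.
            if (pending > 0 && (!sameAsPending || pending == maxBatchSize)) {
                trips++;
                pending = 0;
            }
            current = sql;
            pending++;
        }
        return pending > 0 ? trips + 1 : trips;
    }

    /** Write statements in the order they appear in the 10-record logs of the question. */
    static List<String> sampleWorkload() {
        List<String> s = new ArrayList<>();
        s.add("INSERT SMSACCOUNT");
        s.add("INSERT CUSTOMER");
        s.add("INSERT CONTACTGROUP");
        for (int i = 0; i < 10; i++) {
            s.add("INSERT PHONENUMBER");
        }
        s.add("UPDATE CONTACTGROUP");
        for (int i = 0; i < 10; i++) {
            s.add("UPDATE PHONENUMBER");
        }
        return s;
    }
}
```

Under this model, `roundTrips(sampleWorkload(), 1)` counts every statement separately, as in the original log, while `roundTrips(sampleWorkload(), 1000)` collapses the ten PHONENUMBER inserts and the ten PHONENUMBER updates into one execution each, as in the optimized log.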
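【Comments】:
It looks like the ContactGroup update is forcing the Phonenumber batch statements to execute earlier than I would expect. You might look at this mapping to determine why EclipseLink thinks it must execute the ContactGroup statements before continuing with the Phonenumbers.
【Answer 1】: Turn on EclipseLink's SQL logging, and you should see a difference in how statements are prepared and processed in JDBC, which should explain the 15-second difference.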
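As a rough illustration of turning on SQL logging, the sketch below collects the EclipseLink logging properties in a map. The property names `eclipselink.logging.level`, `eclipselink.logging.level.sql`, and `eclipselink.logging.parameters` are EclipseLink persistence unit properties. The class and method names are hypothetical, and the values chosen are one reasonable setup, not the one the answerer necessarily had in mind.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EclipseLinkLoggingProps {

    /** Builds persistence unit properties that log SQL statements and their bound values. */
    static Map<String, String> sqlLogging() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("eclipselink.logging.level", "FINE");     // general log level
        props.put("eclipselink.logging.level.sql", "FINE"); // log generated SQL
        props.put("eclipselink.logging.parameters", "true"); // log bind values, not just their count
        return props;
    }
}
```

In an application-managed setup, such a map could be passed as the second argument to `Persistence.createEntityManagerFactory(unitName, props)`. In a container-managed JTA setup like the one in the question, the same name/value pairs would more likely go into persistence.xml as `<property>` elements.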
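I'm not familiar with the eclipselink.connection-pool.sequence.intial property - you might instead use the allocationSize configuration in the sequence generator itself, which allows 1000 sequence values to be fetched at a time.
If that is not set, batch writing will reduce the number of insert statements, but you will still see a large number of statements fetching sequence numbers, only on a different connection, since sequencing uses its own connection pool.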
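The effect of allocationSize can be sketched with a toy id source that goes back to the "database" only when its current block of ids is used up. Everything here is an assumption made for illustration: the class name, the in-memory counter that stands in for the database sequence, and the choice to treat the fetched value as the first id of a block. A real JPA provider maps the fetched value to a block according to its own rules.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongSupplier;

class PreallocatingIdSource {
    private final LongSupplier fetchBlockStart; // stands in for one "select nextval(...)" round trip
    private final int allocationSize;
    private long nextId;
    private int remaining = 0;
    private int roundTrips = 0;

    PreallocatingIdSource(LongSupplier fetchBlockStart, int allocationSize) {
        this.fetchBlockStart = fetchBlockStart;
        this.allocationSize = allocationSize;
    }

    /** Hands out the next id, fetching a new block only when the current one is exhausted. */
    long next() {
        if (remaining == 0) {
            nextId = fetchBlockStart.getAsLong();
            remaining = allocationSize;
            roundTrips++;
        }
        remaining--;
        return nextId++;
    }

    int roundTrips() {
        return roundTrips;
    }

    /** Allocates ids for the given number of records and returns the number of sequence fetches. */
    static int roundTripsFor(int records, int allocationSize) {
        long[] sequence = {1}; // simulated database sequence that advances by allocationSize
        PreallocatingIdSource ids = new PreallocatingIdSource(() -> {
            long start = sequence[0];
            sequence[0] += allocationSize;
            return start;
        }, allocationSize);
        Set<Long> seen = new HashSet<>();
        for (int i = 0; i < records; i++) {
            if (!seen.add(ids.next())) {
                throw new IllegalStateException("duplicate id");
            }
        }
        return ids.roundTrips();
    }
}
```

In this model, 65,000 records with an allocation size of 1000 need 65 sequence fetches, whereas an allocation size of 1 needs one fetch per record.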
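【Discussion】:
I may not have mentioned this, but I have already set the allocation size to 1000; it is defined in the entity's source code. I don't think sequencing is the problem, because the attached logs show EclipseLink requesting sequence values correctly: for 65,000 records, EclipseLink actually called sequence.nextVal only 60 times, which means preallocation works. As for eclipselink.connection-pool.sequence.intial, I think EclipseLink ignores it because I am using an external connection pool (GlassFish). My concern is that batch writing is not working properly or is set up incorrectly in persistence.xml. I will turn on JDBC logging as you suggested and check the results; I was not aware of this feature. Thanks a lot, Chris.
Thanks a lot, Chris. The EclipseLink logs helped a lot, but I expected the optimization to happen at the database level rather than the ORM level; the database logs look the same.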