rabbitMQ队列处理导致连接池耗尽Tomcat假死问题排查处理

Posted chonghaojie

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了rabbitMQ队列处理导致连接池耗尽Tomcat假死问题排查处理相关的知识,希望对你有一定的参考价值。

背景:
监听器针对RabbitMQ队列做业务数据处理
系统问题表现:
业务系统无法正常使用,所有请求均不予相应,报404异常
控制台问题表现:
接收队列数据的logger日志打印,但是相关sql不打印(之前sql打印)
报错异常:
dbcp连接池(开始使用)

[WARN ] 19:01:05.762 [SimpleAsyncTaskExecutor-1] o.h.util.JDBCExceptionReporter - SQL Error: 1040, SQLState: 08004
[ERROR] 19:01:05.762 [SimpleAsyncTaskExecutor-1] o.h.util.JDBCExceptionReporter - Data source rejected establishment of connection,  message from server: "Too many connections"
Caused by: org.springframework.transaction.CannotCreateTransactionException: Could not open Hibernate Session for transaction; nested exception is org.hibernate.exception.JDBCConnectionException: Cannot open connection
Caused by: org.hibernate.exception.JDBCConnectionException: Cannot open connection
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Data source rejected establishment of connection,  message from server: "Too many connections"

druid连接池(为方便监控,后来修改采用,后续的测试分析均采用druid连接池后所得)

[WARN ] 17:35:23.659 [SimpleAsyncTaskExecutor-1] o.h.util.JDBCExceptionReporter - SQL Error: 0, SQLState: null
[ERROR] 17:35:23.659 [SimpleAsyncTaskExecutor-1] o.h.util.JDBCExceptionReporter - wait millis 10000, active 152, maxActive 500, creating 0
[ERROR] 17:35:31.106 [QuartzScheduler_scheduler-LAPTOP-B40EFQQR1565686921066_ClusterManager] o.s.s.quartz.LocalDataSourceJobStore - ClusterManager: Error managing cluster: Failed to obtain DB connection from data source 'springNonTxDataSource.scheduler': com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 10000, active 152, maxActive 500, creating 0
org.quartz.JobPersistenceException: Failed to obtain DB connection from data source 'springNonTxDataSource.scheduler': com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 10000, active 152, maxActive 500, creating 0
Caused by: com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 10000, active 152, maxActive 500, creating 0
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Data source rejected establishment of connection,  message from server: "Too many connections"
[WARN ] 17:35:43.742 [SimpleAsyncTaskExecutor-1] o.s.a.r.l.SimpleMessageListenerContainer - Execution of Rabbit message listener failed, and no ErrorHandler has been set.
org.springframework.amqp.rabbit.listener.ListenerExecutionFailedException: Listener threw exception
Caused by: org.springframework.transaction.CannotCreateTransactionException: Could not open Hibernate Session for transaction; nested exception is org.hibernate.exception.GenericJDBCException: Cannot open connection
Caused by: org.hibernate.exception.GenericJDBCException: Cannot open connection
Caused by: com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 10000, active 152, maxActive 500, creating 0

监听器核心代码:

protected void doBusiness(String msg, String queueName) 
    List<WeldingDto> list;
    list = new LinkedList<>();
    //校验数据合法性
    super.validatePattern(msg, list, WeldingDto.class);
    if (!list.isEmpty()) 
        //保存wip表相关信息
        WeldingDto weldingDto = list.get(0);
        EquipDto equipDto = this.wipTempService.getEquipDto(weldingDto.getEquipCode(), null);
        List<WorkOrder> workingOrderList = this.workOrderDao.getWorkingOrderListByEquip(equipDto.getId());
        //注释部分业务代码,不影响问题分析
    

问题原因分析:
逻辑连接打开次数 > 逻辑连接关闭次数 说明:连接泄露
最终导致池中连接数被耗尽,池中连接数变为0;
佐证问题测试发现:
将队列清空,然后每丢一条数据,连接泄露数+N
排查方案:
采用druid连接池的超时回收机制,配置如下:

<!-- 超过时间限制是否回收 -->  
<property name="removeAbandoned" value="true" />  
<!-- 超时时间,单位为秒 -->  
<property name="removeAbandonedTimeout" value="20" />  
<!-- 关闭abanded连接时输出错误日志 -->  
<property name="logAbandoned" value="true" />

添加以上配置后,可以在druid监视器的"数据源datasource"的"活跃连接堆栈查看ActiveConnection StackTrace"
看到异常日志,异常日志可以看出连接占用未关闭的代码位置。
注:该配置影响系统性能,建议排查问题时打开,系统运行时关闭。

异常日志如下:

java.lang.Thread.getStackTrace(Thread.java:1559)
com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1209)
com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1135)
com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1125)
com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:104)
org.springframework.orm.hibernate3.LocalDataSourceConnectionProvider.getConnection(LocalDataSourceConnectionProvider.java:83)
org.hibernate.jdbc.ConnectionManager.openConnection(ConnectionManager.java:446)
org.hibernate.jdbc.ConnectionManager.getConnection(ConnectionManager.java:167)
org.hibernate.jdbc.AbstractBatcher.prepareQueryStatement(AbstractBatcher.java:161)
org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1616)
org.hibernate.loader.Loader.doQuery(Loader.java:717)
org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:270)
org.hibernate.loader.Loader.doList(Loader.java:2449)
org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2192)
org.hibernate.loader.Loader.list(Loader.java:2187)
org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:452)
org.hibernate.hql.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:363)
org.hibernate.engine.query.HQLQueryPlan.performList(HQLQueryPlan.java:196)
org.hibernate.impl.SessionImpl.list(SessionImpl.java:1258)
org.hibernate.impl.QueryImpl.list(QueryImpl.java:102)
com.wisdytech.linkmes.plan.dao.impl.WorkOrderDaoImpl.getWorkingOrderListByEquip(WorkOrderDaoImpl.java:331)
com.wisdytech.linkmes.msghandler.listeners.business.WeldingQueueListener.doBusiness(WeldingQueueListener.java:109)
com.wisdytech.linkmes.msghandler.listeners.CommonQueueListener.onMessage(CommonQueueListener.java:70)
org.springframework.amqp.rabbit.listener.adapter.MessageListenerAdapter.onMessage(MessageListenerAdapter.java:334)
org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:546)
org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:472)
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.access$001(SimpleMessageListenerContainer.java:61)
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$1.invokeListener(SimpleMessageListenerContainer.java:111)
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.invokeListener(SimpleMessageListenerContainer.java:627)
org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.executeListener(AbstractMessageListenerContainer.java:454)
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.doReceiveAndExecute(SimpleMessageListenerContainer.java:480)
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.receiveAndExecute(SimpleMessageListenerContainer.java:464)
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.access$300(SimpleMessageListenerContainer.java:61)
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:558)
java.lang.Thread.run(Thread.java:748)

找到产生问题的指定代码位置后,分析代码发现getSession()创建后没有对该创建session关闭。
解决方案1:手动关闭

Session session = this.getSession();
org.hibernate.Query query = session.createQuery(hql.toString());
List<XXX> list = query.list();
session.close();

解决方案2:使用回调机制,通过spring关闭

this.getHibernateTemplate().executeFind(new HibernateCallback<Object>() 
    @Override
    public Object doInHibernate(Session session) throws HibernateException 
        org.hibernate.Query query = session.createQuery(hql.toString());//输入参数类型final
        List<XXX> list1 = query.list();
        return list1;
    
);

//下面两个方法自带回调

this.getHibernateTemplate().find(hql.toString());
this.getHibernateTemplate().get(XXX.class,"id");
getSession() 需手动关闭
getHibernateTemplate() 经过spring封装添加相应的声明式事务管理 spring管理

注:建议使用getHibernateTemplate(),针对getHibernateTemplate()未提供的方法,可以使用HibernateCallback回调方法管理数据库,代码如上。
解决方案3:加入事务
本次问题是在监听器中调用dao层方法,dao层没有加入事务;可以在监听器最外层套一层有事务的service方法或者给dao层添加事务。

以上是关于rabbitMQ队列处理导致连接池耗尽Tomcat假死问题排查处理的主要内容,如果未能解决你的问题,请参考以下文章

tomcat JDBC连接池c3p0连接资源耗尽导致tomcat实例对应的app移动端无法访问。

线程池工作队列饱和策略

HttpClient连接池耗尽引发雪崩问题

连接池耗尽了!!!

在 Hibernate/C3P0 中处理连接池耗尽并避免死锁

02-伪异步I/O通讯模型