Cassandra failed to create connection pool for new host

【Title】Cassandra failed to create connection pool for new host 【Posted】2017-01-22 19:46:27 【Question】:

I'm running a web application with the following components:

Python 3.5.2, uWSGI 2.0.11.2, cassandra-driver 3.6.0, Cassandra 3.7

It runs against a Cassandra cluster of 3 nodes:

Node 1 - IP: 172.17.0.4
Node 2 - IP: 172.17.0.5
Node 3 - IP: 172.17.0.6

The cluster is configured with NetworkTopologyStrategy and GossipingPropertyFileSnitch.

I followed the uWSGI connection example from cqlengine:

# With cassandra-driver 3.x, cqlengine ships inside the driver package.
from cassandra.cqlengine import connection
from cassandra.io.libevreactor import LibevConnection


try:
    from uwsgidecorators import postfork
except ImportError:
    # We're not in a uWSGI context, no need to hook Cassandra session
    # initialization to the postfork event.
    pass
else:
    @postfork
    def cassandra_init():
        """ Initialize a new Cassandra session in each forked worker.

        Ensures that every uWSGI worker opens its own connections instead
        of reusing sockets inherited from the master process.
        """
        # Read the module-level globals at call time; names bound with
        # "from ... import cluster" would be frozen at import time.
        if connection.cluster is not None:
            connection.cluster.shutdown()
        if connection.session is not None:
            connection.session.shutdown()

        connection.setup(
            ['172.17.0.4'],
            'my_keyspace',
            port=9042,
            connection_class=LibevConnection
        )

But I get the following error for all of the Cassandra nodes (172.17.0.4, 172.17.0.5 and 172.17.0.6):

Respawned uWSGI worker 2 (new pid: 90)
mapping worker 2 to CPUs: 3 4 5
2016-09-14 21:00:47,434 WARNI [cassandra.cluster][Thread-2] Failed to create connection pool for new host 172.17.0.4:
Traceback (most recent call last):
    File "cassandra/cluster.py", line 2232, in cassandra.cluster.Session.add_or_renew_pool.run_add_or_renew_pool (cassandra/cluster.c:38826)
    File "cassandra/pool.py", line 328, in cassandra.pool.HostConnection.__init__ (cassandra/pool.c:6243)
    File "cassandra/cluster.py", line 1107, in cassandra.cluster.Cluster.connection_factory (cassandra/cluster.c:14943)
    File "cassandra/connection.py", line 330, in cassandra.connection.Connection.factory (cassandra/connection.c:5766)
cassandra.OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None
2016-09-14 21:00:47,437 WARNI [cassandra.cluster][Thread-1] Failed to create connection pool for new host 172.17.0.6:
Traceback (most recent call last):
    File "cassandra/cluster.py", line 2232, in cassandra.cluster.Session.add_or_renew_pool.run_add_or_renew_pool (cassandra/cluster.c:38826)
    File "cassandra/pool.py", line 328, in cassandra.pool.HostConnection.__init__ (cassandra/pool.c:6243)
    File "cassandra/cluster.py", line 1107, in cassandra.cluster.Cluster.connection_factory (cassandra/cluster.c:14943)
    File "cassandra/connection.py", line 330, in cassandra.connection.Connection.factory (cassandra/connection.c:5766)
cassandra.OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None
...The work of process 19 is done. Seeya!
worker 7 killed successfully (pid: 19)

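According to the logs, the driver does manage to connect to the node, but for some reason it then closes the connection and throws the error above: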

2016-09-15 23:23:03,786 DEBUG [cassandra.pool][Thread-2] Initializing connection for host 172.17.0.4
2016-09-15 23:23:03,787 DEBUG [cassandra.connection][Thread-2] Not sending options message for new connection(139905425534704) to 172.17.0.4 because compression is disabled and a cql version was not specified
2016-09-15 23:23:03,787 DEBUG [cassandra.connection][Thread-2] Sending StartupMessage on <LibevConnection(139905425534704) 172.17.0.4:9042>
2016-09-15 23:23:03,787 DEBUG [cassandra.connection][Thread-2] Sent StartupMessage on <LibevConnection(139905425534704) 172.17.0.4:9042>
2016-09-15 23:23:03,788 DEBUG [cassandra.connection][event_loop] Got ReadyMessage on new connection (139905425534704) from 172.17.0.4
2016-09-15 23:23:03,788 DEBUG [cassandra.pool][Thread-2] Finished initializing connection for host 172.17.0.4
2016-09-15 23:23:03,788 DEBUG [cassandra.cluster][Thread-2] Added pool for host 172.17.0.4 to session
2016-09-15 22:24:29,239 DEBUG [cassandra.io.libevreactor][Thread-2] Closing connection (139945376028152) to 172.17.0.4
2016-09-15 22:24:29,240 DEBUG [cassandra.io.libevreactor][Thread-2] Closed socket to 172.17.0.4
2016-09-15 22:24:29,240 DEBUG [cassandra.connection][Thread-2] Connection to 172.17.0.4 was closed during the startup handshake
2016-09-15 22:24:29,242 WARNI [cassandra.cluster][Thread-2] Failed to create connection pool for new host 172.17.0.4:
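The DEBUG lines above come from the driver's standard-library loggers (cassandra.cluster, cassandra.pool, cassandra.connection, cassandra.io.libevreactor). A minimal sketch for turning them on, assuming a plain logging setup; the helper name enable_driver_debug_logging is hypothetical, and the format string is only a guess that reproduces the shape of the quoted lines (%(levelname).5s truncates WARNING to WARNI).

```python
import logging


def enable_driver_debug_logging():
    """Hypothetical helper: send all "cassandra.*" log records to stderr at DEBUG."""
    handler = logging.StreamHandler()
    # Assumed format, chosen to resemble the log lines quoted in the question.
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname).5s [%(name)s][%(threadName)s] %(message)s"))
    logger = logging.getLogger("cassandra")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    # Child loggers such as "cassandra.pool" inherit this level and handler.
    return logger
```

Calling it once per worker (for example at the top of the postfork hook) before connection.setup() should be enough to see the connection handshake for every host.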

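EDIT (added more debug information about the problem)

The app can ping any of the nodes on port 9042, so this isn't a connectivity problem. If I run nodetool status, all three nodes in the cluster look fine: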

--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  172.17.0.5  111.72 KiB  256          67.3%             fdd4740d-1ce5-4deb-9a3e-5c18c80ee63e  rack1
UN  172.17.0.4  98.96 KiB  256          66.8%             4fe5a60c-2b6a-4d57-ab6a-e4176ce69b68  rack1
UN  172.17.0.6  94.67 KiB  256          66.0%             5e2675e3-c2a7-4af1-80f0-4cb9573ecf2b  rack1

I've tried Cassandra 3.7 and 2.2.7 and got the same result. However, if I run the app with only Node1, it works!

The logs on the Cassandra nodes show the following:

INFO  22:14:33 Unexpected exception during request; channel = [id: 0xd6d3c9ae, L:/172.17.0.6:9042 ! R:/172.17.0.5:42944]
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
INFO  22:16:26 Unexpected exception during request; channel = [id: 0x2cfa996f, L:/172.17.0.6:9042 - R:/172.17.0.5:42954]
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
INFO  22:54:13 Unexpected exception during request; channel = [id: 0x4134ec0f, L:/172.17.0.6:9042 ! R:/172.17.0.5:42992]

Does anyone know what's going on here? Any help would be greatly appreciated.

【Question discussion】:

【Answer 1】:

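Note that ping uses ICMP rather than TCP, so you may also want to verify the TCP port with nc -nz <ip> 9042. However, since you're getting a timeout rather than "connection refused", I'll assume it's not a connectivity problem.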
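If nc isn't available inside the container, roughly the same TCP-level check can be sketched with the standard library. This is only an illustration: tcp_port_open is a hypothetical helper, and like nc -z it completes the TCP handshake without speaking the CQL protocol.

```python
import socket


def tcp_port_open(host, port=9042, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection performs the full TCP handshake, then we close it.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", unreachable hosts and timeouts
        # (socket.timeout is a subclass of OSError).
        return False
```

For example, run from the app container, [tcp_port_open(h) for h in ("172.17.0.4", "172.17.0.5", "172.17.0.6")] should be all True if every node accepts TCP connections on 9042.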

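The main thing to check is whether your uWSGI configuration enables any kind of monkey patching (e.g. gevent). The libevreactor used in the example relies on the standard library and assumes nothing has been patched.

I think you can fix this either by disabling the patching or by removing the explicit connection_class argument, in which case the driver will detect the patching and choose a suitable reactor implementation by default.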
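One way to check for patching from inside a worker is a heuristic like the following sketch. The function name socket_looks_monkey_patched is hypothetical; it uses gevent.monkey.is_module_patched and eventlet.patcher.is_monkey_patched only if those modules are already loaded, and otherwise falls back to checking that socket.socket is still the class defined in the stdlib socket module (on Python 3 there is no socket._socketobject to compare against).

```python
import socket
import sys


def socket_looks_monkey_patched():
    """Heuristic: has socket.socket been replaced by a green-thread library?"""
    # gevent exposes an explicit query once gevent.monkey has been imported.
    gevent_monkey = sys.modules.get("gevent.monkey")
    if gevent_monkey is not None and gevent_monkey.is_module_patched("socket"):
        return True
    # eventlet offers a similar helper in eventlet.patcher.
    eventlet_patcher = sys.modules.get("eventlet.patcher")
    if eventlet_patcher is not None and eventlet_patcher.is_monkey_patched("socket"):
        return True
    # Fallback: the unpatched class is defined in the stdlib "socket" module.
    return socket.socket.__module__ != "socket"
```

If this returns True inside a uWSGI worker, the answer's suggestion applies: drop connection_class=LibevConnection and let the driver pick its default reactor.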

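【Discussion】:

Thanks for your help @Adam Holmberg, I really appreciate it. I tried using AsyncoreConnection instead of LibevConnection but still hit the same problem. I'm not using any monkey patching in my uWSGI (unless it's enabled by default in uWSGI 2.0.11.2). When I run nodetool status, I can see all 3 nodes shown correctly in the cluster.

Strange. This is almost always some combination of patching and reactor. Try printing connection.cluster.connection_class after setup. Is it the right one? Can you make sure socket.socket is socket._socketobject and not some patched reference? Can you turn on debug logging for the driver? Have you tried writing a script that connects outside the uWSGI context?

I've tried connection.Cluster.connection_class and it looks like it's using cassandra.io.libevreactor, even though I'm passing connection_class=AsyncoreConnection! So I guess this parameter has been removed from the driver in some recent version. From socket.socket I get <socket.socket fd=9, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 0)>. I tried removing libev from my server to see whether it would fall back to Asyncore, but no luck, it crashed: ImportError: The C extension needed to use libev was not found. I've posted my uWSGI config. So I guess the main question now is how to make it run in Asyncore mode. Or better, given that uWSGI isn't monkey patching, how can I make it work with libev? Maybe uWSGI needs some kind of plugin, or should it just work? And why does this only happen with more than one node? Thanks for your time @Adam, this is driving me crazy.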
