Apache Ignite.NET TryEnter on ICacheLock returns false on network communication error instead of throwing exception

[Title]: Apache Ignite.NET TryEnter on ICacheLock returns false on network communication error instead of throwing exception [Posted]: 2018-04-12 00:54:43 [Question]:

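Here is the scenario.

    1. A network problem occurs.
    2. The Apache Ignite.NET cluster has 1 node that becomes segmented. I can see this in the logs: the affected node logs a NodeSegmented event.
    3. On the segmented node, if you get an ICacheLock object from the ICache object and then try to enter the lock with TryEnter(), you get a return value of false. This is not because the cache key is already locked but, strangely, seems to be because of the network segmentation.
    4. Restart the segmented node: it rejoins the cluster and works as expected.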

This is the stack trace I see in the logs when this happens:

Failed to send unlock request to node (will make best effort to complete): TcpDiscoveryNode [id=d8b54715-4597-410c-a027-3c76d28ec7f1, addrs=[10.20.18.104], sockAddrs=[10.20.18.104:49100], discPort=49100, order=3174, intOrder=1590, lastExchangeTime=1523347291158, loc=false, ver=2.1.0#20170720-sha1:bdaeecca, isClient=false]] Native:[class org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid or TCP connection cannot be established due to firewall issues) [node=TcpDiscoveryNode [id=d8b54715-4597-410c-a027-3c76d28ec7f1, addrs=[10.20.18.104], sockAddrs=[/10.20.18.104:49100], discPort=49100, order=3174, intOrder=1590, lastExchangeTime=1523347291158, loc=false, ver=2.1.0#20170720-sha1:bdaeecca, isClient=false], topic=TOPIC_CACHE, msg=GridDhtUnlockRequest [], policy=2]
    at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1651)
    at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1715)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1141)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.removeLocks(GridDhtTransactionalCacheAdapter.java:1652)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.undoLocks(GridDhtLockFuture.java:425)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onComplete(GridDhtLockFuture.java:719)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onDone(GridDhtLockFuture.java:703)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onDone(GridDhtLockFuture.java:82)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:461)
    at org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:129)
    at org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:45)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:382)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:346)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:334)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:494)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:473)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:461)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$MiniFuture.onResult(GridDhtLockFuture.java:1191)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.map(GridDhtLockFuture.java:959)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onOwnerChanged(GridDhtLockFuture.java:655)
    at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.notifyOwnerChanged(GridCacheMvccManager.java:226)
    at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.access$200(GridCacheMvccManager.java:80)
    at org.apache.ignite.internal.processors.cache.GridCacheMvccManager$3.onOwnerChanged(GridCacheMvccManager.java:163)
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.checkOwnerChanged(GridCacheMapEntry.java:3669)
    at org.apache.ignite.internal.processors.cache.distributed.GridDistributedCacheEntry.readyLock(GridDistributedCacheEntry.java:469)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.readyLocks(GridDhtLockFuture.java:567)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.map(GridDhtLockFuture.java:764)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.lockAllAsync0(GridDhtColocatedCache.java:1066)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.lockAllAsync(GridDhtColocatedCache.java:937)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.lockLocally(GridDhtColocatedLockFuture.java:1171)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.mapAsPrimary(GridDhtColocatedLockFuture.java:1282)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.map0(GridDhtColocatedLockFuture.java:852)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.map(GridDhtColocatedLockFuture.java:813)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.mapOnTopology(GridDhtColocatedLockFuture.java:772)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.map(GridDhtColocatedLockFuture.java:720)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.lockAllAsync(GridDhtColocatedCache.java:664)
    at org.apache.ignite.internal.processors.cache.distributed.GridDistributedCacheAdapter.lockAllAsync(GridDistributedCacheAdapter.java:117)
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter.lockAll(GridCacheAdapter.java:3258)
    at org.apache.ignite.internal.processors.cache.CacheLockImpl.tryLock(CacheLockImpl.java:109)
    at org.apache.ignite.internal.processors.cache.CacheLockImpl.tryLock(CacheLockImpl.java:130)
    at org.apache.ignite.internal.processors.platform.cache.PlatformCache.processInStreamOutLong(PlatformCache.java:524)
    at org.apache.ignite.internal.processors.platform.PlatformTargetProxyImpl.inStreamOutLong(PlatformTargetProxyImpl.java:65)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote node: TcpDiscoveryNode [id=d8b54715-4597-410c-a027-3c76d28ec7f1, addrs=[10.20.18.104], sockAddrs=[10.20.18.104:49100], discPort=49100, order=3174, intOrder=1590, lastExchangeTime=1523347291158, loc=false, ver=2.1.0#20170720-sha1:bdaeecca, isClient=false]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2544)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2480)
    at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1643)
    ... 41 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each ComputeTask and cache Transaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=d8b54715-4597-410c-a027-3c76d28ec7f1, addrs=[10.20.18.104:47100]]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3179)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2763)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2655)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2516)
    ... 43 more
    Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address [addr=10.20.18.104:47100, err=Failed to read remote node recovery handshake (connection closed).]
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3184)
        ... 46 more
    Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read remote node recovery handshake (connection closed).
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:3438)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3044)
        ... 46 more
]

And another, slightly different one:

Level: [Error], Message:[<ResoDupCheck> Failed to send unlock request [keys=[UserKeyCacheObjectImpl [part=482, val=201804141800-2-190327-110016411351-pat-clarkson-greene, hasValBytes=true]], n=TcpDiscoveryNode [id=d8b54715-4597-410c-a027-3c76d28ec7f1, addrs=[10.20.18.104], sockAddrs=[10.20.18.104:49100], discPort=49100, order=3174, intOrder=1590, lastExchangeTime=1523347291158, loc=false, ver=2.1.0#20170720-sha1:bdaeecca, isClient=false]]] Native:[class org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid or TCP connection cannot be established due to firewall issues) [node=TcpDiscoveryNode [id=d8b54715-4597-410c-a027-3c76d28ec7f1, addrs=[10.20.18.104], sockAddrs=[10.20.18.104:49100], discPort=49100, order=3174, intOrder=1590, lastExchangeTime=1523347291158, loc=false, ver=2.1.0#20170720-sha1:bdaeecca, isClient=false], topic=TOPIC_CACHE, msg=GridNearUnlockRequest [super=GridDistributedUnlockRequest [keys=[UserKeyCacheObjectImpl [part=482, val=201804141800-2-190327-110016411351-pat-clarkson-greene, hasValBytes=true]], super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=121528584, order=1523348164577, nodeOrder=3178], committedVers=[], rolledbackVers=[], cnt=1, super=GridCacheIdMessage [cacheId=-1009505448]]]], policy=2]
    at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1651)
    at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1715)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1141)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.removeLocks(GridDhtColocatedCache.java:877)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.undoLocks(GridDhtColocatedLockFuture.java:383)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.onComplete(GridDhtColocatedLockFuture.java:575)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.onDone(GridDhtColocatedLockFuture.java:559)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.map(GridDhtColocatedLockFuture.java:819)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.mapOnTopology(GridDhtColocatedLockFuture.java:772)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.map(GridDhtColocatedLockFuture.java:720)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.lockAllAsync(GridDhtColocatedCache.java:664)
    at org.apache.ignite.internal.processors.cache.distributed.GridDistributedCacheAdapter.lockAllAsync(GridDistributedCacheAdapter.java:117)
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter.lockAll(GridCacheAdapter.java:3258)
    at org.apache.ignite.internal.processors.cache.CacheLockImpl.tryLock(CacheLockImpl.java:109)
    at org.apache.ignite.internal.processors.cache.CacheLockImpl.tryLock(CacheLockImpl.java:130)
    at org.apache.ignite.internal.processors.platform.cache.PlatformCache.processInStreamOutLong(PlatformCache.java:524)
    at org.apache.ignite.internal.processors.platform.PlatformTargetProxyImpl.inStreamOutLong(PlatformTargetProxyImpl.java:65)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote node: TcpDiscoveryNode [id=d8b54715-4597-410c-a027-3c76d28ec7f1, addrs=[10.20.18.104], sockAddrs=[10.20.18.104:49100], discPort=49100, order=3174, intOrder=1590, lastExchangeTime=1523347291158, loc=false, ver=2.1.0#20170720-sha1:bdaeecca, isClient=false]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2544)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2480)
    at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1643)
    ... 16 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each ComputeTask and cache Transaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=d8b54715-4597-410c-a027-3c76d28ec7f1, addrs=[10.20.18.104:47100]]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3179)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2763)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2655)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2516)
    ... 18 more
    Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address [addr=10.20.18.104:47100, err=Failed to read remote node recovery handshake (connection closed).]
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3184)
        ... 21 more
    Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read remote node recovery handshake (connection closed).
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:3438)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3044)
        ... 21 more
]

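My main question is: why doesn't ICacheLock throw an exception? By returning false, it wrongly tells me that the cache key is already locked, and I have no way of knowing whether the false is due to a network problem or to the key actually being locked.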

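My current solution: add a listener for the NodeSegmented local event and shut down/restart the Ignite node. As a defensive backup plan, use Polly's circuit breaker to check whether more than 50% of requests failed to acquire the lock within 30 seconds. That should be an unlikely scenario, and it results in skipping the lock call and carrying on (in a degraded state).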
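The "more than 50% of lock attempts failed within 30 seconds" check can be illustrated without Polly. The following is a minimal, dependency-free sketch and not Polly's circuit breaker: the class name `LockFailureWindow`, its members, and the `minSamples` guard are hypothetical names introduced only for illustration. The caller passes in the current time so the behaviour is easy to test.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Ignite.NET or Polly.
// Tracks TryEnter outcomes over a sliding time window.
public sealed class LockFailureWindow
{
    private readonly TimeSpan _window;
    private readonly double _threshold;
    private readonly int _minSamples;
    private readonly Queue<KeyValuePair<DateTime, bool>> _samples =
        new Queue<KeyValuePair<DateTime, bool>>();
    private readonly object _sync = new object();

    public LockFailureWindow(TimeSpan window, double threshold, int minSamples)
    {
        _window = window;
        _threshold = threshold;
        _minSamples = minSamples;
    }

    // Record the result of one TryEnter call (true = lock acquired).
    public void Record(bool acquired, DateTime now)
    {
        lock (_sync)
        {
            _samples.Enqueue(new KeyValuePair<DateTime, bool>(now, !acquired));
            Trim(now);
        }
    }

    // True when the share of failed attempts inside the window exceeds the threshold.
    public bool ShouldSkipLocking(DateTime now)
    {
        lock (_sync)
        {
            Trim(now);
            if (_samples.Count < _minSamples)
            {
                return false; // too little data to judge
            }

            int failed = 0;
            foreach (var sample in _samples)
            {
                if (sample.Value) failed++;
            }

            return (double)failed / _samples.Count > _threshold;
        }
    }

    // Drop samples older than the window.
    private void Trim(DateTime now)
    {
        while (_samples.Count > 0 && now - _samples.Peek().Key > _window)
        {
            _samples.Dequeue();
        }
    }
}
```

With `new LockFailureWindow(TimeSpan.FromSeconds(30), 0.5, 4)`, the caller would record every TryEnter result and skip the lock call while `ShouldSkipLocking(DateTime.UtcNow)` returns true. Note that this cannot tell genuine contention from a network failure; it only detects an unusual failure rate.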

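Am I missing something in my Ignite.NET configuration?

Am I missing some understanding of how Ignite works?

Is there a programmatic way to find out why the TryEnter call returned false and decide how to proceed?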

[Answer 1]:

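It looks like Ignite does not propagate the exception from the Java side to the .NET side. If we try the same thing with the Java API, the equivalent of TryEnter() (CacheLockImpl.tryLock in the traces above) throws javax.cache.CacheException.

I have created a Jira ticket to address this: https://issues.apache.org/jira/browse/IGNITE-8247

Also, please make sure that the key you are trying to lock exists in the cache.

As a workaround, you can add your own listeners for the ClientDisconnected and ClientReconnected events. Here is an example: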

using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Cache;

class TryEnterIssue
{
    // Set from Ignite event handlers, which may run on a different thread.
    public static volatile bool ClientDisconnected = false;

    static void Main(string[] args)
    {
        var cfg = new IgniteConfiguration() { ... };

        using (var ignite = Ignition.Start(cfg))
        {
            ...

            ICache<int, string> cache = ignite.GetOrCreateCache<int, string>(cacheConfiguration);

            // Track connectivity so a false from TryEnter() can be interpreted.
            ignite.ClientDisconnected += (sender, eventArgs) =>
            {
                ClientDisconnected = true;

                Console.WriteLine("Client disconnected.");
            };

            ignite.ClientReconnected += (sender, eventArgs) =>
            {
                ClientDisconnected = false;

                Console.WriteLine("Client reconnected.");
            };

            ...

            ICacheLock lock1 = cache.Lock(1);
            try
            {
                if (!lock1.TryEnter())
                {
                    if (ClientDisconnected)
                    {
                        // Client is disconnected.
                    }
                    else
                    {
                        // Unable to acquire a lock.
                    }
                }
                else
                {
                    // Lock acquired; release it when done.
                    lock1.Exit();
                }
            }
            catch (Exception e)
            {
                ...
            }

            ...
        }
    }
}
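The branching in the example can be folded into one helper that returns a three-way result, so calling code no longer has to interpret a bare false. This is only a sketch: `LockOutcome`, `LockProbe` and `TryAcquire` are hypothetical names, not Ignite.NET APIs. The helper takes delegates, so it assumes the caller wraps `ICacheLock.TryEnter()` and supplies a connectivity flag such as `ClientDisconnected` from the example.

```csharp
using System;

// Hypothetical result type: distinguishes contention from connectivity loss.
public enum LockOutcome
{
    Acquired,
    HeldElsewhere,
    ClusterUnavailable
}

// Hypothetical helper, not part of Ignite.NET.
public static class LockProbe
{
    // tryEnter: wraps the real lock attempt, e.g. () => lock1.TryEnter().
    // isDisconnected: reads a flag maintained by connectivity event handlers.
    public static LockOutcome TryAcquire(Func<bool> tryEnter, Func<bool> isDisconnected)
    {
        if (isDisconnected())
        {
            // Known to be disconnected: do not attempt the lock at all.
            return LockOutcome.ClusterUnavailable;
        }

        if (tryEnter())
        {
            return LockOutcome.Acquired;
        }

        // The connection may have dropped while the call was in flight,
        // so check the flag again before blaming contention.
        return isDisconnected()
            ? LockOutcome.ClusterUnavailable
            : LockOutcome.HeldElsewhere;
    }
}
```

A call would look like `LockProbe.TryAcquire(() => lock1.TryEnter(), () => TryEnterIssue.ClientDisconnected)`. The result is still best effort: the flag is updated asynchronously, so a failure that happens just before the event fires can be reported as `HeldElsewhere`.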

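[Comments]:

Interesting. Thanks for creating the ticket. However, your suggested workaround does not work in my case, because the ClientDisconnected and ClientReconnected events are documented as working only when the node runs in ClientMode. My nodes run in server mode.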
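For a node in server mode, a comparable flag could be driven by the NodeSegmented local event that the question already mentions. The sketch below assumes the Ignite.NET events API (`IIgnite.GetEvents()`, `IEvents.LocalListen`, `IEventListener<T>`, `EventType.NodeSegmented`); the class names `SegmentationListener` and `SegmentationWatch` are hypothetical. Whether the event has to be enabled through configuration, and whether it fires early enough to explain a given false from TryEnter(), are assumptions that have not been verified against version 2.1.0.

```csharp
using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Events;

// Hypothetical listener: remembers that the local node reported segmentation.
public sealed class SegmentationListener : IEventListener<DiscoveryEvent>
{
    private volatile bool _segmented;

    public bool IsSegmented
    {
        get { return _segmented; }
    }

    public bool Invoke(DiscoveryEvent evt)
    {
        _segmented = true;
        Console.WriteLine("Local node segmented: " + evt.Name);
        return true; // keep the listener registered
    }
}

// Hypothetical helper that registers the listener on a started node.
public static class SegmentationWatch
{
    public static SegmentationListener Attach(IIgnite ignite)
    {
        var listener = new SegmentationListener();
        ignite.GetEvents().LocalListen(listener, EventType.NodeSegmented);
        return listener;
    }
}
```

After `var watch = SegmentationWatch.Attach(ignite);`, a false from TryEnter() can be checked against `watch.IsSegmented` in the same way the answer checks `ClientDisconnected`. The flag is never reset here because, per the question, a segmented node is restarted.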
