Apache Ignite.NET TryEnter on ICacheLock returns false on network communication error instead of throwing exception
Posted: 2018-04-12 00:54:43
Question: Here is the scenario.
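- A network problem occurs.
- One node in the Apache Ignite.NET cluster becomes segmented. I can see this in the logs: the affected node logs a NodeSegmented event.
- On the segmented node, if you get an ICacheLock object from an ICache object and then try to enter the lock with TryEnter(), the return value is false. Oddly, this is not because the cache key is already locked; the cause appears to be the network segmentation.
- Restart the segmented node, and it rejoins the cluster and works as expected.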
Here is the stack trace I see in the logs when this happens:
Failed to send unlock request to node (will make best effort to complete): TcpDiscoveryNode [id=d8b54715-4597-410c-a027-3c76d28ec7f1, addrs=[10.20.18.104], sockAddrs=[10.20.18.104:49100], discPort=49100, order=3174, intOrder=1590, lastExchangeTime=1523347291158, loc=false, ver=2.1.0#20170720-sha1:bdaeecca, isClient=false]] Native:[class org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid or TCP connection cannot be established due to firewall issues) [node=TcpDiscoveryNode [id=d8b54715-4597-410c-a027-3c76d28ec7f1, addrs=[10.20.18.104], sockAddrs=[/10.20.18.104:49100], discPort=49100, order=3174, intOrder=1590, lastExchangeTime=1523347291158, loc=false, ver=2.1.0#20170720-sha1:bdaeecca, isClient=false], topic=TOPIC_CACHE, msg=GridDhtUnlockRequest [], policy=2]
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1651)
at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1715)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1141)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.removeLocks(GridDhtTransactionalCacheAdapter.java:1652)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.undoLocks(GridDhtLockFuture.java:425)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onComplete(GridDhtLockFuture.java:719)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onDone(GridDhtLockFuture.java:703)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onDone(GridDhtLockFuture.java:82)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:461)
at org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:129)
at org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:45)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:382)
at org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:346)
at org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:334)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:494)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:473)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:461)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$MiniFuture.onResult(GridDhtLockFuture.java:1191)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.map(GridDhtLockFuture.java:959)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onOwnerChanged(GridDhtLockFuture.java:655)
at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.notifyOwnerChanged(GridCacheMvccManager.java:226)
at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.access$200(GridCacheMvccManager.java:80)
at org.apache.ignite.internal.processors.cache.GridCacheMvccManager$3.onOwnerChanged(GridCacheMvccManager.java:163)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.checkOwnerChanged(GridCacheMapEntry.java:3669)
at org.apache.ignite.internal.processors.cache.distributed.GridDistributedCacheEntry.readyLock(GridDistributedCacheEntry.java:469)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.readyLocks(GridDhtLockFuture.java:567)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.map(GridDhtLockFuture.java:764)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.lockAllAsync0(GridDhtColocatedCache.java:1066)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.lockAllAsync(GridDhtColocatedCache.java:937)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.lockLocally(GridDhtColocatedLockFuture.java:1171)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.mapAsPrimary(GridDhtColocatedLockFuture.java:1282)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.map0(GridDhtColocatedLockFuture.java:852)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.map(GridDhtColocatedLockFuture.java:813)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.mapOnTopology(GridDhtColocatedLockFuture.java:772)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.map(GridDhtColocatedLockFuture.java:720)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.lockAllAsync(GridDhtColocatedCache.java:664)
at org.apache.ignite.internal.processors.cache.distributed.GridDistributedCacheAdapter.lockAllAsync(GridDistributedCacheAdapter.java:117)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.lockAll(GridCacheAdapter.java:3258)
at org.apache.ignite.internal.processors.cache.CacheLockImpl.tryLock(CacheLockImpl.java:109)
at org.apache.ignite.internal.processors.cache.CacheLockImpl.tryLock(CacheLockImpl.java:130)
at org.apache.ignite.internal.processors.platform.cache.PlatformCache.processInStreamOutLong(PlatformCache.java:524)
at org.apache.ignite.internal.processors.platform.PlatformTargetProxyImpl.inStreamOutLong(PlatformTargetProxyImpl.java:65)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote node: TcpDiscoveryNode [id=d8b54715-4597-410c-a027-3c76d28ec7f1, addrs=[10.20.18.104], sockAddrs=[10.20.18.104:49100], discPort=49100, order=3174, intOrder=1590, lastExchangeTime=1523347291158, loc=false, ver=2.1.0#20170720-sha1:bdaeecca, isClient=false]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2544)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2480)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1643)
... 41 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each ComputeTask and cache Transaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=d8b54715-4597-410c-a027-3c76d28ec7f1, addrs=[10.20.18.104:47100]]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3179)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2763)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2655)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2516)
... 43 more
Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address [addr=10.20.18.104:47100, err=Failed to read remote node recovery handshake (connection closed).]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3184)
... 46 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read remote node recovery handshake (connection closed).
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:3438)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3044)
... 46 more
]
And a slightly different one:
Level: [Error], Message:[<ResoDupCheck> Failed to send unlock request [keys=[UserKeyCacheObjectImpl [part=482, val=201804141800-2-190327-110016411351-pat-clarkson-greene, hasValBytes=true]], n=TcpDiscoveryNode [id=d8b54715-4597-410c-a027-3c76d28ec7f1, addrs=[10.20.18.104], sockAddrs=[10.20.18.104:49100], discPort=49100, order=3174, intOrder=1590, lastExchangeTime=1523347291158, loc=false, ver=2.1.0#20170720-sha1:bdaeecca, isClient=false]]] Native:[class org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid or TCP connection cannot be established due to firewall issues) [node=TcpDiscoveryNode [id=d8b54715-4597-410c-a027-3c76d28ec7f1, addrs=[10.20.18.104], sockAddrs=[10.20.18.104:49100], discPort=49100, order=3174, intOrder=1590, lastExchangeTime=1523347291158, loc=false, ver=2.1.0#20170720-sha1:bdaeecca, isClient=false], topic=TOPIC_CACHE, msg=GridNearUnlockRequest [super=GridDistributedUnlockRequest [keys=[UserKeyCacheObjectImpl [part=482, val=201804141800-2-190327-110016411351-pat-clarkson-greene, hasValBytes=true]], super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=121528584, order=1523348164577, nodeOrder=3178], committedVers=[], rolledbackVers=[], cnt=1, super=GridCacheIdMessage [cacheId=-1009505448]]]], policy=2]
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1651)
at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1715)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1141)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.removeLocks(GridDhtColocatedCache.java:877)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.undoLocks(GridDhtColocatedLockFuture.java:383)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.onComplete(GridDhtColocatedLockFuture.java:575)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.onDone(GridDhtColocatedLockFuture.java:559)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.map(GridDhtColocatedLockFuture.java:819)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.mapOnTopology(GridDhtColocatedLockFuture.java:772)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.map(GridDhtColocatedLockFuture.java:720)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.lockAllAsync(GridDhtColocatedCache.java:664)
at org.apache.ignite.internal.processors.cache.distributed.GridDistributedCacheAdapter.lockAllAsync(GridDistributedCacheAdapter.java:117)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.lockAll(GridCacheAdapter.java:3258)
at org.apache.ignite.internal.processors.cache.CacheLockImpl.tryLock(CacheLockImpl.java:109)
at org.apache.ignite.internal.processors.cache.CacheLockImpl.tryLock(CacheLockImpl.java:130)
at org.apache.ignite.internal.processors.platform.cache.PlatformCache.processInStreamOutLong(PlatformCache.java:524)
at org.apache.ignite.internal.processors.platform.PlatformTargetProxyImpl.inStreamOutLong(PlatformTargetProxyImpl.java:65)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote node: TcpDiscoveryNode [id=d8b54715-4597-410c-a027-3c76d28ec7f1, addrs=[10.20.18.104], sockAddrs=[10.20.18.104:49100], discPort=49100, order=3174, intOrder=1590, lastExchangeTime=1523347291158, loc=false, ver=2.1.0#20170720-sha1:bdaeecca, isClient=false]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2544)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2480)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1643)
... 16 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each ComputeTask and cache Transaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=d8b54715-4597-410c-a027-3c76d28ec7f1, addrs=[10.20.18.104:47100]]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3179)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2763)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2655)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2516)
... 18 more
Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address [addr=10.20.18.104:47100, err=Failed to read remote node recovery handshake (connection closed).]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3184)
... 21 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read remote node recovery handshake (connection closed).
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:3438)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3044)
... 21 more
]
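My main question is: why doesn't ICacheLock throw an exception? By returning false, it wrongly tells me that the cache key is already locked, and I have no way of knowing whether false was caused by a network problem or by the key actually being locked.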
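My current solution is: add a listener for the NodeSegmented local event and shut down/restart the Ignite node. As a defensive backup plan, a Polly circuit breaker checks whether more than 50% of requests fail to acquire the lock within 30 seconds. That should be an unlikely situation; when it happens, the lock call is skipped and processing continues (in a degraded state).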
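The "more than 50% of lock attempts failed within 30 seconds" rule can be sketched as a sliding-window breaker. This is a minimal hand-rolled illustration under stated assumptions, not Polly's implementation: the class name `LockFailureBreaker` and its `minimumAttempts` parameter are hypothetical. In Polly itself, the comparable policy is `AdvancedCircuitBreaker` (failure threshold, sampling duration, minimum throughput, break duration).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not Polly's API): trips when more than `failureThreshold`
// of the lock attempts seen within the last `window` failed.
public sealed class LockFailureBreaker
{
    private readonly object _gate = new object();
    private readonly Queue<(DateTime At, bool Failed)> _samples = new Queue<(DateTime At, bool Failed)>();
    private readonly double _failureThreshold; // e.g. 0.5 for "more than 50%"
    private readonly TimeSpan _window;         // e.g. 30 seconds
    private readonly int _minimumAttempts;     // avoid tripping on a handful of samples
    private int _failures;

    public LockFailureBreaker(double failureThreshold, TimeSpan window, int minimumAttempts)
    {
        _failureThreshold = failureThreshold;
        _window = window;
        _minimumAttempts = minimumAttempts;
    }

    // Record the result of one TryEnter() call made at time `now`.
    // Assumes timestamps are passed in non-decreasing order.
    public void Record(DateTime now, bool acquired)
    {
        lock (_gate)
        {
            _samples.Enqueue((now, !acquired));
            if (!acquired) _failures++;
            Evict(now);
        }
    }

    // True when the lock call should be skipped (degraded mode).
    public bool IsOpen(DateTime now)
    {
        lock (_gate)
        {
            Evict(now);
            int total = _samples.Count;
            return total >= _minimumAttempts && (double)_failures / total > _failureThreshold;
        }
    }

    // Drop samples that have fallen out of the sliding window.
    private void Evict(DateTime now)
    {
        while (_samples.Count > 0 && now - _samples.Peek().At > _window)
        {
            if (_samples.Dequeue().Failed) _failures--;
        }
    }
}
```

A caller would check `IsOpen(DateTime.UtcNow)` before calling `TryEnter()`, skip the lock while it returns true, and otherwise `Record` the outcome. Because skipped calls record nothing, the failures age out of the window and the breaker closes again after roughly 30 seconds; Polly instead uses an explicit break duration followed by a half-open trial call. Note that this rule counts genuine contention as a failure too, since `false` alone cannot tell the two cases apart.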
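Am I missing something in my Ignite.NET configuration?
Am I missing some understanding of how Ignite works?
Is there a programmatic way to find out why a TryEnter call returned false and decide how to proceed?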
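Answer 1: It looks like Ignite does not propagate the exception from the Java side to the .NET side. If we do the same thing with the Java API, tryLock() throws javax.cache.CacheException.
I've created a Jira ticket to address this: https://issues.apache.org/jira/browse/IGNITE-8247
Also, make sure the key you are trying to lock exists in the cache.
As a workaround, you can add your own listeners for the ClientDisconnected and ClientReconnected events. Here is an example: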
using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Cache;

class TryEnterIssue
{
    // Set by the event handlers below; volatile because they run on Ignite threads.
    public static volatile bool ClientDisconnected = false;
    static void Main(string[] args)
    {
        var cfg = new IgniteConfiguration { ... };
        using (var ignite = Ignition.Start(cfg))
        {
            ...
            ICache<int, string> cache = ignite.GetOrCreateCache<int, string>(cacheConfiguration);

            // Note: these events are raised only on nodes running in client mode.
            ignite.ClientDisconnected += (sender, eventArgs) =>
            {
                ClientDisconnected = true;
                Console.WriteLine("Client disconnected.");
            };
            ignite.ClientReconnected += (sender, eventArgs) =>
            {
                ClientDisconnected = false;
                Console.WriteLine("Client reconnected.");
            };
            ...
            ICacheLock lock1 = cache.Lock(1);
            try
            {
                if (!lock1.TryEnter())
                {
                    if (ClientDisconnected)
                    {
                        // Client is disconnected.
                    }
                    else
                    {
                        // Unable to acquire a lock.
                    }
                }
                else
                {
                    // Lock acquired: do the work, then release it.
                    lock1.Exit();
                }
            }
            catch (Exception e)
            {
                ...
            }
        }
    }
}
Comments:
Interesting. Thanks for creating a ticket for this. However, your suggested workaround doesn't work in my case, because the ClientDisconnected and ClientReconnected events are documented as only working when the node runs in client mode. My nodes run in server mode.
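For a server node, where (per the comment above) ClientDisconnected/ClientReconnected are not raised, the answer's idea could instead be driven by the NodeSegmented local event that the question already listens for. The sketch below is a hypothetical wrapper, not an Ignite.NET API: it only classifies a `false` from `TryEnter()` using a flag that some event handler sets. The wiring is assumed to look like registering a local listener for `EventType.NodeSegmented` via `ignite.GetEvents().LocalListen(...)`, with that event type enabled in `IgniteConfiguration.IncludedEventTypes`; check this against your Ignite.NET version.

```csharp
using System;

// Hypothetical classification of a TryEnter() result (not part of Ignite.NET).
public enum LockOutcome { Acquired, HeldElsewhere, NodeSegmented }

public static class SegmentationAwareLock
{
    // Written from an event-handler thread, read from request threads.
    private static volatile bool _segmented;

    // Set to true from the (assumed) NodeSegmented listener on a server node,
    // or from ClientDisconnected/ClientReconnected on a client node.
    public static bool Segmented
    {
        get => _segmented;
        set => _segmented = value;
    }

    // `tryEnter` stands in for ICacheLock.TryEnter so the sketch compiles without Ignite.
    public static LockOutcome TryEnter(Func<bool> tryEnter)
    {
        if (tryEnter())
        {
            return LockOutcome.Acquired;
        }

        // Best effort: the flag may change between TryEnter() returning and this read,
        // and it only reflects what the event handler has observed so far.
        return _segmented ? LockOutcome.NodeSegmented : LockOutcome.HeldElsewhere;
    }
}
```

A caller would write something like `var outcome = SegmentationAwareLock.TryEnter(() => lock1.TryEnter());` and must still call `lock1.Exit()` when the outcome is `Acquired`. The classification narrows down, rather than proves, why `false` was returned, because the flag can lag behind the actual network state.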