AlwaysOn - cluster lease timeouts and PREEMPTIVE_HADR_LEASE_MECHANISM


【Posted】: 2016-06-13 10:21:02 【Question】:

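We recently installed some WSUS updates plus SQL 2012 SP3 (yes, everything was tested in UAT without problems :) and since then AO and the cluster seem to have a few issues: the cluster lease appears to be timing out and I can't work out why. This causes a brief outage and dropped connections.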

Any help would be much appreciated!

AlwaysOn extended events:

availability_group_lease_expired; state: LeaseExpired; Timestamp: 2016-06-12 04:58:40.34
availability_replica_state_change: current_state: Resolving_Normal; previous_state: Primary_Normal; Timestamp: 2016-06-12 04:58:40.34
..
availability_replica_state_change: current_state: Primary_Normal; previous_state: Primary_Pending; Timestamp: 2016-06-12 04:58:52.96
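
To put a number on the "brief outage", here is a minimal Python sketch that measures the gap between the lease expiring and the replica returning to Primary_Normal. It only uses the two timestamps quoted above; the variable names are illustrative.

```python
from datetime import datetime

# Timestamp layout used by the extended events quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

# Values copied from the availability_group_lease_expired and the
# final availability_replica_state_change events.
lease_expired = datetime.strptime("2016-06-12 04:58:40.34", FMT)
primary_again = datetime.strptime("2016-06-12 04:58:52.96", FMT)

# Time during which the replica was not in the Primary_Normal state.
outage_seconds = (primary_again - lease_expired).total_seconds()
print(f"Replica was not primary for about {outage_seconds:.2f} s")
```

For these two events the gap comes out at roughly 12.6 seconds, which fits the short connection drops described.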

SQL log:

Date: 12/06/2016 04:58:40; Error: 19421, Severity: 16, State: 1.
SQL Server hosting availability group did not receive a process event signal from the Windows Server Failover Cluster within the lease timeout period.

Date: 12/06/2016 04:58:40; Error: 19407, Severity: 16, State: 1.
The lease between availability group and the Windows Server Failover Cluster has expired. A connectivity issue occurred between the instance of SQL Server and the Windows Server Failover Cluster. To determine whether the availability group is failing over correctly, check the corresponding availability group resource in the Windows Server Failover Cluster.

Date: 12/06/2016 04:58:40
AlwaysOn: The local replica of availability group is going offline because either the lease expired or lease renewal failed. This is an informational message only. No user action is required.

Cluster log (don't ask me why it's -1h, the dates are OK on all nodes):

2016/06/12-03:58:40.587 INFO  [RCM] rcm::RcmApi::FailResource: (AlwaysOn)
2016/06/12-03:58:40.588 INFO  [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'AlwaysOn', gen(3) result 0/0.
2016/06/12-03:58:40.588 INFO  [RCM] Res AlwaysOn: Online -> ProcessingFailure( StateUnknown )
2016/06/12-03:58:40.588 INFO  [RCM] TransitionToState(AlwaysOn) Online-->ProcessingFailure.
2016/06/12-03:58:40.588 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (AlwaysOn, Online --> Pending)
2016/06/12-03:58:40.588 ERR   [RCM] rcm::RcmResource::HandleFailure: (AlwaysOn)
2016/06/12-03:58:40.588 INFO  [RCM] resource AlwaysOn: failure count: 1, restartAction: 2 persistentState: 1.
2016/06/12-03:58:40.588 INFO  [RCM] numDependents is zero, auto-returning true
2016/06/12-03:58:40.588 INFO  [RCM] Greater than restartPeriod time has elapsed since first failure of AlwaysOn, resetting failureTime and failureCount.
2016/06/12-03:58:40.588 INFO  [RCM] Will queue immediate restart (500 milliseconds) of AlwaysOn after terminate is complete.
2016/06/12-03:58:40.588 INFO  [RCM] Res AlwaysOn: ProcessingFailure -> WaitingToTerminate( DelayRestartingResource )
2016/06/12-03:58:40.588 INFO  [RCM] TransitionToState(AlwaysOn) ProcessingFailure-->[WaitingToTerminate to DelayRestartingResource].
2016/06/12-03:58:40.588 INFO  [RCM] Res AlwaysOn: [WaitingToTerminate to DelayRestartingResource] -> Terminating( DelayRestartingResource )
2016/06/12-03:58:40.588 INFO  [RCM] TransitionToState(AlwaysOn) [WaitingToTerminate to DelayRestartingResource]-->[Terminating to DelayRestartingResource].
2016/06/12-03:58:40.588 ERR   [RES] SQL Server Availability Group <AlwaysOn>: [hadrag] Lease Thread terminated
2016/06/12-03:58:40.588 ERR   [RES] SQL Server Availability Group <AlwaysOn>: [hadrag] The lease is expired. The lease should have been renewed by 2016/06/12-03:58:30.348
2016/06/12-03:58:40.588 INFO  [RES] SQL Server Availability Group: [hadrag] Stopping Health Worker Thread
2016/06/12-03:58:40.588 INFO  [RES] SQL Server Availability Group: [hadrag] Health worker was asked to terminate
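
The cluster log lines can be lined up against the SQL log with a short Python sketch. It computes how far past the renewal deadline the lease was when the failure was logged, and the offset between the two logs. One possible explanation for the -1h is that cluster logs are commonly written in UTC unless local time is requested when they are generated; whether that applies to this environment is an assumption, not something the logs confirm.

```python
from datetime import datetime

# Timestamp layouts seen in the cluster log and in the extended events.
CLUSTER_FMT = "%Y/%m/%d-%H:%M:%S.%f"
SQL_FMT = "%Y-%m-%d %H:%M:%S.%f"

# "The lease should have been renewed by ..." versus the time of the log entry.
renew_deadline = datetime.strptime("2016/06/12-03:58:30.348", CLUSTER_FMT)
failure_logged = datetime.strptime("2016/06/12-03:58:40.588", CLUSTER_FMT)
overdue_seconds = (failure_logged - renew_deadline).total_seconds()

# The same incident as recorded by SQL Server (lease expired event).
sql_event = datetime.strptime("2016-06-12 04:58:40.34", SQL_FMT)
offset_hours = round((sql_event - failure_logged).total_seconds() / 3600)

print(f"Lease renewal overdue by {overdue_seconds:.2f} s")
print(f"SQL log is ahead of the cluster log by about {offset_hours} h")
```

With the values above the renewal was about 10 seconds overdue, and the two logs differ by one whole hour, so the entries appear to describe the same moment.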

Something odd - SQL wait times for the last 12 hours:

wait type                        Wait Time      % of Total Wait
PREEMPTIVE_HADR_LEASE_MECHANISM  80,183,360 ms  39.09%
PREEMPTIVE_SP_SERVER_DIAGNOSTICS 80,183,265 ms  39.09%
HADR_CLUSAPI_CALL                40,534,655 ms  19.76%
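
As a sanity check on these figures, the sketch below compares each accumulated wait time with the length of the stated window. The 12-hour window is taken from the post and may be approximate; the dictionary simply restates the table.

```python
# Window stated in the post; treated as approximate.
WINDOW_HOURS = 12
window_ms = WINDOW_HOURS * 3600 * 1000

# Wait times (ms) copied from the table above.
waits_ms = {
    "PREEMPTIVE_HADR_LEASE_MECHANISM": 80_183_360,
    "PREEMPTIVE_SP_SERVER_DIAGNOSTICS": 80_183_265,
    "HADR_CLUSAPI_CALL": 40_534_655,
}

# Ratio of accumulated wait time to wall-clock time in the window.
ratios = {name: ms / window_ms for name, ms in waits_ms.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the window")
```

A ratio close to or above 1 means the wait accumulated for roughly the whole window (on more than one thread, or over a longer period than 12 hours). That pattern is what a thread that sits waiting almost continuously would produce, so on their own these numbers may not point to a fault.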

A dodgy update somewhere? If you have any hints, please let me know.

Thanks in advance, Thomas


【Answer 1】:

1) Try restarting your server.

2) You will see these odd errors if the server is unresponsive or CPU utilization reaches 100%.

