Long running Azure WebJob fails - The client could not finish the operation within specified timeout
【Posted】: 2020-01-17 22:45:11
【Question】: I have a long-running Azure WebJob (2-4 hours) that consistently fails after about 90 minutes with a storage exception. I am using the WebJobs SDK 2.3.0 and WindowsAzure.Storage 9.3.3.
[09/17/2019 05:14:23 > b0c2e2: ERR ]
[09/17/2019 05:14:23 > b0c2e2: ERR ] Unhandled Exception: Microsoft.WindowsAzure.Storage.StorageException: The client could not finish the operation within specified timeout. ---> System.TimeoutException: The client could not finish the operation within specified timeout.
[09/17/2019 05:14:23 > b0c2e2: ERR ] --- End of inner exception stack trace ---
[09/17/2019 05:14:23 > b0c2e2: ERR ] at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.EndExecuteAsync[T](IAsyncResult result) in c:\Program Files (x86)\Jenkins\workspace\release_dotnet_master\Lib\ClassLibraryCommon\Core\Executor\Executor.cs:line 51
[09/17/2019 05:14:23 > b0c2e2: ERR ] at Microsoft.WindowsAzure.Storage.Queue.CloudQueue.EndExists(IAsyncResult asyncResult) in c:\Program Files (x86)\Jenkins\workspace\release_dotnet_master\Lib\ClassLibraryCommon\Queue\CloudQueue.cs:line 994
[09/17/2019 05:14:23 > b0c2e2: ERR ] at Microsoft.WindowsAzure.Storage.Core.Util.AsyncExtensions.<>c__DisplayClass2`1.<CreateCallback>b__0(IAsyncResult ar) in c:\Program Files (x86)\Jenkins\workspace\release_dotnet_master\Lib\ClassLibraryCommon\Core\Util\AsyncExtensions.cs:line 69
[09/17/2019 05:14:23 > b0c2e2: ERR ] --- End of stack trace from previous location where exception was thrown ---
[09/17/2019 05:14:23 > b0c2e2: ERR ] at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[09/17/2019 05:14:23 > b0c2e2: ERR ] at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
[09/17/2019 05:14:23 > b0c2e2: ERR ] at Microsoft.Azure.WebJobs.Host.Queues.Listeners.QueueListener.<ExecuteAsync>d__25.MoveNext()
[09/17/2019 05:14:23 > b0c2e2: ERR ] --- End of stack trace from previous location where exception was thrown ---
[09/17/2019 05:14:23 > b0c2e2: ERR ] at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[09/17/2019 05:14:23 > b0c2e2: ERR ] at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
[09/17/2019 05:14:23 > b0c2e2: ERR ] at Microsoft.Azure.WebJobs.Host.Timers.TaskSeriesTimer.<RunAsync>d__14.MoveNext()
[09/17/2019 05:14:23 > b0c2e2: ERR ] --- End of stack trace from previous location where exception was thrown ---
[09/17/2019 05:14:23 > b0c2e2: ERR ] at Microsoft.Azure.WebJobs.Host.Timers.WebJobsExceptionHandler.<>c__DisplayClass3_0.<OnUnhandledExceptionAsync>b__0()
[09/17/2019 05:14:23 > b0c2e2: ERR ] at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
[09/17/2019 05:14:23 > b0c2e2: ERR ] at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
[09/17/2019 05:14:23 > b0c2e2: ERR ] at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
[09/17/2019 05:14:23 > b0c2e2: ERR ] at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
[09/17/2019 05:14:23 > b0c2e2: ERR ] at System.Threading.ThreadHelper.ThreadStart()
[09/17/2019 05:14:23 > b0c2e2: SYS ERR ] Job failed due to exit code -532462766
[09/17/2019 05:14:23 > b0c2e2: SYS INFO] Process went down, waiting for 0 seconds
[09/17/2019 05:14:23 > b0c2e2: SYS INFO] Status changed to PendingRestart
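The job itself does not do anything with Azure Storage. From other questions I gathered that this might be related to the WebJob writing its log files to Azure Storage, so I configured a custom StorageClientFactory with a longer server timeout, but that does not seem to make any difference.
Job configuration: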
var config = new JobHostConfiguration();
// Process one queue message at a time and do not retry failed messages.
config.Queues.MaxDequeueCount = 1;
config.Queues.BatchSize = 1;
ServicePointManager.DefaultConnectionLimit = int.MaxValue;
config.StorageClientFactory = new CustomStorageClientFactory();
var host = new JobHost(config);
host.RunAndBlock();

public class CustomStorageClientFactory : StorageClientFactory
{
    public override CloudBlobClient CreateCloudBlobClient(StorageClientFactoryContext context)
    {
        CloudBlobClient client = context.Account.CreateCloudBlobClient();
        // Extend the server-side timeout for blob requests.
        client.DefaultRequestOptions.ServerTimeout = TimeSpan.FromHours(6);
        return client;
    }
}
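A side note on the log above: the line "Job failed due to exit code -532462766" is easier to recognise in hexadecimal. The sketch below only reinterprets the signed value as an unsigned 32-bit number; the class and method names are made up for illustration. The result, 0xE0434352, is the code Windows reports when a .NET process dies from an unhandled managed exception, which matches the "Unhandled Exception" entry in the log.

```csharp
using System;

// Hypothetical helper, not part of the WebJobs SDK.
public static class ExitCodeDecoder
{
    // Reinterprets a signed 32-bit process exit code as unsigned hexadecimal.
    public static string ToHex(int exitCode)
    {
        return "0x" + unchecked((uint)exitCode).ToString("X8");
    }

    public static void Main()
    {
        // Exit code copied from the "Job failed due to exit code" log line.
        Console.WriteLine(ToHex(-532462766)); // prints 0xE0434352
    }
}
```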
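【Comments】:
Maybe your SAS is about to expire? I have not seen anyone else with the same problem.
@4c74356b41 Do you mean "Shared Access Signature"? If so, I am not using them explicitly; all access to storage goes through the connection strings defined in AzureWebJobsDashboard and AzureWebJobsStorage.
@Phill Did you get a chance to check my answer? Did it help?
@TomLuo Yes, thank you, it was very helpful, and sorry for not giving any feedback. I set the connection limit and that seems to have resolved the problem.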
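【Answer 1】:
This appears to be a design issue in Azure WebJobs SDK 2.x. Under the hood it uses HttpWebRequest to call the storage APIs, and by default only 2 concurrent connections are allowed per server. If both connections are held by other requests, further asynchronous requests have to wait and can time out.
The workaround is to set DefaultConnectionLimit to a larger value, like this: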
static void Main(string[] args)
{
    // Set this immediately so that it's used by all requests.
    ServicePointManager.DefaultConnectionLimit = Int32.MaxValue;
    var host = new JobHost();
    host.RunAndBlock();
}
Alternatively, you can simply upgrade the Azure WebJobs SDK to version 3.x, where this issue has been addressed.
For details, see Managing concurrent connections and https://github.com/Azure/azure-webjobs-sdk/issues/755#issuecomment-319094679.
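To check that the limit from the answer above is really in effect, one option is to ask ServicePointManager which limit it would apply to a given endpoint. This is a minimal sketch, assuming a .NET Framework console application (which is what WebJobs SDK 2.x targets); the storage account name, the value 100, and the class name are placeholders for illustration only.

```csharp
using System;
using System.Net;

// Hypothetical diagnostic helper, not part of the WebJobs SDK.
public static class ConnectionLimitCheck
{
    // Returns the connection limit that applies to requests sent to the given URL.
    public static int LimitFor(string url)
    {
        return ServicePointManager.FindServicePoint(new Uri(url)).ConnectionLimit;
    }

    public static void Main()
    {
        // Placeholder endpoint; substitute your own storage account's queue URL.
        const string queueEndpoint = "https://examplestorageaccount.queue.core.windows.net";

        // Raise the limit before any request is made, as the answer recommends.
        // 100 is an arbitrary illustrative value.
        ServicePointManager.DefaultConnectionLimit = 100;

        // Should report the value set above rather than the default of 2.
        Console.WriteLine("Connection limit: " + LimitFor(queueEndpoint));
    }
}
```

Running a check like this at startup, right after setting the limit, makes it easier to confirm that the assignment happens before the JobHost opens its first storage connection.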