Random Azure Function Apps failures: Host thresholds exceeded [Connections]

[Posted]: 2018-09-09 13:39:21

[Question]:

I have the following Function App:

[FunctionName("SendEmail")]
public static async Task Run([ServiceBusTrigger("%EmailSendMessageQueueName%", AccessRights.Listen, Connection = AzureFunctions.Connection)] EmailMessageDetails messageToSend,
    [ServiceBus("%EmailUpdateQueueName%", AccessRights.Send, Connection = AzureFunctions.Connection)]IAsyncCollector<EmailMessageUpdate> messageResponse,
    //TraceWriter log,
    ILogger log,
    CancellationToken token)
{
    log.LogInformation($"C# ServiceBus queue trigger function processed message: {messageToSend}");

    /* Validate input and initialise Mandrill */
    try
    {
        if (!ValidateMessage(messageToSend, log))   // TODO: finish validation
        {
            log.LogError("Invalid or Unknown Message Content");
            throw new Exception("Invalid message content.");
        }
    }
    catch (Exception ex)
    {
        log.LogError($"Failed to Validate Message data: {ex.Message} => {ex.ReportAllProperties()}");
        throw;
    }

    DateTime utcTimeToSend;
    try
    {
        var envTag = GetEnvVariable("Environment");
        messageToSend.Tags.Add(envTag);

        utcTimeToSend = messageToSend.UtcTimeToSend.GetNextUtcSendDateTime();
        DateTime utcExpiryDate = messageToSend.UtcTimeToSend.GetUtcExpiryDate();
        DateTime now = DateTime.UtcNow;
        if (now > utcExpiryDate)
        {
            log.LogError($"Stopping sending message because it is expired: {utcExpiryDate}");
            throw new Exception($"Stopping sending message because it is expired: {utcExpiryDate}");
        }
        if (utcTimeToSend > now)
        {
            log.LogError($"Stopping sending message because it is not allowed to be send due to time constraints: next send time: {utcTimeToSend}");
            throw new Exception($"Stopping sending message because it is not allowed to be send due to time constraints: next send time: {utcTimeToSend}");
        }
    }
    catch (Exception ex)
    {
        log.LogError($"Failed to Parse and/or Validate Message Time To Send: {ex.Message} => {ex.ReportAllProperties()}");
        throw;
    }

    /* Submit message to Mandrill */
    string errorMessage = null;
    IList<MandrillSendMessageResponse> mandrillResult = null;
    DateTime timeSubmitted = default(DateTime);
    DateTime timeUpdateRecieved = default(DateTime);

    try
    {
        var mandrillApi = new MandrillApi(GetEnvVariable("Mandrill:APIKey"));

        var mandrillMessage = new MandrillMessage
        {
            FromEmail = messageToSend.From,
            FromName = messageToSend.FromName,
            Subject = messageToSend.Subject,
            TrackClicks = messageToSend.Track,
            Tags = messageToSend.Tags,
            TrackOpens = messageToSend.Track,
        };

        mandrillMessage.AddTo(messageToSend.To, messageToSend.ToName);
        foreach (var passthrough in messageToSend.PassThroughVariables)
        {
            mandrillMessage.AddGlobalMergeVars(passthrough.Key, passthrough.Value);
        }

        timeSubmitted = DateTime.UtcNow;
        if (String.IsNullOrEmpty(messageToSend.TemplateId))
        {
            log.LogInformation($"No Message Template");
            mandrillMessage.Text = messageToSend.MessageBody;
            mandrillResult = await mandrillApi.Messages.SendAsync(mandrillMessage, async: true, sendAtUtc: utcTimeToSend);
        }
        else
        {
            log.LogInformation($"Using Message Template: {messageToSend.TemplateId}");
            var clock = new Stopwatch();
            clock.Start();
            mandrillResult = await mandrillApi.Messages.SendTemplateAsync(
                mandrillMessage,
                messageToSend.TemplateId,
                async: true,
                sendAtUtc: utcTimeToSend
            );
            clock.Stop();
            log.LogInformation($"Call to mandrill took {clock.Elapsed}");
        }
        timeUpdateRecieved = DateTime.UtcNow;
    }
    catch (Exception ex)
    {
        log.LogError($"Failed to call Mandrill: {ex.Message} => {ex.ReportAllProperties()}");
        errorMessage = ex.Message;
    }

    try
    {
        MandrillSendMessageResponse theResult = null;
        SendMessageStatus status = SendMessageStatus.FailedToSendToProvider;

        if (mandrillResult == null || mandrillResult.Count < 1)
        {
            if (String.IsNullOrEmpty(errorMessage))
            {
                errorMessage = "Invalid Mandrill result.";
            }
        }
        else
        {
            theResult = mandrillResult[0];
            status = FacMandrillUtils.ConvertToSendMessageStatus(theResult.Status);
        }

        var response = new EmailMessageUpdate
        {
            SentEmailInfoId = messageToSend.SentEmailInfoId,
            ExternalProviderId = theResult?.Id ?? String.Empty,
            Track = messageToSend.Track,
            FacDateSentToProvider = timeSubmitted,
            FacDateUpdateRecieved = timeUpdateRecieved,
            FacErrorMessage = errorMessage,
            Status = status,
            StatusDetail = theResult?.RejectReason ?? "Error"
        };

        await messageResponse.AddAsync(response, token).ConfigureAwait(false);
    }
    catch (Exception ex)
    {
        log.LogError($"Failed to push message to the update ({AzureFunctions.EmailUpdateQueueName}) queue: {ex.Message} => {ex.ReportAllProperties()}");
        throw;
    }
}

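When I queue 100 messages, everything works fine. When I queue 500 or more messages, 499 of them are sent but the last one never is. I also start getting the following error:

The operation was canceled.

I have Application Insights set up and configured, and logging is working. I cannot reproduce the problem locally. Based on the following end-to-end transaction details in Application Insights, I believe the problem occurs at this line: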

await messageResponse.AddAsync(response, token).ConfigureAwait(false);

[Screenshot: Application Insights end-to-end transaction]

host.json

{
  "logger": {
    "categoryFilter": {
      "defaultLevel": "Information",
      "categoryLevels": {
        "Host": "Warning",
        "Function": "Information",
        "Host.Aggregator": "Information"
      }
    }
  },
  "applicationInsights": {
    "sampling": {
      "isEnabled": true,
      "maxTelemetryItemsPerSecond": 5
    }
  },
  "serviceBus": {
    "maxConcurrentCalls": 32
  }
}

It may be related to this error in Application Insights: Host thresholds exceeded [Connections].

Has anyone else run into this or a similar problem?

[Answer 1]:

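If you follow the link in the exception, https://aka.ms/functions-thresholds, you will see the following limit:

Connections: Number of outbound connections (limit is 300). For information on handling connection limits, see Managing Connections.

You have most likely hit that one.

In every function invocation you create a new instance of MandrillApi. You didn't mention which library you are using, but I suspect it creates a new connection for every instance of MandrillApi.

I checked Mandrill Dot Net, and yes, it creates a new HttpClient for every instance: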

_httpClient = new HttpClient
{
    BaseAddress = new Uri(BaseUrl)
};

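Managing Connections recommends:

In many cases, this connection limit can be avoided by reusing client instances rather than creating new ones in each function. .NET clients such as HttpClient, DocumentClient, and the Azure Storage clients can manage connections if you use a single, static client. If those clients are re-instantiated with every function invocation, there is a high probability that the code is leaking connections.

Check the library's documentation to see whether the API client is thread-safe. If it is, reuse it across function invocations.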
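
The reuse pattern recommended above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the Mandrill library's own code: `ApiClient` is a hypothetical stand-in for a client that owns an HttpClient (as MandrillApi does), `SharedClient` is a hypothetical holder class, and the base URL is a placeholder. It assumes the real client is safe to share across concurrent invocations, which still has to be confirmed in the library's documentation.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an API client that owns an HttpClient,
// similar in shape to the MandrillApi constructor quoted above.
public sealed class ApiClient
{
    private static int _instancesCreated;

    // Number of ApiClient objects constructed so far in this process.
    public static int InstancesCreated => Volatile.Read(ref _instancesCreated);

    private readonly HttpClient _httpClient;

    public Uri BaseAddress => _httpClient.BaseAddress;

    public ApiClient(string baseUrl)
    {
        Interlocked.Increment(ref _instancesCreated);
        // Constructing an HttpClient does not send any request.
        _httpClient = new HttpClient { BaseAddress = new Uri(baseUrl) };
    }
}

// Hypothetical holder: one client per host process, shared by all invocations.
public static class SharedClient
{
    // ExecutionAndPublication makes Lazy<T> run the factory at most once,
    // even when many invocations race to read Instance.
    private static readonly Lazy<ApiClient> LazyClient = new Lazy<ApiClient>(
        () => new ApiClient("https://example.invalid/"), // placeholder URL
        LazyThreadSafetyMode.ExecutionAndPublication);

    public static ApiClient Instance => LazyClient.Value;
}

public static class Program
{
    public static void Main()
    {
        // Simulate 500 concurrent function invocations asking for the client.
        Parallel.For(0, 500, i =>
        {
            if (SharedClient.Instance == null)
            {
                throw new InvalidOperationException("client was not created");
            }
        });

        Console.WriteLine(ApiClient.InstancesCreated); // prints 1
    }
}
```

In the function from the question, the equivalent change would be to stop calling new MandrillApi(GetEnvVariable("Mandrill:APIKey")) inside Run and read a single static instance instead, provided MandrillApi turns out to be thread-safe.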

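[Comments]:

Actually, I did notice that when I posted this, so I took a look. In typical Microsoft fashion, the documentation is a bit vague. Are those 300 connections per second, per minute, or all concurrent? Also, do calls to internal Azure services count toward these connections (the call to Azure Service Bus is where my problem occurs)? To your point, I am refactoring a bit to try to account for the Mandrill API calls.

@JasonH I believe it is concurrently open connections, and everything counts.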
