C# Httpclient Asp.net Core 2.0 on Kestrel Waiting, Lock Contention, high CPU

Posted 2023-03-30

Asked: 2017-09-07 00:56:49

Question:

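I have a very simple API with a singleton that holds a number of HttpClient instances, which are reused across the whole application. When the API is called, I build a list of tasks, each of which calls one client. I use a CancellationToken plus a separate delay task to enforce a strict timeout.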

public class APIController : Controller
{
    private IHttpClientsFactory _client;

    public APIController(IHttpClientsFactory client)
    {
        _client = client;
    }

    [HttpPost]
    public async Task<IActionResult> PostAsync()
    {
        var cts = new CancellationTokenSource();
        var allTasks = new ConcurrentBag<Task<Response>>();
        foreach (var name in list) // 30 clients in list here
        {
            // Client is a method on the factory, so it is called rather than indexed
            allTasks.Add(CallAsync(_client.Client(name), cts.Token));
        }
        cts.CancelAfter(1000);
        await Task.WhenAny(Task.WhenAll(allTasks), Task.Delay(1000));
        // do something with allTasks
    }
}

CallAsync is just as simple: it makes the call with the client and awaits the response.

 var response = await client.PostAsync(endpoint, content, token);

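This code works perfectly: after 1 second it returns, and a cancellation request is sent to every task that has not completed yet. The task list holds about 30 clients, so each API call hits 30 endpoints, with an average response time of 800 ms.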
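For reference, the deadline pattern described above can be sketched in isolation. This is a minimal sketch, not the asker's actual code: the DeadlineFanOut class, the RunAsync method, and the generic delegate shape are hypothetical names introduced for illustration. It starts every call with one shared token, waits until either all calls finish or the deadline passes, and keeps only the results that completed in time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original question.
public static class DeadlineFanOut
{
    // Starts every call with one shared token and returns only the results
    // of the calls that ran to completion before the deadline.
    public static async Task<List<T>> RunAsync<T>(
        IEnumerable<Func<CancellationToken, Task<T>>> calls, TimeSpan deadline)
    {
        using (var cts = new CancellationTokenSource())
        {
            // Start all calls first, then arm the deadline, as in the question.
            List<Task<T>> tasks = calls.Select(call => call(cts.Token)).ToList();
            cts.CancelAfter(deadline);

            // Wait for "everything finished" or "deadline reached", whichever is first.
            // Awaiting WhenAny never throws, even if some calls fail or are cancelled.
            await Task.WhenAny(Task.WhenAll(tasks), Task.Delay(deadline));

            // Make sure stragglers are cancelled before the source is disposed.
            cts.Cancel();

            // Cancelled or faulted calls are skipped; only successful results are returned.
            return tasks
                .Where(t => t.Status == TaskStatus.RanToCompletion)
                .Select(t => t.Result)
                .ToList();
        }
    }
}
```

Filtering on TaskStatus.RanToCompletion means cancelled and faulted calls are dropped silently; a real service would probably want to log them.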

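The application handles 3000 concurrent calls per second, which works out to roughly 100k HttpClient calls per second.

The problem is that there is some bottleneck in HttpClient. CPU is always very high, and I need about 80 (eighty) 16-core VMs with 32 GB of RAM each to handle the traffic. Something is clearly wrong.

One hint I have is that exactly the same code performed better before I updated my NuGet packages to ASP.NET Core 2.

I ran diagnostics on the server and found nothing wrong in my own code, but the HttpClient instances appear to be waiting on each other or getting stuck.

There is really nothing else in the trace. I am using a factory to create a single instance per endpoint: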

public class HttpClientsFactory : IHttpClientsFactory
{
    public static Dictionary<string, HttpClient> HttpClients { get; set; }

    public HttpClientsFactory()
    {
        HttpClients = new Dictionary<string, HttpClient>();
        Initialize();
    }

    private static void Initialize()
    {
        HttpClients.Add("Name1", CreateClient("http://......"));
        HttpClients.Add("Name2", CreateClient("http://...."));
        HttpClients.Add("Name3", CreateClient("http://...."));
    }

    public Dictionary<string, HttpClient> Clients()
    {
        return HttpClients;
    }

    public HttpClient Client(string key)
    {
        try
        {
            return Clients()[key];
        }
        catch
        {
            return null;
        }
    }

    public static HttpClient CreateClient(string endpoint)
    {
        try
        {
            var config = new HttpClientHandler()
            {
                MaxConnectionsPerServer = int.MaxValue,
                AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
            };
            var client = new HttpClient(config)
            {
                Timeout = TimeSpan.FromMilliseconds(1000),
                BaseAddress = new Uri(endpoint)
            };

            client.DefaultRequestHeaders.Accept.Clear();
            client.DefaultRequestHeaders.Connection.Clear();
            client.DefaultRequestHeaders.ExpectContinue = false;
            client.DefaultRequestHeaders.ConnectionClose = false;
            client.DefaultRequestHeaders.Connection.Add("Keep-Alive");
            client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

            return client;
        }
        catch (Exception)
        {
            return null;
        }
    }
}

Then in Startup:

 services.AddSingleton<IHttpClientsFactory, HttpClientsFactory>();

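What is going on here? Is a singleton HttpClient a bad fit for this scenario? Should I create one HttpClient instance per thread? How would I do that?

Update

After several days of testing, I am convinced that timeouts during HttpClient calls leave some connections open, which leads to port exhaustion. Any suggestions on how to avoid this?

Comments:

My first thought is that there are too many clients, which is known to cause problems. HttpClient is intended to be a single client reused for the whole lifetime of the application. Judging by the factory method, the clients do not differ much apart from the base URL. There is a limit on the number of outgoing connections with HttpClient: see this blog entry - blogs.msdn.microsoft.com/timomta/2017/10/23/…

Answer 1: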

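It looks like you have hit the limit of what the OS can handle in terms of HTTP requests, because of how HTTP is designed to work. If you are using the .NET Framework (rather than .NET Core), you can use ServicePointManager to tune how the OS manages HTTP requests. Most HTTP servers and clients use keep-alive, meaning they reuse the same TCP connection for multiple HTTP requests. You can call ServicePointManager.FindServicePoint() to get the ServicePoint instance for a connection to a particular host. Every HTTP request to the same host uses the same ServicePoint instance.

Try adjusting some of the values on ServicePoint and see how that affects your application. One example is ConnectionLimit, which controls how many parallel TCP connections are used between client and server. keep-alive lets you reuse an existing TCP connection for a new HTTP request, and pipelining (check SupportsPipelining to see whether the server you connect to supports it) lets you send several requests at once, but HTTP requires responses to arrive in the same order as the requests. This means a slow first response blocks every request/response behind it. With multiple TCP connections you can have multiple non-blocking HTTP requests in flight in parallel. There is a downside, of course: those TCP connections now have to compete with each other for network resources. So tune this value and see whether things improve, but be careful!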
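The ServicePoint tuning described above can be sketched as follows. This is a minimal sketch that assumes the .NET Framework, where ServicePointManager actually governs outgoing connections; the ServicePointTuning class name and the limit passed in are hypothetical, and a sensible value has to come from measurement.

```csharp
using System;
using System.Net;

// Hypothetical helper illustrating the ServicePoint settings named in the answer.
public static class ServicePointTuning
{
    // Looks up the ServicePoint shared by all requests to the endpoint's host
    // and caps the number of parallel TCP connections to that host.
    public static ServicePoint SetConnectionLimit(Uri endpoint, int connectionLimit)
    {
        ServicePoint servicePoint = ServicePointManager.FindServicePoint(endpoint);
        servicePoint.ConnectionLimit = connectionLimit;
        return servicePoint;
    }
}
```

After at least one request has been made to the host, the returned ServicePoint's SupportsPipelining property reports whether the server accepts pipelined requests.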

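So, as described above, the real problem here may be the HTTP protocol and the fact that it can really only handle one request at a time per connection. Fortunately there is a solution in HTTP/2; unfortunately ASP.NET does not support it very well yet. I have not tried it myself, since the systems I work on do not support it yet, but in theory it should let you send and receive multiple HTTP requests in parallel without blocking. Have a look at this thread, which describes a way to get it working.

Edit

I completely missed that you are running on ASP.NET Core 2.0. In that case you don't have access to ServicePointManager. However, if you only need to run on Windows, you can install the WinHttpHandler NuGet package and set its MaxConnectionsPerServer property. And if you do use WinHttpHandler, I suggest trying HTTP/2 to see whether it brings an improvement.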

EDIT2

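We just ran into an issue where POST requests took twice as long as GET requests, even though the rest of the request was exactly the same. We noticed it from an off-site location, connecting to an on-site server instead of localhost, so the ping was much higher than usual. This showed that a POST made two round trips while a GET made only one. The fix was these two lines: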

ServicePointManager.UseNagleAlgorithm = false;
ServicePointManager.Expect100Continue = false;

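Maybe this helps you as well?

Comments:

Unfortunately I am already using HttpClientHandler() => MaxConnectionsPerServer = int.MaxValue; it is in the second code block, near the bottom. I scaled the VMs down and increased their number instead (from 80 sixteen-core D5V2 to 120 eight-core D4V2), and things look a bit better. Maybe I really am hitting an OS limit, even though it seems so low: CPU does not even exceed 15% before everything slows down or I see ERR_CONNECTION_RESET in the Chrome console. I will look into HTTP/2, thanks.

Answer 2:

If you look at the code of HttpClient, you can see that whenever the Timeout property is not equal to Threading.Timeout.InfiniteTimeSpan, HttpClient creates a CancellationTokenSource with a timeout for every request: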

        CancellationTokenSource cts;
        bool disposeCts;
        bool hasTimeout = _timeout != s_infiniteTimeout;
        if (hasTimeout || cancellationToken.CanBeCanceled)
        {
            disposeCts = true;
            cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken, _pendingRequestsCts.Token);
            if (hasTimeout)
            {
                cts.CancelAfter(_timeout);
            }
        }

Looking at the code of CancelAfter(), we can see that internally it creates a System.Threading.Timer object:

    public void CancelAfter(Int32 millisecondsDelay)
    {
        ThrowIfDisposed();

        if (millisecondsDelay < -1)
        {
            throw new ArgumentOutOfRangeException("millisecondsDelay");
        }

        if (IsCancellationRequested) return;

        // There is a race condition here as a Cancel could occur between the check of
        // IsCancellationRequested and the creation of the timer.  This is benign; in the 
        // worst case, a timer will be created that has no effect when it expires.

        // Also, if Dispose() is called right here (after ThrowIfDisposed(), before timer
        // creation), it would result in a leaked Timer object (at least until the timer
        // expired and Disposed itself).  But this would be considered bad behavior, as
        // Dispose() is not thread-safe and should not be called concurrently with CancelAfter().

        if (m_timer == null)
        {
            // Lazily initialize the timer in a thread-safe fashion.
            // Initially set to "never go off" because we don't want to take a
            // chance on a timer "losing" the initialization ---- and then
            // cancelling the token before it (the timer) can be disposed.
            Timer newTimer = new Timer(s_timerCallback, this, -1, -1);
            if (Interlocked.CompareExchange(ref m_timer, newTimer, null) != null)
            {
                // We lost the ---- to initialize the timer.  Dispose the new timer.
                newTimer.Dispose();
            }
        }

        // It is possible that m_timer has already been disposed, so we must do
        // the following in a try/catch block.
        try
        {
            m_timer.Change(millisecondsDelay, -1);
        }
        catch (ObjectDisposedException)
        {
            // Just eat the exception.  There is no other way to tell that
            // the timer has been disposed, and even if there were, there
            // would not be a good way to deal with the observe/dispose
            // race condition.
        }
    }

When the Timer object is created, it takes a lock on the singleton TimerQueue.Instance:

    internal bool Change(uint dueTime, uint period)
    {
        bool success;

        lock (TimerQueue.Instance)
        {
            if (m_canceled)
                throw new ObjectDisposedException(null, Environment.GetResourceString("ObjectDisposed_Generic"));

            // prevent ThreadAbort while updating state
            try { }
            finally
            {
                m_period = period;

                if (dueTime == Timeout.UnsignedInfinite)
                {
                    TimerQueue.Instance.DeleteTimer(this);
                    success = true;
                }
                else
                {
                    if (FrameworkEventSource.IsInitialized && FrameworkEventSource.Log.IsEnabled(EventLevel.Informational, FrameworkEventSource.Keywords.ThreadTransfer))
                        FrameworkEventSource.Log.ThreadTransferSendObj(this, 1, string.Empty, true);

                    success = TimerQueue.Instance.UpdateTimer(this, dueTime, period);
                }
            }
        }

        return success;
    }

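If you have a large number of concurrent HTTP requests, you may run into a lock convoy problem. The issue is described here and here.

To confirm this, try debugging your code with the Parallel Stacks window in Visual Studio, or use the !syncblk WinDbg command from the SOS extension.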
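One way to act on this analysis is to switch off the per-client timeout, so that the quoted HttpClient code never reaches its CancelAfter call, and to rely on the caller's own token for the deadline instead. The following is a minimal sketch under that assumption; the NoTimeoutClient class and its Create method are hypothetical names, and whether this removes the contention in a given workload needs to be measured.

```csharp
using System;
using System.Net.Http;

// Hypothetical factory illustrating the "no per-request timer" idea.
public static class NoTimeoutClient
{
    public static HttpClient Create(string endpoint)
    {
        return new HttpClient
        {
            BaseAddress = new Uri(endpoint),
            // With an infinite timeout, hasTimeout in the quoted HttpClient code is false,
            // so HttpClient does not arm a timer of its own for each request.
            Timeout = System.Threading.Timeout.InfiniteTimeSpan
        };
    }
}
```

The deadline then comes only from the token passed to PostAsync, for example the single CancellationTokenSource with CancelAfter(1000) that the controller in the question already creates per incoming call. That is one timer per incoming request rather than one extra timer per outgoing request.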

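Comments:

We had a similar problem: a large number of concurrent requests with timeouts, made through HttpClientFactory, locked up the whole application, and this was the cause. We removed the timeout and the problem went away completely.

Answer 3:

Try setting MaxConnectionsPerServer = Environment.ProcessorCount.

With MaxConnectionsPerServer = int.MaxValue, your program creates and uses too many threads, which compete with each other for CPU. Context switching between threads, plus the per-thread overhead, then degrades performance.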
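Applied to the handler from the question, the suggestion might look like the sketch below. The BoundedHandler class name is hypothetical, and Environment.ProcessorCount is simply the value this answer proposes, not a verified optimum.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical factory showing the connection cap suggested in this answer.
public static class BoundedHandler
{
    public static HttpClientHandler Create()
    {
        return new HttpClientHandler
        {
            // Cap connections per server at the core count instead of int.MaxValue.
            MaxConnectionsPerServer = Environment.ProcessorCount,
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
    }
}
```

The handler is then passed to the HttpClient constructor exactly as config is in the question's CreateClient method.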
