Azure Cosmos DB - 'Request rate is large. More Request Units may be needed' error while deleting items

Posted 2023-03-23

【Title】: Azure Cosmos DB - 'Request rate is large. More Request Units may be needed' error while deleting the items
【Asked】: 2020-05-20 08:02:18
【Question】:

I am using the stored procedure below to delete items from a Cosmos DB collection.

function bulkDeleteStoredProcedure(query) {
    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();
    var response = getContext().getResponse();
    var responseBody = {
        deleted: 0,
        continuation: true
    };

    // Validate input.
    if (!query) throw new Error("The query is undefined or null.");

    tryQueryAndDelete();

    // Recursively runs the query w/ support for continuation tokens.
    // Calls tryDelete(documents) as soon as the query returns documents.
    function tryQueryAndDelete(continuation) {
        var requestOptions = { continuation: continuation };

        var isAccepted = collection.queryDocuments(collectionLink, query, requestOptions, function (err, retrievedDocs, responseOptions) {
            if (err) throw err;

            if (retrievedDocs.length > 0) {
                // Begin deleting documents as soon as documents are returned from the query results.
                // tryDelete() resumes querying after deleting; no need to page through continuation tokens.
                //  - this is to prioritize writes over reads given timeout constraints.
                tryDelete(retrievedDocs);
            } else if (responseOptions.continuation) {
                // Else if the query came back empty, but with a continuation token; repeat the query w/ the token.
                tryQueryAndDelete(responseOptions.continuation);
            } else {
                // Else if there are no more documents and no continuation token - we are finished deleting documents.
                responseBody.continuation = false;
                response.setBody(responseBody);
            }
        });

        // If we hit execution bounds - return continuation: true.
        if (!isAccepted) {
            response.setBody(responseBody);
        }
    }

    // Recursively deletes documents passed in as an array argument.
    // Attempts to query for more on empty array.
    function tryDelete(documents) {
        if (documents.length > 0) {
            // Delete the first document in the array.
            var isAccepted = collection.deleteDocument(documents[0]._self, {}, function (err, responseOptions) {
                if (err) throw err;

                responseBody.deleted++;
                documents.shift();
                // Delete the next document in the array.
                tryDelete(documents);
            });

            // If we hit execution bounds - return continuation: true.
            if (!isAccepted) {
                response.setBody(responseBody);
            }
        } else {
            // If the document array is empty, query for more documents.
            tryQueryAndDelete();
        }
    }
}

Executing this stored procedure fails with the following error:

Failed to execute stored procedure BulkDelete for container Notifications: {"code":429,"body":{"code":"429","message":"Message: {\"Errors\":[\"Request rate is large. More Request Units may be needed, so no changes were made. Please retry this request later. Learn more: http://aka.ms/cosmosdb-error-429\"]}\r\nActivityId: cc616784-03ee-4b10-9481-d62c26e496e4, Request URI: /apps/2268c937-d7b4-449e-9d76-a2d50d5d3546/services/df84607d-8553-4938-aa0d-913563078a93/partitions/b37017a9-ab2c-4a88-bb51-0ae729299a7e/7336/1382341 RequestStats: \r\nRequestStartTime: 2020-05-20T07:55:16.8899325Z, RequestEndTime: 2020-05-20T07:55:17.5299234Z, Number of regions attempted: 1\r\nResponseTime: 2020-05-20T07:55:17.5299234Z, StoreResult: StorePhysicalAddress: rntbd://cdb-ms-prod-northeurope1-fd25.documents.azure.com:14307/apps/2268c937-d7b4-449e-9d76-a2d50d5d3546/services/df84607d-8553-4938-aa0d-913563078a93/partitions/b3701- ab2c-4a88-bb51-0ae729299a7e/replicas/132314907336368334p/, LSN: 400340, GlobalCommittedLsn: 400339, PartitionKeyRangeId: , IsValid: True, StatusCode: 429, SubStatusCode: 3200, RequestCharge: 0.38, ItemLSN: -1, SessionToken: , UsingLocalLSN: False, TransportException: null, ResourceType: StoredProcedure, OperationType: ExecuteJavaScript\r\n, SDK: Microsoft.Azure.Documents.Common/2.11.0"},"headers":{"access-control-allow-credentials":"true","access-control-allow-origin":"https://cosmos.azure.com","content-type":"application/json","lsn":"400340","strict-transport-security":"max-age=31536000","x-ms-activity-id":"cc616784-03ee-4b10-9481-d62c26e496e4","x-ms-cosmos-llsn":"400340","x-ms-cosmos-quorum-acked-llsn":"400340","x-ms-current-replica-set-size":"4","x-ms-current-write-quorum":"3","x-ms-gatewayversion":"version=2.11.0","x-ms-global-committed-lsn":"400339","x-ms-number-of-read-regions":"1","x-ms-quorum-acked-lsn":"400340","x-ms-request-charge":"0.38","x-ms-retry-after-ms":"8538","x-ms-schemaversion":"1.9","x-ms-serviceversion":"version=2.11.0.0","x-ms-substatus":"3200","x-ms-transport-request-id":"120","x-ms-xp-role":"1","x-ms-throttle-retry-count":5,"x-ms-throttle-retry-wait-time-ms":32087},"activityId":"cc616784-03ee-4b10-9481-d62c26e496e4","substatus":3200,"retryAfterInMs":8538}

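How can I solve this? Is there something wrong with the stored procedure?

【Comments】:

What is your RU setting? If it is too low for what you are trying to do, you will see throttling. You are seeing an 8-second backoff. That likely means you are constantly sending requests to Cosmos DB (some combination of queries and deletes), which makes your backoff time keep growing (and grow past the stored procedure's 5-second execution limit).

@DavidMakogon The throughput is set to 1000 RU.

【Answer 1】: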

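Cosmos DB returns a 429 when the RUs consumed so far in the current second plus the RUs of the incoming query exceed the threshold you have provisioned. For example, if your threshold is 400, you have used 380 RU so far, and the next query needs 22 RU to complete, Cosmos DB rejects that query with status code 429. If the next query needs only 3 RU, it succeeds. After 1 second the aggregate RU count resets to zero, and the 22 RU query succeeds.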
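The arithmetic in that example can be sketched as a small simulation. This is a deliberately simplified model of a one-second RU window, not Cosmos DB's actual accounting; `createRuBudget` and `tryConsume` are hypothetical names used only for illustration.

```javascript
// Simplified model of a per-second RU budget (an assumption for illustration,
// not how the service is implemented internally).
function createRuBudget(limitPerSecond) {
    var windowStart = 0; // start of the current one-second window, in ms
    var used = 0;        // RUs consumed in the current window

    // Returns 200 if the request fits in the current window, 429 otherwise.
    return function tryConsume(costRu, nowMs) {
        if (nowMs - windowStart >= 1000) {
            // A new window begins: the aggregate RU count resets.
            windowStart = nowMs;
            used = 0;
        }
        if (used + costRu > limitPerSecond) {
            return 429; // rejected, nothing is consumed
        }
        used += costRu;
        return 200;
    };
}

var tryConsume = createRuBudget(400);
var results = [
    tryConsume(380, 0),   // 380 of 400 RU used
    tryConsume(22, 100),  // 380 + 22 exceeds 400, so it is throttled
    tryConsume(3, 200),   // 380 + 3 still fits
    tryConsume(22, 1000)  // one second later the window has reset
];
console.log(results); // status codes: 200, 429, 200, 200
```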

When you receive a 429 you also receive an "x-ms-retry-after-ms" header containing a number. Wait that many milliseconds before retrying the query.

https://docs.microsoft.com/en-us/rest/api/cosmos-db/common-cosmosdb-rest-response-headers
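A minimal client-side sketch of that retry rule, combined with the `continuation` flag the stored procedure in the question returns. It assumes a hypothetical `execute` function that runs the stored procedure once and resolves with its response body (`{ deleted, continuation }`), and that a throttled call rejects with an error exposing `code` and `retryAfterInMs` as in the error output above. `deleteAll`, `maxThrottles`, and `wait` are illustrative names, not part of any SDK.

```javascript
// Resolves after the given number of milliseconds.
function sleep(ms) {
    return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

// Calls `execute` until the stored procedure reports continuation: false.
// On a 429 it waits for the interval the service asked for, then retries.
async function deleteAll(execute, options) {
    var maxThrottles = (options && options.maxThrottles) || 10;
    var wait = (options && options.wait) || sleep;
    var totalDeleted = 0;
    var throttles = 0;

    while (true) {
        try {
            var body = await execute();
            totalDeleted += body.deleted;
            if (!body.continuation) return totalDeleted;
        } catch (err) {
            // Only throttling is retried; any other error is rethrown,
            // as is a 429 once the throttle limit has been exceeded.
            if (err.code !== 429 || ++throttles > maxThrottles) throw err;
            // Fall back to 1 second if no retry interval was provided (assumption).
            await wait(err.retryAfterInMs || 1000);
        }
    }
}
```

Because a throttled execution reports that no changes were made, the sketch adds to the running total only from calls that succeeded.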

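Alternatively, you can avoid the 429 by raising the threshold (which also raises the cost of the service). So you have to decide whether to retry or to raise the threshold. That depends on the nature of your application.

RUs, or Request Units, are calculated by the Cosmos DB service from the amount of work the service has to do. The figure combines how large your index is, how much data is being transferred, and how much CPU, disk, memory, and so on you use. Charging by RU is how Cosmos DB learns what workload you are going to run and makes the necessary backend changes to meet your performance needs. Your cost per second depends on your RU threshold setting. If you read and write across different regions globally, the RU calculation becomes more complicated.

You can lower the RU cost of your queries by restructuring the data in your index. If you fan out across multiple partitions, queries run in parallel and get more work done in less time. Reducing or increasing the number of kilobytes moved through the network, memory, and CPU components also changes the RU cost.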

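【Comments】:

【Answer 2】:

The 429 error is caused by too many requests, not by a bug in your stored procedure.

However, stored procedures are best suited to write-heavy operations, not read-heavy or delete-heavy ones. Instead, you can use the Bulk Executor library SDK, which has a BulkDelete feature.

Here is the documentation.

【Comments】:

I don't think it is fair to categorize stored procedures as being specifically for write-heavy operations. I am also not sure why you are recommending an alternative approach that is unrelated to the question. We don't even know anything about the OP's query or the number of documents being deleted. All we know, based on the error output, is that there is a backoff of over 8 seconds, which means they are likely flooding Cosmos DB with requests and exceeding the allocated RU.

【Answer 3】: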

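The simplest approach is to increase the throughput in the Azure portal. Depending on how often you run this kind of operation, you can either raise the throughput, run your operation, and reset it (if it is a one-off), or find your optimal throughput. You will have to experiment with the numbers, but for a one-off, try 1000 and multiply by 10 on each attempt until it works. Don't forget to reset the value to what it was afterwards, or you will get a big bill :) (If your throughput is set to around 4000 or more, you may also want to consider autoscale, which can bring some cost benefit.)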
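The raise, run, and reset sequence can be wrapped so that the reset cannot be forgotten. This is only a sketch: `setThroughput` is a hypothetical function standing in for however you change the provisioned RU/s (portal, CLI, or SDK), and `withTemporaryThroughput` is an illustrative name.

```javascript
// Hypothetical: `setThroughput(ru)` returns a promise and wraps whatever
// mechanism changes the container's provisioned RU/s.
async function withTemporaryThroughput(setThroughput, originalRu, boostedRu, operation) {
    await setThroughput(boostedRu);
    try {
        // Run the one-off bulk operation at the higher throughput.
        return await operation();
    } finally {
        // Always restore the original value, even if the operation fails.
        await setThroughput(originalRu);
    }
}
```

The `finally` block is the design point: the original value is restored whether the delete succeeds or throws.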

【Comments】:
