How to optimize the Azure Cosmos index for this query

Posted 2023-04-15


Question (posted 2021-08-12 07:30:53):

I have a Cosmos DB holding about 10 GB of analytics data.

The data model looks like this:


    {
        "publisherID": "",
        "managerID": "",
        "mediaID": "",
        "type": "",
        "ip": "",
        "userAgent": "",
        "playerRes": "",
        "title": "",
        "playerName": "",
        "videoTimeCode": 0,
        "geo": {
            "country": "",
            "region": "",
            "city": "",
            "ll": []
        },
        "date": "",
        "uuid": "",
        "id": ""
    }

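Some of my queries are very heavy, to the point where I hit my RU limit. Before considering raising the RU limit, I want to make sure my queries are already optimized.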

All of my queries follow this pattern:

    SELECT c.id, c.date, c.uuid, c.type FROM c WHERE c.mediaID = "ID" AND (c.type = "Load" OR c.type = "Progress" OR c.type = "Play") AND (c.date BETWEEN "2021-06-30T22:00:00.000Z" AND "2021-07-31T21:59:59.999Z")
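A minimal Python sketch, assuming the query is issued from application code: it builds the same filter as a parameterized query and writes the OR chain on c.type as an equivalent IN list, with one parameter per value. The function names are hypothetical. Because date is stored as a string, BETWEEN compares strings lexicographically, so both bounds have to use the same fixed-width UTC format as the stored values; the helper formats them that way.

```python
from datetime import datetime, timezone


def to_iso_utc(dt):
    """Format an aware datetime like the stored dates: 2021-06-30T22:00:00.000Z."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    dt = dt.astimezone(timezone.utc)
    # Fixed-width UTC strings sort lexicographically in time order,
    # which is what BETWEEN on a string property relies on.
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def build_media_query(media_id, types, start, end):
    """Return (query_text, parameters) for the query pattern in the question.

    The OR chain on c.type becomes an equivalent IN list with one parameter
    per value; parameters are name/value dicts.
    """
    if not types:
        raise ValueError("at least one event type is required")
    type_params = [f"@type{i}" for i in range(len(types))]
    query = (
        "SELECT c.id, c.date, c.uuid, c.type FROM c "
        "WHERE c.mediaID = @mediaID "
        f"AND c.type IN ({', '.join(type_params)}) "
        "AND (c.date BETWEEN @start AND @end)"
    )
    parameters = [{"name": "@mediaID", "value": media_id}]
    parameters += [{"name": p, "value": t} for p, t in zip(type_params, types)]
    parameters += [
        {"name": "@start", "value": to_iso_utc(start)},
        {"name": "@end", "value": to_iso_utc(end)},
    ]
    return query, parameters


if __name__ == "__main__":
    q, params = build_media_query(
        "ID",
        ["Load", "Progress", "Play"],
        datetime(2021, 6, 30, 22, 0, tzinfo=timezone.utc),
        datetime(2021, 7, 31, 21, 59, 59, 999000, tzinfo=timezone.utc),
    )
    print(q)
    print(params)
```

With the azure-cosmos (v4) Python SDK, the pair could then be passed as container.query_items(query=q, parameters=params, enable_cross_partition_query=True); treat that call as a pointer to check against the SDK reference, not as tested code.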

After some research, I concluded that the best indexing policy I could have is:


    {
        "indexingMode": "consistent",
        "automatic": true,
        "includedPaths": [
            {
                "path": "/type/?"
            },
            {
                "path": "/mediaID/?"
            },
            {
                "path": "/date/?"
            }
        ],
        "excludedPaths": [
            {
                "path": "/*"
            }
        ]
    }

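With "/*" excluded, only the listed paths get range indexes. A hypothetical Python helper that checks whether every filtered property is covered by the policy; it deliberately simplifies Cosmos DB's precedence rules (where the more specific path wins) and only handles top-level scalar properties.

```python
import json

# The indexing policy from the question, with its braces restored.
POLICY_JSON = """
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [{"path": "/type/?"}, {"path": "/mediaID/?"}, {"path": "/date/?"}],
  "excludedPaths": [{"path": "/*"}]
}
"""


def unindexed_filters(policy, filter_props):
    """Return the filtered properties that this policy does not index.

    Simplified assumption: a top-level property counts as indexed if
    '/<prop>/?' or '/<prop>/*' is included (or the root '/*' is included)
    and neither of those exact paths is excluded.
    """
    included = {p["path"] for p in policy.get("includedPaths", [])}
    excluded = {p["path"] for p in policy.get("excludedPaths", [])}
    missing = []
    for prop in filter_props:
        specific = {f"/{prop}/?", f"/{prop}/*"}
        if specific & excluded:
            missing.append(prop)  # explicitly excluded
        elif specific & included or "/*" in included:
            continue  # covered by a specific include or the root include
        else:
            missing.append(prop)  # only the root exclude applies
    return missing


if __name__ == "__main__":
    policy = json.loads(POLICY_JSON)
    print(unindexed_filters(policy, ["mediaID", "type", "date"]))
```

For the policy above it reports no gaps, which suggests the high retrieved document count in the stats is not caused by a missing single-property index.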
For this query I get the following stats:

Request Charge: 324.51000000000005 RUs
Showing Results: 1 - 100
Retrieved document count: 10024
Retrieved document size: 8597509 bytes
Output document count: 200
Output document size: 28324 bytes
Index hit document count: 199.24
Index lookup time: 2.41 ms
Document load time: 62.93 ms
Query engine execution time: 15.709800000000001 ms
System function execution time: 0 ms
User defined function execution time: 0 ms
Document write time: 0.47000000000000003 ms
Round Trips: 1

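What worries me is the gap between the retrieved document count (10024) and the output document count (200). I suspect that is why it costs 324 RUs to get the first 100 results...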
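The gap can be put in numbers using the stats above. A small Python sketch of the arithmetic (the function name is hypothetical): the engine loaded about 50 documents for every document it returned, so roughly 9,800 documents, most of the 8.6 MB loaded, were read only to be filtered out. That is consistent with part of the filter being evaluated on loaded documents rather than resolved entirely from the index.

```python
def load_efficiency(retrieved_docs, output_docs, retrieved_bytes):
    """Compare what the query engine loaded with what it returned."""
    loads_per_output = retrieved_docs / output_docs
    discarded_docs = retrieved_docs - output_docs
    avg_doc_bytes = retrieved_bytes / retrieved_docs
    return loads_per_output, discarded_docs, avg_doc_bytes


# Figures copied from the query stats in the question.
ratio, discarded, avg_bytes = load_efficiency(10024, 200, 8597509)
print(f"{ratio:.2f} docs loaded per doc returned")
print(f"{discarded} docs loaded and discarded")
print(f"~{avg_bytes:.0f} bytes per loaded doc")
```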

I'm not sure how to set up the index to optimize this query (it always follows the same pattern: WHERE mediaID = ID AND type = TYPE AND date between two dates).

Any help is welcome.

Comments:

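Try creating a composite index on /mediaID, /type and /date, then run the query again once the index has been built. Consider using the default opt-out policy that indexes everything, at least for troubleshooting; that way you can rule out a problem with the policy you created. docs.microsoft.com/en-us/azure/cosmos-db/… Also, what is your partition key?

@NoahStahl While the partition key certainly matters for scaling, I don't think it has any effect on this particular problem. With only 10 GB, all the data will sit in a single physical partition, so cross-partition queries play no role here.

Answer 1: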

Thanks for the feedback, it helped a lot!

I added the following composite index to the indexing policy:

    "compositeIndexes": [
        [
            {
                "path": "/mediaID",
                "order": "ascending"
            },
            {
                "path": "/type",
                "order": "ascending"
            },
            {
                "path": "/date",
                "order": "ascending"
            }
        ]
    ]
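As far as I understand the Cosmos DB indexing guidance for filters on several properties, a composite index should list the equality-filtered properties first and at most one range-filtered property last. A hypothetical Python helper that builds an entry in that order and reproduces the index above, treating the type filter as an equality filter as this answer does:

```python
import json


def composite_index(equality_props, range_prop=None, order="ascending"):
    """Build one entry for the policy's compositeIndexes list.

    Equality-filtered properties come first; the single range-filtered
    property (if any) comes last.
    """
    paths = list(equality_props) + ([range_prop] if range_prop else [])
    # Composite indexes need at least two paths.
    if len(paths) < 2:
        raise ValueError("a composite index needs at least two paths")
    return [{"path": f"/{p}", "order": order} for p in paths]


# mediaID and type are equality filters; date is the BETWEEN (range) filter.
policy_fragment = {"compositeIndexes": [composite_index(["mediaID", "type"], "date")]}
print(json.dumps(policy_fragment, indent=4))
```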

The query from my original post now costs 7 RUs (it was 320 before!). Thanks @404

If you don't mind, I'd like to take this one step further... :)

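For another query with the same structure, where I know there is a lot of data to retrieve, it takes 42.02 RUs to get the first 100 results. Does that make sense?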

Request Charge: 42.02 RUs
Showing Results: 1 - 100
Retrieved document count: 200
Retrieved document size: 164847 bytes
Output document count: 200
Output document size: 28544 bytes
Index hit document count: 200
Index lookup time: 6.98 ms
Document load time: 1.3399 ms
Query engine execution time: 0.6601 ms
System function execution time: 0 ms
User defined function execution time: 0 ms
Document write time: 0.47000000000000003 ms
Round Trips: 1
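In the second query's stats the retrieved, index hit and output counts are all 200, so nothing is loaded only to be thrown away. A short Python comparison with the first query's stats (names are hypothetical; the 7 RU run is left out because its full stats were not posted, and the two queries target different data, so compare the ratios rather than the raw charges):

```python
# Stats reported in the thread.
RUNS = {
    "first query, single-property indexes": {"ru": 324.51, "retrieved": 10024, "output": 200},
    "second query, composite index": {"ru": 42.02, "retrieved": 200, "output": 200},
}


def summarize(run):
    """Documents loaded per document returned, and RU per returned document."""
    return {
        "loads_per_output": run["retrieved"] / run["output"],
        "ru_per_output_doc": run["ru"] / run["output"],
    }


for name, run in RUNS.items():
    s = summarize(run)
    print(f"{name}: {s['loads_per_output']:.2f} loads/output, "
          f"{s['ru_per_output_doc']:.3f} RU per returned doc")
```

A loads-per-output ratio of 1.0 suggests the index already resolves the whole filter; from there the charge most likely tracks how many documents each request returns and how large they are, rather than the indexing policy.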

Apart from raising the RU limit, is there any further optimization I can do here?

Comments:

Please don't post additional questions in an answer. I suggest posting a new question.
