Mongodb - poor performance when no results return
Posted: 2016-02-25 12:18:44
Question:
I have a MongoDB collection of about 7 million documents representing places.
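I run a query that searches for places near a specific location whose names start with a given prefix.
We have a compound index, described below, to speed up the search.
When the query finds a match (even just one), it executes very fast (~20 ms). But when there is no match, the query can take up to 30 seconds.
Please help.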
Details:
Each place (in the geoData collection) has the following fields:
"loc" - a GeoJSON point that represents the location
"categoriesIds" - array of int ids
"name" - the name of the place
The following compound index is defined on this collection:
{
  "loc" : "2dsphere",
  "categoriesIds" : 1,
  "name" : 1
}
The query is:
db.geoData.find({
  "loc": {
    "$near": {
      "$geometry": {
        "type": "Point",
        "coordinates": [ -0.10675191879272461, 51.531600743186644 ]
      },
      "$maxDistance": 5000.0
    }
  },
  "categoriesIds": {
    "$in": [ 1, 2, 71, 70, 74, 72, 73, 69, 44, 26, 27, 33, 43, 45, 53, 79 ]
  },
  "name": { "$regex": "^Cafe Ne" }
})
Execution stats (Link to the whole explain result)
"executionStats" : {
  "executionSuccess" : true,
  "nReturned" : 1,
  "executionTimeMillis" : 169,
  "totalKeysExamined" : 14333,
  "totalDocsExamined" : 1,
  "executionStages" : {
    "stage" : "GEO_NEAR_2DSPHERE",
    "nReturned" : 1,
    "executionTimeMillisEstimate" : 60,
    "works" : 14354,
    "advanced" : 1,
    "needTime" : 14351,
    "needFetch" : 0,
    "saveState" : 361,
    "restoreState" : 361,
    "isEOF" : 1,
    "invalidates" : 0,
    "keyPattern" : {
      "loc" : "2dsphere",
      "categoriesIds" : 1,
      "name" : 1
    },
    "indexName" : "loc_2dsphere_categoriesIds_1_name_1",
    "searchIntervals" : [
      {
        "minDistance" : 0,
        "maxDistance" : 3408.329295346151,
        "maxInclusive" : false
      },
      {
        "minDistance" : 3408.329295346151,
        "maxDistance" : 5000,
        "maxInclusive" : true
      }
    ],
    "inputStages" : [
      {
        "stage" : "FETCH",
        "nReturned" : 1,
        "executionTimeMillisEstimate" : 20,
        "works" : 6413,
        "advanced" : 1,
        "needTime" : 6411,
        "needFetch" : 0,
        "saveState" : 361,
        "restoreState" : 361,
        "isEOF" : 1,
        "invalidates" : 0,
        "docsExamined" : 1,
        "alreadyHasObj" : 0,
        "inputStage" : {
          "stage" : "IXSCAN",
          "filter" : {
            "TwoDSphereKeyInRegionExpression" : true
          },
          "nReturned" : 1,
          "executionTimeMillisEstimate" : 20,
          "works" : 6413,
          "advanced" : 1,
          "needTime" : 6411,
          "needFetch" : 0,
          "saveState" : 361,
          "restoreState" : 361,
          "isEOF" : 1,
          "invalidates" : 0,
          "keyPattern" : {
            "loc" : "2dsphere",
            "categoriesIds" : 1,
            "name" : 1
          },
          "indexName" : "loc_2dsphere_categoriesIds_1_name_1",
          "isMultiKey" : true,
          "direction" : "forward",
          "indexBounds" : {
            "loc" : [
              "[\"2f1003230\", \"2f1003230\"]",
              "[\"2f10032300\", \"2f10032300\"]",
              "[\"2f100323000\", \"2f100323000\"]",
              "[\"2f1003230001\", \"2f1003230001\"]",
              "[\"2f10032300012\", \"2f10032300013\")",
              "[\"2f1003230002\", \"2f1003230002\"]",
              "[\"2f10032300021\", \"2f10032300022\")",
              "[\"2f10032300022\", \"2f10032300023\")",
              "[\"2f100323003\", \"2f100323003\"]",
              "[\"2f1003230031\", \"2f1003230031\"]",
              "[\"2f10032300311\", \"2f10032300312\")",
              "[\"2f10032300312\", \"2f10032300313\")",
              "[\"2f10032300313\", \"2f10032300314\")",
              "[\"2f1003230032\", \"2f1003230032\"]",
              "[\"2f10032300320\", \"2f10032300321\")",
              "[\"2f10032300321\", \"2f10032300322\")"
            ],
            "categoriesIds" : [
              "[1.0, 1.0]",
              "[2.0, 2.0]",
              "[26.0, 26.0]",
              "[27.0, 27.0]",
              "[33.0, 33.0]",
              "[43.0, 43.0]",
              "[44.0, 44.0]",
              "[45.0, 45.0]",
              "[53.0, 53.0]",
              "[69.0, 69.0]",
              "[70.0, 70.0]",
              "[71.0, 71.0]",
              "[72.0, 72.0]",
              "[73.0, 73.0]",
              "[74.0, 74.0]",
              "[79.0, 79.0]"
            ],
            "name" : [
              "[\"Cafe Ne\", \"Cafe Nf\")",
              "[/^Cafe Ne/, /^Cafe Ne/]"
            ]
          },
          "keysExamined" : 6412,
          "dupsTested" : 0,
          "dupsDropped" : 0,
          "seenInvalidated" : 0,
          "matchTested" : 1
        }
      },
      {
        "stage" : "FETCH",
        "nReturned" : 0,
        "executionTimeMillisEstimate" : 40,
        "works" : 7922,
        "advanced" : 0,
        "needTime" : 7921,
        "needFetch" : 0,
        "saveState" : 261,
        "restoreState" : 261,
        "isEOF" : 1,
        "invalidates" : 0,
        "docsExamined" : 0,
        "alreadyHasObj" : 0,
        "inputStage" : {
          "stage" : "IXSCAN",
          "filter" : {
            "TwoDSphereKeyInRegionExpression" : true
          },
          "nReturned" : 0,
          "executionTimeMillisEstimate" : 40,
          "works" : 7922,
          "advanced" : 0,
          "needTime" : 7921,
          "needFetch" : 0,
          "saveState" : 261,
          "restoreState" : 261,
          "isEOF" : 1,
          "invalidates" : 0,
          "keyPattern" : {
            "loc" : "2dsphere",
            "categoriesIds" : 1,
            "name" : 1
          },
          "indexName" : "loc_2dsphere_categoriesIds_1_name_1",
          "isMultiKey" : true,
          "direction" : "forward",
          "indexBounds" : {
            "loc" : [
              "[\"2f1003230\", \"2f1003230\"]",
              "[\"2f10032300\", \"2f10032300\"]",
              "[\"2f100323000\", \"2f100323000\"]",
              "[\"2f1003230001\", \"2f1003230001\"]",
              "[\"2f10032300011\", \"2f10032300012\")",
              "[\"2f10032300012\", \"2f10032300013\")",
              "[\"2f1003230002\", \"2f1003230002\"]",
              "[\"2f10032300021\", \"2f10032300022\")",
              "[\"2f10032300022\", \"2f10032300023\")",
              "[\"2f100323003\", \"2f100323003\"]",
              "[\"2f1003230031\", \"2f1003230032\")",
              "[\"2f1003230032\", \"2f1003230032\"]",
              "[\"2f10032300320\", \"2f10032300321\")",
              "[\"2f10032300321\", \"2f10032300322\")",
              "[\"2f10032300322\", \"2f10032300323\")"
            ],
            "categoriesIds" : [
              "[1.0, 1.0]",
              "[2.0, 2.0]",
              "[26.0, 26.0]",
              "[27.0, 27.0]",
              "[33.0, 33.0]",
              "[43.0, 43.0]",
              "[44.0, 44.0]",
              "[45.0, 45.0]",
              "[53.0, 53.0]",
              "[69.0, 69.0]",
              "[70.0, 70.0]",
              "[71.0, 71.0]",
              "[72.0, 72.0]",
              "[73.0, 73.0]",
              "[74.0, 74.0]",
              "[79.0, 79.0]"
            ],
            "name" : [
              "[\"Cafe Ne\", \"Cafe Nf\")",
              "[/^Cafe Ne/, /^Cafe Ne/]"
            ]
          },
          "keysExamined" : 7921,
          "dupsTested" : 0,
          "dupsDropped" : 0,
          "seenInvalidated" : 0,
          "matchTested" : 0
        }
      }
    ]
  }
}
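The name bounds ["Cafe Ne", "Cafe Nf") in the IXSCAN stages above show how an anchored prefix regex becomes an index range. A simplified JavaScript sketch of that derivation, assuming a literal prefix with no regex metacharacters or flags; prefixBounds and inPrefixRange are hypothetical helper names, not MongoDB APIs:

```javascript
// Simplified sketch: turn an anchored, literal prefix (as in /^Cafe Ne/) into a
// half-open string range [prefix, upper). Assumes no regex metacharacters or
// flags, and ignores the edge case where the last character is already U+FFFF.
function prefixBounds(prefix) {
  if (prefix.length === 0) {
    throw new Error("an empty prefix matches every string");
  }
  const last = prefix.charCodeAt(prefix.length - 1);
  // Increment the final character to get the first string past the prefix.
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return [prefix, upper]; // lower bound inclusive, upper bound exclusive
}

// JavaScript compares strings by UTF-16 code units, which is enough for this sketch.
function inPrefixRange(name, prefix) {
  const [lower, upper] = prefixBounds(prefix);
  return name >= lower && name < upper;
}

console.log(prefixBounds("Cafe Ne")); // [ 'Cafe Ne', 'Cafe Nf' ]
console.log(inPrefixRange("Cafe Nero", "Cafe Ne")); // true
console.log(inPrefixRange("Cafe Noir", "Cafe Ne")); // false
```

Every string inside that half-open range starts with the prefix, so when the name field is part of the scanned index the scan can be confined to matching names rather than testing each key with the regex.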
Execution stats when searching for "CafeNeeNNN" instead of "Cafe Ne" (Link to the whole explain result)
"executionStats" : {
  "executionSuccess" : true,
  "nReturned" : 0,
  "executionTimeMillis" : 2537,
  "totalKeysExamined" : 232259,
  "totalDocsExamined" : 162658,
  "executionStages" : {
    "stage" : "FETCH",
    "filter" : {
      "$and" : [
        {
          "name" : /^CafeNeeNNN/
        },
        {
          "categoriesIds" : {
            "$in" : [
              1,
              2,
              26,
              27,
              33,
              43,
              44,
              45,
              53,
              69,
              70,
              71,
              72,
              73,
              74,
              79
            ]
          }
        }
      ]
    },
    "nReturned" : 0,
    "executionTimeMillisEstimate" : 1330,
    "works" : 302752,
    "advanced" : 0,
    "needTime" : 302750,
    "needFetch" : 0,
    "saveState" : 4731,
    "restoreState" : 4731,
    "isEOF" : 1,
    "invalidates" : 0,
    "docsExamined" : 70486,
    "alreadyHasObj" : 70486,
    "inputStage" : {
      "stage" : "GEO_NEAR_2DSPHERE",
      "nReturned" : 70486,
      "executionTimeMillisEstimate" : 1290,
      "works" : 302751,
      "advanced" : 70486,
      "needTime" : 232264,
      "needFetch" : 0,
      "saveState" : 4731,
      "restoreState" : 4731,
      "isEOF" : 1,
      "invalidates" : 0,
      "keyPattern" : {
        "loc" : "2dsphere"
      },
      "indexName" : "loc_2dsphere",
      "searchIntervals" : [
        {
          "minDistance" : 0,
          "maxDistance" : 3408.329295346151,
          "maxInclusive" : false
        },
        {
          "minDistance" : 3408.329295346151,
          "maxDistance" : 5000,
          "maxInclusive" : true
        }
      ],
      "inputStages" : [
        {
          "stage" : "FETCH",
          "nReturned" : 44540,
          "executionTimeMillisEstimate" : 110,
          "works" : 102690,
          "advanced" : 44540,
          "needTime" : 58149,
          "needFetch" : 0,
          "saveState" : 4731,
          "restoreState" : 4731,
          "isEOF" : 1,
          "invalidates" : 0,
          "docsExamined" : 44540,
          "alreadyHasObj" : 0,
          "inputStage" : {
            "stage" : "IXSCAN",
            "filter" : {
              "TwoDSphereKeyInRegionExpression" : true
            },
            "nReturned" : 44540,
            "executionTimeMillisEstimate" : 90,
            "works" : 102690,
            "advanced" : 44540,
            "needTime" : 58149,
            "needFetch" : 0,
            "saveState" : 4731,
            "restoreState" : 4731,
            "isEOF" : 1,
            "invalidates" : 0,
            "keyPattern" : {
              "loc" : "2dsphere"
            },
            "indexName" : "loc_2dsphere",
            "isMultiKey" : false,
            "direction" : "forward",
            "indexBounds" : {
              "loc" : [
                "[\"2f1003230\", \"2f1003230\"]",
                "[\"2f10032300\", \"2f10032300\"]",
                "[\"2f100323000\", \"2f100323000\"]",
                "[\"2f1003230001\", \"2f1003230001\"]",
                "[\"2f10032300012\", \"2f10032300013\")",
                "[\"2f1003230002\", \"2f1003230002\"]",
                "[\"2f10032300021\", \"2f10032300022\")",
                "[\"2f10032300022\", \"2f10032300023\")",
                "[\"2f100323003\", \"2f100323003\"]",
                "[\"2f1003230031\", \"2f1003230031\"]",
                "[\"2f10032300311\", \"2f10032300312\")",
                "[\"2f10032300312\", \"2f10032300313\")",
                "[\"2f10032300313\", \"2f10032300314\")",
                "[\"2f1003230032\", \"2f1003230032\"]",
                "[\"2f10032300320\", \"2f10032300321\")",
                "[\"2f10032300321\", \"2f10032300322\")"
              ]
            },
            "keysExamined" : 102689,
            "dupsTested" : 0,
            "dupsDropped" : 0,
            "seenInvalidated" : 0,
            "matchTested" : 44540
          }
        },
        {
          "stage" : "FETCH",
          "nReturned" : 47632,
          "executionTimeMillisEstimate" : 250,
          "works" : 129571,
          "advanced" : 47632,
          "needTime" : 81938,
          "needFetch" : 0,
          "saveState" : 2556,
          "restoreState" : 2556,
          "isEOF" : 1,
          "invalidates" : 0,
          "docsExamined" : 47632,
          "alreadyHasObj" : 0,
          "inputStage" : {
            "stage" : "IXSCAN",
            "filter" : {
              "TwoDSphereKeyInRegionExpression" : true
            },
            "nReturned" : 47632,
            "executionTimeMillisEstimate" : 230,
            "works" : 129571,
            "advanced" : 47632,
            "needTime" : 81938,
            "needFetch" : 0,
            "saveState" : 2556,
            "restoreState" : 2556,
            "isEOF" : 1,
            "invalidates" : 0,
            "keyPattern" : {
              "loc" : "2dsphere"
            },
            "indexName" : "loc_2dsphere",
            "isMultiKey" : false,
            "direction" : "forward",
            "indexBounds" : {
              "loc" : [
                "[\"2f1003230\", \"2f1003230\"]",
                "[\"2f10032300\", \"2f10032300\"]",
                "[\"2f100323000\", \"2f100323000\"]",
                "[\"2f1003230001\", \"2f1003230001\"]",
                "[\"2f10032300011\", \"2f10032300012\")",
                "[\"2f10032300012\", \"2f10032300013\")",
                "[\"2f1003230002\", \"2f1003230002\"]",
                "[\"2f10032300021\", \"2f10032300022\")",
                "[\"2f10032300022\", \"2f10032300023\")",
                "[\"2f100323003\", \"2f100323003\"]",
                "[\"2f1003230031\", \"2f1003230032\")",
                "[\"2f1003230032\", \"2f1003230032\"]",
                "[\"2f10032300320\", \"2f10032300321\")",
                "[\"2f10032300321\", \"2f10032300322\")",
                "[\"2f10032300322\", \"2f10032300323\")"
              ]
            },
            "keysExamined" : 129570,
            "dupsTested" : 0,
            "dupsDropped" : 0,
            "seenInvalidated" : 0,
            "matchTested" : 47632
          }
        }
      ]
    }
  }
}
Indexes on the collection
{
  "0" : {
    "v" : 1,
    "key" : {
      "_id" : 1
    },
    "name" : "_id_",
    "ns" : "wego.geoData"
  },
  "1" : {
    "v" : 1,
    "key" : {
      "srcId" : 1
    },
    "name" : "srcId_1",
    "ns" : "wego.geoData"
  },
  "2" : {
    "v" : 1,
    "key" : {
      "loc" : "2dsphere"
    },
    "name" : "loc_2dsphere",
    "ns" : "wego.geoData",
    "2dsphereIndexVersion" : 2
  },
  "3" : {
    "v" : 1,
    "key" : {
      "name" : 1
    },
    "name" : "name_1",
    "ns" : "wego.geoData"
  },
  "4" : {
    "v" : 1,
    "key" : {
      "loc" : "2dsphere",
      "categoriesIds" : 1,
      "name" : 1
    },
    "name" : "loc_2dsphere_categoriesIds_1_name_1",
    "ns" : "wego.geoData",
    "2dsphereIndexVersion" : 2
  },
  "5" : {
    "v" : 1,
    "key" : {
      "loc" : "2dsphere",
      "categoriesIds" : 1,
      "keywords" : 1
    },
    "name" : "loc_2dsphere_categoriesIds_1_keywords_1",
    "ns" : "wego.geoData",
    "2dsphereIndexVersion" : 2
  }
}
Collection stats link
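Question comments:
Could you also post the "queryPlanner" section of explain() for both queries?
I added links to the whole "explain" results.
According to the new full explain files, both queries take the same time (about 1800 ms). Could you post all the indexes on the collection? (The executionStats you posted use a different index, but the files you linked do not.)
You can always try forcing a specific index with "hint" (this works if indexFilterSet is false, as in your case; otherwise the hint is ignored). Could you check whether the execution time is consistent with that?
Sorry, I had linked the wrong "explain" output for the "Cafe Ne" query. It is the correct one now.
Answer 1:
I'm going to speculate a bit here, and then comment on your design.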
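First, when you create an index on a key whose value is an array, an index entry is created for each element of the array:
"To index a field that holds an array value, MongoDB creates an index key for each element in the array."
This is from MongoDB's own documentation about indexes.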
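To make the quoted rule concrete, here is an illustrative JavaScript sketch (not MongoDB's actual key-generation code) of the keys a compound index such as loc_2dsphere_categoriesIds_1_name_1 would hold for one document. The function name indexKeysFor is hypothetical, and the loc value is treated as an opaque placeholder; real 2dsphere keys are S2 cell ids like the strings in the loc index bounds shown in the question.

```javascript
// Illustrative sketch (not MongoDB's real key generator): list the keys a
// compound index would store for one document. If one indexed field holds an
// array, one key is produced per distinct element and the other fields are
// repeated in each key. "loc" is a hypothetical opaque cell id string here.
function indexKeysFor(doc, fields) {
  const arrayFields = fields.filter((f) => Array.isArray(doc[f]));
  if (arrayFields.length > 1) {
    // A compound multikey index may hold at most one array field per document.
    throw new Error("cannot index parallel arrays: " + arrayFields.join(", "));
  }
  if (arrayFields.length === 0) {
    return [fields.map((f) => doc[f])]; // no array: exactly one key
  }
  const arrayField = arrayFields[0];
  const values = doc[arrayField];
  // Duplicate primitive elements collapse into one key; an empty array still
  // gets a single key, whose value MongoDB stores as undefined.
  const elements = values.length > 0 ? [...new Set(values)] : [undefined];
  return elements.map((el) => fields.map((f) => (f === arrayField ? el : doc[f])));
}

const place = {
  loc: "cell-A", // hypothetical cell id standing in for the GeoJSON point
  categoriesIds: [1, 44, 71],
  name: "Cafe Nero",
};

const keys = indexKeysFor(place, ["loc", "categoriesIds", "name"]);
console.log(keys.length); // 3
console.log(keys[1]); // [ 'cell-A', 44, 'Cafe Nero' ]
```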
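So if your typical record has more than a handful of categories, and you have 7 million records, your index is huge, and scanning the index itself to discover that it doesn't contain what you're looking for also takes time. That is still faster than a collection scan, but it is very slow compared with the time needed to find an existing record.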
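A back-of-the-envelope estimate of such an index's size is entries × average entry size, where entries ≈ documents × average array length. The sketch below plugs in figures quoted in the comments on this answer (7 million documents, about 1.1 categories per document, and the 3732-byte average entry measured in a commenter's synthetic test); treat them as assumptions, and the helper name estimateIndex as hypothetical.

```javascript
// Back-of-the-envelope size estimate for a multikey index:
//   entries ~= documents x average array length (one key per element)
//   bytes   ~= entries x average entry size
// Ignores storage-engine compression and page overhead, so treat the result as
// an order-of-magnitude figure only.
function estimateIndex({ docs, avgArrayLength, avgEntryBytes }) {
  const entries = Math.round(docs * avgArrayLength);
  return { entries, bytes: entries * avgEntryBytes };
}

// Assumed inputs: 7 million documents and the 3732-byte average entry from the
// comments; first with one category per document, then with the asker's ~1.1.
const oneEach = estimateIndex({ docs: 7000000, avgArrayLength: 1, avgEntryBytes: 3732 });
const asker = estimateIndex({ docs: 7000000, avgArrayLength: 1.1, avgEntryBytes: 3732 });
console.log(oneEach.bytes / 1e9); // 26.124
console.log(asker.entries, asker.bytes / 1e9);
```

With one key per document this reproduces the roughly 26.1 GB the commenter reports; the asker's 1.1 average adds about 10% on top.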
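Now, let me comment on your schema design. This is a matter of style, so feel free to ignore this part.
Your records may belong to 16 categories. That's a bit overwhelming, and it stretches the word "category". A category is a specific division, a quick way to associate one thing with a group of things. What kind of thing belongs to so many groups?
Let's take your record Cafe Ne as an example. I assume that in the real world (remember that when solving real-world problems, programs and applications are at best an approximation of them), Cafe Ne is a restaurant, a café, a jazz bar, or a diner. It is certainly not a garage (unless "café" means car in some language I don't know). I have a hard time imagining it is a bank or a dental clinic. I would have to try very hard to come up with more than 10 meaningful categories under which users would search for a café.
My point is that even though MongoDB lets you design things this way, it doesn't mean you have to. Try to reduce the number of categories you have and the number you search for, and you will get better performance.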
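Comments:
Index size is not the problem. For existing documents the response comes back very fast. The problem is that when it doesn't find a document through the index, it scans tens of thousands of documents instead of giving up and returning no results.
As for the number of categories per document: on average each document has about 1.1 categories, so it doesn't bloat the index much. The "$in" condition is true when a document has at least one of the categories listed in the query.
@Eliezer, if your search is fuzzy, searching the index also takes time. Can you post the size of the index?
I added a link to the output of geoData.stats() at the end of the question. The "loc_2dsphere_categoriesIds_1_name_1" index is about 2.5 times the size of the "_id_" index.
Actually, I ran some tests. For documents generated with this mgenerate template, the average index entry size is 3732 bytes, adding up to an index size of 26.1 GB. I think that settles this sub-question. Expanding on @Oz123's guess: the working set (i.e. the "existing" documents) probably contains most of the documents to be returned, and the cursor (which loads documents lazily, iirc) just reads the remaining documents in the background. When there is no match, there is no first document after which the cursor can be returned, so we incur some disk operations.
Answer 2:
As JohnnyHK suggested in the comments and Oz123 pointed out in his answer, the problem here seems to be an index that has grown so large that it no longer performs well as an index. Besides the category-expansion problem already pointed out, I believe the order of fields in your index is also causing trouble. Compound indexes are built according to the order of fields, so putting name after categoriesIds makes queries on name more expensive.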
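An illustrative JavaScript sketch of what "built according to the order of fields" means: index entries are kept sorted field by field, so the field order decides which entries sit next to each other. The entries and helper names are hypothetical, and loc is left out for brevity.

```javascript
// Hypothetical index entries (loc omitted for brevity).
const entries = [
  { categoriesIds: 1, name: "Cafe Nero" },
  { categoriesIds: 1, name: "Pizza Place" },
  { categoriesIds: 44, name: "Bakery" },
  { categoriesIds: 44, name: "Cafe Neo" },
  { categoriesIds: 71, name: "Cafe Nova" },
];

// Compare two entries field by field, the way a compound key is ordered.
function compoundComparator(fields) {
  return (a, b) => {
    for (const f of fields) {
      if (a[f] < b[f]) return -1;
      if (a[f] > b[f]) return 1;
    }
    return 0;
  };
}

function sortedBy(fields) {
  return [...entries].sort(compoundComparator(fields));
}

// Positions (in index order) of entries whose name starts with the prefix.
function prefixPositions(sorted, prefix) {
  return sorted
    .map((e, i) => (e.name.startsWith(prefix) ? i : -1))
    .filter((i) => i >= 0);
}

console.log(prefixPositions(sortedBy(["categoriesIds", "name"]), "Cafe Ne")); // [ 0, 3 ]
console.log(prefixPositions(sortedBy(["name", "categoriesIds"]), "Cafe Ne")); // [ 1, 2 ]
```

With categoriesIds first, the two names starting with "Cafe Ne" land in separate runs (one per category), so a query that filters on name alone cannot cover them with one contiguous range; with name first they are adjacent.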
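Clearly you need to adjust your indexes. Exactly how depends on the kinds of queries you expect to support. In particular, I'm not sure whether you would get better performance from a compound index on loc and name, or from separate single-field indexes, one on loc and one on name. Mongo itself is a little vague about when it's best to use a compound index and when it's best to use single-field indexes and rely on index intersection.
My gut feeling is that the single-field indexes will perform better, but I would test both scenarios.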
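One way to test both scenarios is to run the same query against each index variant (for example by forcing one with cursor.hint() in the shell) and compare the executionStats. A minimal JavaScript sketch of a helper that condenses such an explain("executionStats") result, assuming the stage-tree layout shown in the question (child stages under inputStage or inputStages); summarizeExplain is a hypothetical name, and the sample input is a trimmed excerpt of the "CafeNeeNNN" output from the question.

```javascript
// Sketch: condense an explain("executionStats") result into the numbers that
// matter when comparing index variants. Assumes child stages sit under
// "inputStage" (one child) or "inputStages" (several children).
function summarizeExplain(explain) {
  const stats = explain.executionStats;
  const indexes = new Set();
  const visit = (stage) => {
    if (!stage) return;
    if (stage.indexName) indexes.add(stage.indexName);
    visit(stage.inputStage);
    (stage.inputStages || []).forEach((child) => visit(child));
  };
  visit(stats.executionStages);
  return {
    indexes: [...indexes],
    nReturned: stats.nReturned,
    totalKeysExamined: stats.totalKeysExamined,
    totalDocsExamined: stats.totalDocsExamined,
    // Documents fetched per document returned (divides by 1 when nothing matched).
    docsPerResult: stats.totalDocsExamined / Math.max(stats.nReturned, 1),
    millis: stats.executionTimeMillis,
  };
}

// Trimmed excerpt of the "CafeNeeNNN" explain output from the question.
const noMatch = {
  executionStats: {
    nReturned: 0,
    executionTimeMillis: 2537,
    totalKeysExamined: 232259,
    totalDocsExamined: 162658,
    executionStages: {
      stage: "FETCH",
      inputStage: {
        stage: "GEO_NEAR_2DSPHERE",
        indexName: "loc_2dsphere",
        inputStages: [
          { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "loc_2dsphere" } },
          { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "loc_2dsphere" } },
        ],
      },
    },
  },
};

const summary = summarizeExplain(noMatch);
console.log(summary.indexes); // [ 'loc_2dsphere' ]
console.log(summary.docsPerResult); // 162658
```

Run on the "Cafe Ne" output, the same helper would report loc_2dsphere_categoriesIds_1_name_1 instead, which makes the plan difference between the two queries easy to spot.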
If you expect to also query by category without the name or loc fields that could narrow the query down, it would be best to create a separate categoriesIds index.
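Answer 3:
The order of fields in a compound index matters a lot. It's hard to diagnose without access to the real data and usage patterns, but this key may improve the chances of matching (or ruling out) documents using the index alone:
{
  "loc" : "2dsphere",
  "name" : 1,
  "categoriesIds" : 1
}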
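Answer 4:
Not sure it's exactly the same problem, but we ran into a similar one: a multikey index performing poorly when no results were found.
This is actually a Mongo bug that was fixed in v3.3.8: https://jira.mongodb.org/browse/SERVER-15086
We fixed our problem after upgrading Mongo and rebuilding the indexes.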