Index and Collection Management - MongoDB from Getting Started to Dropping the Database

Posted 2023-04-07

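MongoTemplate provides several methods for managing indexes and collections. These methods are collected in a helper interface called IndexOperations. You access these operations by calling the indexOps method and passing in either the collection name or the java.lang.Class of your entity (the collection name is derived from the .class, either by name or from annotation metadata).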

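You can use the MongoTemplate class to create indexes on a collection and so improve query performance. The IndexDefinition, GeoSpatialIndex, and TextIndexDefinition classes let you create standard, geospatial, and text indexes.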

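The IndexOperations interface has a getIndexInfo method that returns a list of IndexInfo objects. This list contains all the indexes defined on the collection. The following example defines an index on the age property of the Person class:

The following example shows how to create a collection: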

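MongoDB Index Management

I. Introduction to Indexes

In MongoDB, indexes support efficient query execution. Without an index, MongoDB must scan every document in a collection to find those that match a query. With a suitable index, MongoDB can use it to limit the number of documents it must inspect.

An index is a special data structure that stores a small portion of the collection's data set in a form that is easy to traverse. It stores the values of a specific field or set of fields, ordered by field value. The ordered entries support efficient equality matches and range-based queries, and MongoDB can also use the ordering to return sorted results.

Fundamentally, MongoDB indexes are similar to indexes in relational databases: they are defined at the collection level, can cover any field or sub-field of the documents, and use a B-tree data structure.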
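To make the idea of ordered index entries concrete, here is a minimal sketch in plain JavaScript. It is not how MongoDB implements its B-tree; it assumes a toy in-memory collection and models the index as a sorted array, which is enough to show why equality and range lookups avoid a full scan. All names (docs, ageIndex, findByAgeRange) are hypothetical.

```javascript
// Toy collection; the documents are made up for illustration.
const docs = [
  { _id: 1, name: "Carol", age: 41 },
  { _id: 2, name: "Alice", age: 27 },
  { _id: 3, name: "Bob", age: 33 },
  { _id: 4, name: "Dave", age: 27 },
];

// Toy "index" on age: entries sorted by key, each pointing at a document position.
const ageIndex = docs
  .map((doc, pos) => ({ key: doc.age, pos }))
  .sort((a, b) => a.key - b.key);

// Binary search for the first index entry whose key is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range query min <= age <= max: jump to the start, then walk forward in key order.
function findByAgeRange(min, max) {
  const result = [];
  for (let i = lowerBound(ageIndex, min); i < ageIndex.length && ageIndex[i].key <= max; i++) {
    result.push(docs[ageIndex[i].pos]);
  }
  return result;
}

console.log(findByAgeRange(27, 27).map((d) => d.name)); // equality match
console.log(findByAgeRange(30, 45).map((d) => d.name)); // range match, already in age order
```

Only the entries inside the requested range are visited, and the results come back sorted by the indexed field without a separate sort step.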

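II. Index Concepts

1. Index Types

MongoDB provides several index types. You can create an index on any field or embedded field of a document or embedded document. In general, create indexes that serve your common user-facing queries, so that MongoDB scans the smallest possible number of candidate documents. In the mongo shell, you create an index by calling the createIndex() method.

1) Single-Field Indexes

MongoDB fully supports indexes on any field of the documents in a collection. By default, every collection has an index on the _id field, and applications and users can add further indexes to support important queries and operations. MongoDB supports both single-field indexes and compound indexes over multiple fields; this section covers single-field indexes first. For example: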

> db.friends.insert({"name" :"Alice","age":27}) // a document in the friends collection
WriteResult({ "nInserted" : 1 })

> db.friends.createIndex({"name" :1}) // create an index on the name field
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}

About db.collection.createIndex(keys, options):

Parameter	Type	Description
`keys`	document	A document that contains the field and value pairs where the field is the index key and the value describes the type of index for that field. For an ascending index on a field, specify a value of `1`; for descending index, specify a value of `-1`. MongoDB supports several different index types including text, geospatial, and hashed indexes. See Index Types for more information.
`options`	document	Optional. A document that contains a set of options that controls the creation of the index. See Options for details.

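Index on the _id field: when a collection is created, MongoDB creates an ascending unique index on the _id field by default, and this index cannot be dropped. Because _id is the primary key of the collection, every document should have a unique _id, and you can store any unique value in it. The default value of _id is an ObjectId, generated automatically when the document is inserted. In a sharded cluster, if you do not use _id as the shard key, your application must ensure that _id values are unique, otherwise errors occur. The usual approach is to rely on automatically generated standard ObjectId values.
Index on an embedded field: you can also create an index on any field inside an embedded document, just as you would on a top-level field. Note, however, that an index on an embedded field differs from an index on an embedded document: the former uses dot notation to reach a field inside the embedded document. See the following example: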

> db.people.insert(
... {
...   name:"John Doe",
...   address: {
...      street: "Main",
...      zipcode:"53511",
...      state: "WI"
... }
... }
... )
WriteResult({ "nInserted" : 1 })
 
> db.people.createIndex({"address.zipcode":1})  // "address.zipcode" reaches the zipcode field through dot notation; the key must be quoted

{ "createdCollectionAutomatically" : false, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1 }

Index on an embedded document:

> db.factories.insert(
      { metro:
               {   
                    city: "New York",   
                    state:"NY" 
               }, 
          name: "Giant Factory" 
       })
WriteResult({ "nInserted" : 1 })
> db.factories.createIndex({metro:1})
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}

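The metro field above is an embedded document containing the embedded fields city and state, so the index is created the same way as on a top-level field.

The following query can use this index: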

db.factories.find( { metro: { city: "New York", state: "NY" } } )

{ "_id" : ObjectId("56189565d8624fafa91cbbc1"), "metro" : { "city" : "New York", "state" : "NY" }, "name" : "Giant Factory" }

When you run an equality match on an embedded document, field order matters. For example, the following query matches no documents:

> db.factories.find( { metro: { state: "NY", city: "New York" } } )
> 
>
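The effect of field order can be reproduced outside the database. The sketch below is a simplified model, assuming that an exact match on an embedded document compares field names and values position by position; embeddedEquals is a hypothetical helper, not a MongoDB function.

```javascript
// Simplified model of an exact match on an embedded document:
// field names and values are compared in order, so key order is significant.
function embeddedEquals(stored, queried) {
  const a = Object.entries(stored);
  const b = Object.entries(queried);
  return (
    a.length === b.length &&
    a.every(([key, value], i) => key === b[i][0] && value === b[i][1])
  );
}

const metro = { city: "New York", state: "NY" };

console.log(embeddedEquals(metro, { city: "New York", state: "NY" })); // same field order
console.log(embeddedEquals(metro, { state: "NY", city: "New York" })); // swapped field order
```

If field order should not matter, querying the embedded fields individually with dot notation (for example "metro.city" and "metro.state") avoids the problem.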

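2) Compound Indexes

MongoDB supports compound indexes, that is, indexes that contain multiple fields. In the releases this article describes, a compound index can contain at most 31 fields (current releases document a limit of 32). A hashed index field cannot be part of a compound index in those releases; MongoDB 4.4 and later allow a single hashed field in a compound index.

Example: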

> db.products.insert(
    {"item": "Banana",
      "category": ["food","produce","grocery"],
      "location": "4th Street Store",
      "stock": 4,
      "type": "cases",
      "arrival": "2015"}
      )
WriteResult({ "nInserted" : 1 })

> db.products.createIndex({"item":1,"stock":1}) // create a compound index on the item and stock fields
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}

The following two queries can both use this compound index:

> db.products.find({"item":"Banana"})
> db.products.find({"item":"Banana","stock":4})

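[Sort Order]

Index fields can be sorted in ascending (1) or descending (-1) order. For a compound index, the sort order of each field matters: it determines whether a sort operation can use the index.

For example, consider an events collection whose documents contain the fields username and date.

Sort by username ascending and date descending:

db.events.find().sort( { username: 1, date: -1 } )

Sort by username descending and date ascending:

db.events.find().sort( { username: -1, date: 1 } )

Sort by username and date, both ascending:

db.events.find().sort( { username: 1, date: 1 } )

Because these queries sort in different orders, they differ in whether they can use an index. Now create the following index:

db.events.createIndex( { "username" : 1, "date" : -1 } )

Only the first two sorts can use this index, because an index can be traversed in either direction; the third cannot use it.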
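The rule can be written down as a small check. This is a sketch under a simplifying assumption: a sort can use a compound index when its pattern matches a prefix of the index key pattern either exactly or with every direction flipped. The function name sortCanUseIndex is hypothetical, and the real query planner considers more cases (for example equality conditions on leading fields).

```javascript
// True when sortSpec matches a prefix of indexSpec exactly, or is its exact inverse.
function sortCanUseIndex(indexSpec, sortSpec) {
  const indexKeys = Object.entries(indexSpec);
  const sortKeys = Object.entries(sortSpec);
  if (sortKeys.length === 0 || sortKeys.length > indexKeys.length) return false;

  // The sort fields must be the leading index fields, in the same order.
  const sameFields = sortKeys.every(([field], i) => field === indexKeys[i][0]);
  if (!sameFields) return false;

  // Forward traversal: identical directions. Backward traversal: all directions flipped.
  const forward = sortKeys.every(([, dir], i) => dir === indexKeys[i][1]);
  const backward = sortKeys.every(([, dir], i) => dir === -indexKeys[i][1]);
  return forward || backward;
}

const eventsIndex = { username: 1, date: -1 };

console.log(sortCanUseIndex(eventsIndex, { username: 1, date: -1 })); // forward scan
console.log(sortCanUseIndex(eventsIndex, { username: -1, date: 1 })); // backward scan
console.log(sortCanUseIndex(eventsIndex, { username: 1, date: 1 }));  // mixed directions
```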

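[Compound Index Prefixes]

A prefix is a beginning subset of the fields of a compound index. Consider the following compound index:

{ "item": 1, "location": 1, "stock": 1 }

Its prefixes are:

•{ item: 1 }
•{ item: 1, location: 1 }

Queries on the following fields can use this compound index:

•the item field (one condition, starting at the prefix),
•the item field and the location field (two conditions, starting at the prefix),
•the item field and the location field and the stock field (all conditions, including the prefix).

Queries on the following fields cannot use this compound index: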

•the location field,
•the stock field, or
•the location and stock fields.
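The prefix rule from the two lists above can be sketched as a helper that reports which leading part of the index a set of query fields can use. It assumes simple equality conditions and ignores everything else the planner weighs; usablePrefix is a hypothetical name.

```javascript
// Returns the index prefix that a set of equality-query fields can use.
// An empty array means the query cannot use the index at all.
function usablePrefix(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  const prefix = [];
  for (const field of indexFields) {
    if (!wanted.has(field)) break; // the prefix ends at the first missing field
    prefix.push(field);
  }
  return prefix;
}

const productIndex = ["item", "location", "stock"];

console.log(usablePrefix(productIndex, ["item"]));
console.log(usablePrefix(productIndex, ["location", "item"]));
console.log(usablePrefix(productIndex, ["location", "stock"]));
console.log(usablePrefix(productIndex, ["item", "stock"]));
```

The last call shows a mixed case: a query on item and stock can still use the item prefix, but stock cannot narrow the index scan because location is skipped.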

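If a collection has both a compound index and a single-field index, such as { a: 1, b: 1 } and { a: 1 }, the second is redundant because { a: 1 } is a prefix of the compound index, and it can be dropped.

[Summary]: The conditions under which MongoDB can use a compound index are essentially the same as for compound indexes in relational databases.

3) Multikey Indexes

When an indexed field holds an array, MongoDB creates an index entry for every element of the array. Multikey indexes support efficient queries against array fields.

Creating a multikey index

db.collection.createIndex( { <field>: < 1 or -1 > } )

If the indexed field is an array, MongoDB creates a multikey index automatically; you do not need to specify the index type explicitly.

Multikey index bounds

The bounds of an index scan define the portion of the index to search during a query. When multiple predicates apply to one index, MongoDB tries to combine their bounds, by intersecting or compounding them, to produce a scan with tighter bounds.

Restrictions on multikey indexes

For a compound multikey index, each indexed document can have at most one indexed field whose value is an array. Conversely, if such a compound index already exists, you cannot insert a document in which two of the indexed fields are arrays. For example:

{ _id: 1, a: [ 1, 2 ], b: [ 1, 2 ], category: "AB - both arrays" }    // cannot create index { a: 1, b: 1 }: both indexed fields are arrays

With the following documents, creating index { a: 1, b: 1 } is allowed, because no document has arrays in both a and b: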

{ _id: 1, a: [1, 2], b: 1, category: "A array" }
{ _id: 2, a: 1, b: [1, 2], category: "B array" }
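The restriction amounts to counting array-valued fields per document. The following sketch checks that condition in plain JavaScript; the helper names are hypothetical and the check only models the rule stated above, not the server's validation.

```javascript
// Returns the indexed fields of a document whose values are arrays.
function arrayFields(indexFields, doc) {
  return indexFields.filter((field) => Array.isArray(doc[field]));
}

// A compound multikey index accepts a document only if
// at most one of its indexed fields holds an array.
function allowedByCompoundMultikey(indexFields, doc) {
  return arrayFields(indexFields, doc).length <= 1;
}

const abIndex = ["a", "b"];

console.log(allowedByCompoundMultikey(abIndex, { _id: 1, a: [1, 2], b: [1, 2] })); // both arrays
console.log(allowedByCompoundMultikey(abIndex, { _id: 1, a: [1, 2], b: 1 }));      // only a is an array
console.log(allowedByCompoundMultikey(abIndex, { _id: 2, a: 1, b: [1, 2] }));      // only b is an array
```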

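Shard Keys

You cannot specify a multikey index as the shard key index. Hashed indexes cannot be multikey either.

Arrays of embedded documents

You can create multikey indexes on array fields that contain embedded documents. Consider an inventory collection with the following documents: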

{
  _id: 1,
  item: "abc",
  stock: [
    { size: "S", color: "red", quantity: 25 },
    { size: "S", color: "blue", quantity: 10 },
    { size: "M", color: "blue", quantity: 50 }
  ]
}
{
  _id: 2,
  item: "def",
  stock: [
    { size: "S", color: "blue", quantity: 20 },
    { size: "M", color: "blue", quantity: 5 },
    { size: "M", color: "black", quantity: 10 },
    { size: "L", color: "red", quantity: 2 }
  ]
}
{
  _id: 3,
  item: "ijk",
  stock: [
    { size: "M", color: "blue", quantity: 15 },
    { size: "L", color: "blue", quantity: 100 },
    { size: "L", color: "red", quantity: 25 }
  ]
}

Then create a multikey index:

db.inventory.createIndex( { "stock.size": 1, "stock.quantity": 1 } )

The following queries and sorts can all use this index:

db.inventory.find( { "stock.size": "M" } )
db.inventory.find( { "stock.size": "S", "stock.quantity": { $gt: 20 } } )
db.inventory.find( ).sort( { "stock.size": 1, "stock.quantity": 1 } )
db.inventory.find( { "stock.size": "M" } ).sort( { "stock.quantity": 1 } )

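4) Geospatial Indexes

MongoDB provides a set of indexes and query mechanisms specifically for geospatial information. This section introduces MongoDB's geospatial features.

Before you store location data, decide which type of surface to use for calculations. The type you choose affects how you store data, which index you build, and the syntax of your queries. MongoDB offers two surface types:

Spherical: to calculate geometry over an Earth-like sphere, store your location data on a spherical surface and use a 2dsphere index. Store the data as GeoJSON objects with the coordinate-axis order longitude, latitude.

Flat: to calculate distances on a Euclidean plane, store your location data as coordinate pairs and use a 2d index.

2dsphere Indexes

To create a geospatial index for GeoJSON-formatted data, use the db.collection.createIndex() method to create a 2dsphere index. The syntax is:

db.collection.createIndex( { <location field> : "2dsphere" } )

A detailed walkthrough follows.

First, create a places collection that stores documents with location data as GeoJSON Point objects: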

db.places.insert(
   {
      loc : { type: "Point", coordinates: [ -73.97, 40.77 ] },
      name: "Central Park",
      category : "Parks"
   }
)

db.places.insert(
   {
      loc : { type: "Point", coordinates: [ -73.88, 40.78 ] },
      name: "La Guardia Airport",
      category : "Airport"
   }
)

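Then create a 2dsphere index on the loc field:

db.places.createIndex( { loc : "2dsphere" } )

You can also create compound indexes that include a 2dsphere index key:

db.places.createIndex( { loc : "2dsphere" , category : -1, name: 1 } )
db.places.createIndex( { category : 1 , loc : "2dsphere" } )   // unlike a 2d index, a compound 2dsphere index does not require the location field to come first

[Note]: A field with a 2dsphere index must hold geometry as coordinate pairs or GeoJSON data.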
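To illustrate why the choice between a spherical and a flat surface matters, the sketch below computes the distance between the two points stored in places in both ways. It is only an illustration under stated assumptions: the haversine formula with a mean Earth radius of 6371 km stands in for spherical geometry, and the planar version simply treats degrees as flat x/y units. Neither function is part of MongoDB.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius, an approximation

const toRadians = (deg) => (deg * Math.PI) / 180;

// Great-circle distance between two [longitude, latitude] pairs (GeoJSON order).
function haversineKm([lon1, lat1], [lon2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Naive planar distance: the result is in degrees, not kilometres.
function planarDegrees([x1, y1], [x2, y2]) {
  return Math.hypot(x2 - x1, y2 - y1);
}

const centralPark = [-73.97, 40.77];
const laGuardia = [-73.88, 40.78];

console.log(haversineKm(centralPark, laGuardia).toFixed(2), "km on a sphere");
console.log(planarDegrees(centralPark, laGuardia).toFixed(4), "degrees on a flat plane");
```

The spherical result is a real-world distance, while the planar result only makes sense for data that truly lives on a flat coordinate system, which is the distinction behind 2dsphere and 2d indexes.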

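2d Indexes

Use a 2d index for data stored as points on a two-dimensional plane. It is intended for the coordinate-pair format used in MongoDB 2.2 and earlier and is not covered in detail here.

geoHaystack Indexes

A geoHaystack index is a special index optimized to return results over small areas. It improves query performance for data stored with flat geometry. For spherical geometry, a 2dsphere index is the better choice: it allows field reordering, whereas geoHaystack requires the location field to come first. It is not covered in detail here; note that geoHaystack indexes were deprecated in MongoDB 4.4 and removed in 5.0.

2d Index Internals: rarely needed; see the official documentation.

5) Hashed Indexes

A hashed index maintains entries with hashes of the values of the indexed field. The hashing function collapses embedded documents and computes the hash of the entire value, but it does not support multikey (array) indexes.

MongoDB's hashed indexes support equality queries but not range-based queries. In the releases this article describes, you cannot create a compound index that includes a hashed field (MongoDB 4.4 and later allow a single hashed field in a compound index), and you cannot specify a unique constraint on a hashed index. You can, however, create both a hashed index and a regular single-field index on the same field.

The following example creates a hashed index: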

db.collection.createIndex( { _id: "hashed" } )

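6) Text Indexes

MongoDB provides text indexes to support efficient text search on string content. A text index can be built on any field whose value is a string or an array of strings. A collection can have at most one text index.

Creating a text index

db.reviews.createIndex( { comments: "text" } )

You can also create a text index that covers multiple fields, which means a compound index can include text index keys: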

db.reviews.createIndex(
   {
     subject: "text",
     comments: "text"
   }
 )

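[Specifying Weights]: A weight expresses the significance of an indexed field relative to the other indexed fields in the text-search score. In the example below, content: 10 and keywords: 5 mean that a term match in content counts twice as much as a match in keywords; about is not listed and takes the default weight of 1.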

db.blog.createIndex(
   {
     content: "text",
     keywords: "text",
     about: "text"
   },
   {
     weights: {
       content: 10,
       keywords: 5
     },
     name: "TextIndex"
   }
 )
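The effect of the weights can be pictured with a deliberately simplified scoring sketch. This is not MongoDB's actual text-score formula, which also accounts for stemming, stop words, and term frequency normalization; it only assumes that each whole-word match contributes its field's weight. The function weightedScore and the sample document are hypothetical.

```javascript
// Weights from the index definition above; unlisted fields default to 1.
const weights = { content: 10, keywords: 5 };

// Simplified score: every exact word match adds the weight of its field.
function weightedScore(doc, term, fields) {
  let score = 0;
  for (const field of fields) {
    const words = String(doc[field] ?? "").toLowerCase().split(/\W+/);
    const matches = words.filter((w) => w === term.toLowerCase()).length;
    score += matches * (weights[field] ?? 1);
  }
  return score;
}

const post = {
  content: "mongodb index basics",
  keywords: "index",
  about: "index notes",
};
const indexedFields = ["content", "keywords", "about"];

console.log(weightedScore(post, "index", indexedFields));   // matches in all three fields
console.log(weightedScore(post, "mongodb", indexedFields)); // match in content only
```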

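[Wildcard]: You can also use the wildcard specifier when creating a text index:

db.collection.createIndex( { "$**": "text" } )
db.collection.createIndex( { a: 1, "$**": "text" } )

A text index created this way indexes every field in the collection that contains string data. It is typically used for unstructured data where the relevant fields are not known in advance.

Restrictions

A collection can have at most one text index.
If a query includes a $text expression, you cannot use hint().
Sort operations cannot obtain their sort order from a text index.
If a compound index includes a text index key, it cannot also include multikey or geospatial index fields.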

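2. Index Properties

In addition to the index types above, MongoDB supports several commonly used index properties.

1) TTL Indexes

TTL indexes are special single-field indexes that MongoDB uses to automatically remove expired documents from a collection. Data expiration is useful for information such as machine-generated event data, logs, and session data.

Creating a TTL index

db.eventlog.createIndex( { "lastModifiedDate": 1 }, { expireAfterSeconds: 3600 } )   // the expireAfterSeconds option makes this a TTL index

How TTL expiration works

A TTL index expires a document once the specified number of seconds has passed since the indexed field's value; the expiration threshold is the indexed value plus the specified number of seconds. If the indexed field is an array holding several date values, MongoDB uses the earliest date to compute the threshold. If the indexed field is not a date (or an array containing dates), the document never expires; if a document does not contain the indexed field, it never expires either.

MongoDB runs a background TTL thread that reads the index values every 60 seconds and removes expired documents. While the TTL thread is active, the delete operations are visible in db.currentOp(). A TTL index does not guarantee that expired data is deleted immediately; there may be a delay between expiration and removal.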
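The expiration rules above can be summarized in a few lines of plain JavaScript. This sketch only models the threshold calculation as described (indexed date plus expireAfterSeconds, earliest date for arrays, no expiry for non-date values); ttlExpiry is a hypothetical helper, and the actual deletion timing depends on the background thread.

```javascript
// Returns the Date at which a document becomes eligible for deletion,
// or null if it never expires under the TTL rules described above.
function ttlExpiry(doc, field, expireAfterSeconds) {
  const value = doc[field];
  const candidates = Array.isArray(value) ? value : [value];
  const times = candidates
    .filter((v) => v instanceof Date)
    .map((v) => v.getTime());
  if (times.length === 0) return null; // missing field or no date value
  // For arrays, the earliest date determines the threshold.
  return new Date(Math.min(...times) + expireAfterSeconds * 1000);
}

const modified = new Date("2023-01-01T00:00:00Z");
const later = new Date("2023-01-02T00:00:00Z");

console.log(ttlExpiry({ lastModifiedDate: modified }, "lastModifiedDate", 3600));
console.log(ttlExpiry({ lastModifiedDate: [later, modified] }, "lastModifiedDate", 3600));
console.log(ttlExpiry({ lastModifiedDate: "2023-01-01" }, "lastModifiedDate", 3600)); // string, not a date
console.log(ttlExpiry({ other: 1 }, "lastModifiedDate", 3600)); // field missing
```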

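Restrictions

1. TTL indexes are single-field indexes; compound indexes are not supported.
2. The _id field does not support TTL indexes.
3. TTL indexes are not supported on capped collections.
4. You cannot use createIndex to change expireAfterSeconds on an existing TTL index; use the collMod command together with the index collection flag instead. Otherwise, drop the index and recreate it.
5. If a non-TTL single-field index already exists on a field, you cannot create a TTL index on that same field. To convert a non-TTL index into a TTL index, drop the existing index and recreate it.

2) Unique Indexes

For a field with a unique index, MongoDB rejects every document that would duplicate an existing value of that field. The unique option is disabled by default when an index is created.

Creating a unique index

db.members.createIndex( { "user_id": 1 }, { unique: true } )

For a compound index with unique: true, uniqueness is enforced on the combination of the index key values.

The unique constraint applies across separate documents. If the indexed field is an array or embedded document, the unique index does not prevent a single document from containing the same value several times, as in this example: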

db.collection.createIndex( { "a.b": 1 }, { unique: true } )
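db.collection.insert( { a: [ { b: 5 }, { b: 5 } ] } )   // this insert succeeds, provided no other document has a.b equal to 5

If a document has no value for the indexed field of a unique index, the index stores a null value for it. Because of the unique constraint, MongoDB allows only one document that lacks the indexed field; inserting a second one fails with a duplicate key error. For example:

First, create a unique index on the x field:

db.collection.createIndex( { "x": 1 }, { unique: true } )

Next, insert a document without the x field:

db.collection.insert( { y: 1 } )    // succeeds, because the collection does not yet contain a document whose x is null

Then insert another document without x:

db.collection.insert( { z: 1 } )    // fails, because the previous insert already stored null for x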
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 11000,
        "errmsg" : "E11000 duplicate key error collection: test.collection index: x_1 dup key: { : null }"
    }
})
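The null behaviour is easy to reproduce with a toy model. The class below is a hypothetical in-memory stand-in for a unique index on one field, assuming primitive key values only; it mimics the rule that a missing field is indexed as null, and its error message merely imitates the style of the server's E11000 error.

```javascript
// Toy unique index over a single field (primitive values only).
class UniqueIndex {
  constructor(field) {
    this.field = field;
    this.keys = new Set();
  }

  insert(doc) {
    // A document without the field is indexed under null.
    const key = doc[this.field] === undefined ? null : doc[this.field];
    if (this.keys.has(key)) {
      throw new Error(`E11000 duplicate key: { ${this.field}: ${JSON.stringify(key)} }`);
    }
    this.keys.add(key);
  }
}

const xIndex = new UniqueIndex("x");
xIndex.insert({ y: 1 }); // first document without x: stored under null

try {
  xIndex.insert({ z: 1 }); // second document without x: null is already taken
} catch (err) {
  console.log(err.message);
}
```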

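3) Partial Indexes

A partial index only indexes the documents in a collection that satisfy a specified filter expression. Because it covers only a subset of the documents, a partial index has lower storage requirements and lower performance costs for index creation and maintenance.

Creating a partial index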

db.restaurants.createIndex(
   { cuisine: 1, name: 1 },
   { partialFilterExpression: { rating: { $gt: 5 } } }
)

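The partialFilterExpression option can be specified for all MongoDB index types.

Conditions for using a partial index

1. The query predicate must include the filter expression.
2. The documents the query can match must be the same set as, or a subset of, the documents covered by the partial index.

Given the index above, the following queries show how these conditions determine whether the partial index can be used:

1. db.restaurants.find( { cuisine: "Italian", rating: { $gte: 8 } } )  // can use the partial index: the query's result set is a subset of the indexed documents
2. db.restaurants.find( { cuisine: "Italian" } )                       // cannot use the partial index: condition 1 fails because the predicate has no filter on rating
3. db.restaurants.find( { cuisine: "Italian", rating: { $lt: 8 } } )   // cannot use the partial index: using it would return an incomplete result set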
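The subset condition can be sketched for the specific filter used above, rating greater than 5. The check below is a narrow illustration, assuming only numeric equality, $gt, and $gte conditions on a single field; the function names are hypothetical and the real planner handles far more general expressions.

```javascript
// Does a query condition on one field guarantee that the value is > bound?
function impliesGreaterThan(cond, bound) {
  if (cond === undefined || cond === null) return false; // field absent from the query
  if (typeof cond === "number") return cond > bound;     // equality match
  if (typeof cond.$gt === "number" && cond.$gt >= bound) return true;
  if (typeof cond.$gte === "number" && cond.$gte > bound) return true;
  return false; // e.g. $lt alone does not guarantee value > bound
}

// Models the partial index with partialFilterExpression { rating: { $gt: 5 } }.
function canUsePartialIndex(query) {
  return impliesGreaterThan(query.rating, 5);
}

console.log(canUsePartialIndex({ cuisine: "Italian", rating: { $gte: 8 } })); // subset of rating > 5
console.log(canUsePartialIndex({ cuisine: "Italian" }));                      // no rating condition
console.log(canUsePartialIndex({ cuisine: "Italian", rating: { $lt: 8 } }));  // may include rating <= 5
```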

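Partial indexes with a unique constraint

For a partial index with a unique constraint, the constraint applies only to documents that satisfy the filter expression; documents that fall outside the partial index are not checked for uniqueness. For example, suppose the users collection contains the following documents: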

{ "_id" : ObjectId("56424f1efa0358a27fa1f99a"), "username" : "david", "age" : 29 }
{ "_id" : ObjectId("56424f37fa0358a27fa1f99b"), "username" : "amanda", "age" : 35 }
{ "_id" : ObjectId("56424fe2fa0358a27fa1f99c"), "username" : "rajiv", "age" : 57 }

db.users.createIndex(
   { username: 1 },
   { unique: true, partialFilterExpression: { age: { $gte: 21 } } }
)
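// The unique constraint applies to the following three inserts, so they are rejected as duplicates
db.users.insert( { username: "david", age: 27 } )
db.users.insert( { username: "amanda", age: 25 } )
db.users.insert( { username: "rajiv", age: 32 } )
// The unique constraint does not apply to the following inserts, so they succeed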
db.users.insert( { username: "david", age: 20 } )
db.users.insert( { username: "amanda" } )
db.users.insert( { username: "rajiv", age: null } )

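4) Sparse Indexes

A sparse index only contains entries for documents that have the indexed field, even if the field's value is null. The index skips any document that lacks the indexed field, hence the name "sparse". A sparse index does not cover every document in the collection, whereas a non-sparse index does.

In MongoDB 3.2 and later, partial indexes are recommended in preference to sparse indexes.

Creating a sparse index:

db.addresses.createIndex( { "xmpp_id": 1 }, { sparse: true } )

If using a sparse index would produce an incomplete result set, MongoDB does not use that index unless hint() specifies it explicitly. The following example uses a scores collection with these documents: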

{ "_id" : ObjectId("523b6e32fb408eea0eec2647"), "userid" : "newbie" }
{ "_id" : ObjectId("523b6e61fb408eea0eec2648"), "userid" : "abby", "score" : 82 }
{ "_id" : ObjectId("523b6e6ffb408eea0eec2649"), "userid" : "nina", "score" : 90 }

db.scores.createIndex( { score: 1 } , { sparse: true } )
db.scores.find( { score: { $lt: 90 } } )  // the document with userid "newbie" has no score field, so it is not in the sparse index and does not match; only the document below is returned

{ "_id" : ObjectId("523b6e61fb408eea0eec2648"), "userid" : "abby", "score" : 82 }

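For the same collection, consider a sort:

db.scores.find().sort( { score: -1 } )  // the sort is on the indexed score field, but MongoDB does not choose the sparse index, so the complete result set is returned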
{ "_id" : ObjectId("523b6e6ffb408eea0eec2649"), "userid" : "nina", "score" : 90 }
{ "_id" : ObjectId("523b6e61fb408eea0eec2648"), "userid" : "abby", "score" : 82 }
{ "_id" : ObjectId("523b6e32fb408eea0eec2647"), "userid" : "newbie" }

// to use the sparse index, specify it explicitly with hint()
db.scores.find().sort( { score: -1 } ).hint( { score: 1 } )

{ "_id" : ObjectId("523b6e6ffb408eea0eec2649"), "userid" : "nina", "score" : 90 }
{ "_id" : ObjectId("523b6e61fb408eea0eec2648"), "userid" : "abby", "score" : 82 }
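The difference between the two result sets can be modelled without a database. In this sketch a sparse index is simply the list of documents that contain the field, kept in key order; the helper buildSparseIndex is hypothetical, and placing the document without a score last in the full sort is an assumption that mirrors the output shown above.

```javascript
const scores = [
  { userid: "newbie" },
  { userid: "abby", score: 82 },
  { userid: "nina", score: 90 },
];

// A sparse index holds entries only for documents that contain the field.
function buildSparseIndex(docs, field) {
  return docs
    .filter((doc) => field in doc)
    .sort((a, b) => a[field] - b[field]);
}

const sparseScoreIndex = buildSparseIndex(scores, "score");

// Walking the sparse index backwards (like hint plus a descending sort) skips "newbie".
const viaSparseIndex = [...sparseScoreIndex].reverse().map((d) => d.userid);

// Sorting the whole collection keeps every document; the missing score sorts last here.
const viaCollectionScan = [...scores]
  .sort((a, b) => (b.score ?? -Infinity) - (a.score ?? -Infinity))
  .map((d) => d.userid);

console.log(viaSparseIndex);
console.log(viaCollectionScan);
```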

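Sparse indexes with a unique constraint:

The unique constraint applies only to documents that the sparse index covers and has no effect on other documents. For example: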

{ "_id" : ObjectId("523b6e32fb408eea0eec2647"), "userid" : "newbie" }
{ "_id" : ObjectId("523b6e61fb408eea0eec2648"), "userid" : "abby", "score" : 82 }
{ "_id" : ObjectId("523b6e6ffb408eea0eec2649"), "userid" : "nina", "score" : 90 }

db.scores.createIndex( { score: 1 } , { sparse: true, unique: true } )
// the following four inserts succeed
db.scores.insert( { "userid": "AAAAAAA", "score": 43 } )
db.scores.insert( { "userid": "BBBBBBB", "score": 34 } )
db.scores.insert( { "userid": "CCCCCCC" } )
db.scores.insert( { "userid": "DDDDDDD" } )
// the following inserts violate the unique constraint
db.scores.insert( { "userid": "AAAAAAA", "score": 82 } )
db.scores.insert( { "userid": "BBBBBBB", "score": 90 } )
