MongoDB Advanced (GridFS)

Posted 常飞梦


Preface: This article was compiled by the editors of 小常识网 (cha138.com) and introduces MongoDB GridFS. We hope it is a useful reference.


GridFS is a mechanism for storing large binary files in MongoDB. There are several reasons to store files with GridFS:

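 GridFS can simplify your stack. If you already use MongoDB, GridFS removes the need for a separate file-storage architecture.

 GridFS builds on MongoDB's existing replication and sharding, so failover and scaling for file storage come easily.

 GridFS avoids some of the problems that filesystems used for storing user uploads can run into. For example, GridFS has no trouble placing a large number of files in the same directory.

 GridFS does not cause disk fragmentation, because MongoDB allocates data-file space in 2 GB blocks (this describes the MMAPv1 storage engine of the time).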

 

Using GridFS: mongofiles

mongofiles is the GridFS command-line utility for managing files stored in GridFS.

 

--Help command

# mongofiles --help

Browse and modify a GridFS filesystem.

usage: mongofiles [options] command [gridfs filename]
command:
  one of (list|search|put|get)
  list - list all files.  'gridfs filename' is an optional prefix
         which listed filenames must begin with.
  search - search all files. 'gridfs filename' is a substring
           which listed filenames must contain.
  put - add a file with filename 'gridfs filename'
  get - get a file with filename 'gridfs filename'
  delete - delete all files with filename 'gridfs filename'
options:
  --help                                produce help message
  -v [ --verbose ]                      be more verbose (include multiple times
                                        for more verbosity e.g. -vvvvv)
  --version                             print the program's version and exit
  -h [ --host ] arg                     mongo host to connect to (<set
                                        name>/s1,s2 for sets)
  --port arg                            server port. Can also use --host
                                        hostname:port
  --ipv6                                enable IPv6 support (disabled by
                                        default)
  -u [ --username ] arg                 username
  -p [ --password ] arg                 password
  --authenticationDatabase arg          user source (defaults to dbname)
  --authenticationMechanism arg (=MONGODB-CR)
                                        authentication mechanism
  --dbpath arg                          directly access mongod database files
                                        in the given path, instead of
                                        connecting to a mongod  server - needs
                                        to lock the data directory, so cannot
                                        be used if a mongod is currently
                                        accessing the same path
  --directoryperdb                      each db is in a separate directly
                                        (relevant only if dbpath specified)
  --journal                             enable journaling (relevant only if
                                        dbpath specified)
  -d [ --db ] arg                       database to use
  -c [ --collection ] arg               collection to use (some commands)
  -l [ --local ] arg                    local filename for put|get (default is
                                        to use the same name as 'gridfs
                                        filename')
  -t [ --type ] arg                     MIME type for put (default is to omit)
  -r [ --replace ]                      Remove other files with same name after
                                        PUT
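The help text gives list and search different matching rules: list treats the optional argument as a prefix the filename must begin with, while search treats it as a substring the filename must contain. The following is a minimal in-memory sketch of just those two rules in JavaScript (the mongo shell's language), runnable with Node.js; listFiles and searchFiles are hypothetical names, and the sketch does not model how mongofiles actually queries the server.

```javascript
// Hypothetical helpers modelling the filename matching described in the
// mongofiles help text; they do not talk to MongoDB.

// "list": the optional argument is a prefix the filename must begin with.
function listFiles(filenames, prefix = "") {
  return filenames.filter((name) => name.startsWith(prefix));
}

// "search": the argument is a substring the filename must contain.
function searchFiles(filenames, substring) {
  return filenames.filter((name) => name.includes(substring));
}

// Filenames taken from the "mongofiles list" output in this article.
const stored = ["install.log", "foo.log"];

console.log(listFiles(stored));             // no prefix: every file
console.log(listFiles(stored, "foo"));      // names starting with "foo"
console.log(searchFiles(stored, "stall"));  // names containing "stall"
```

Note that listFiles(stored, "log") matches nothing, while searchFiles(stored, "log") matches both names.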

                                       

--Upload a file

# mongofiles put foo.log

connected to: 127.0.0.1

added file: { _id: ObjectId('56caba480ad7ef0aa8a76f0c'), filename: "foo.log", chunkSize: 261120, uploadDate: new Date(1456126536618), md5: "d1bfff5ab0cc6b652aaf08345b19b7e6", length: 21 }

done!
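The upload reply reports length: 21 and chunkSize: 261120 (255 × 1024 bytes). How many chunk documents a file occupies follows from those two fields as ceil(length / chunkSize). A small Node.js sketch; chunkCount is a hypothetical helper, not part of mongofiles:

```javascript
// Number of fs.chunks documents needed for a file of `length` bytes
// when each chunk holds at most `chunkSize` bytes.
function chunkCount(length, chunkSize) {
  if (!Number.isInteger(length) || length < 0) {
    throw new RangeError("length must be a non-negative integer");
  }
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  return Math.ceil(length / chunkSize);
}

const CHUNK_SIZE = 261120; // value printed by "mongofiles put" above

console.log(chunkCount(21, CHUNK_SIZE));      // 1 (foo.log)
console.log(chunkCount(54876, CHUNK_SIZE));   // 1 (install.log)
console.log(chunkCount(1000000, CHUNK_SIZE)); // 4 (a hypothetical 1,000,000-byte file)
```

Both foo.log (21 bytes) and install.log (54876 bytes) fit in a single chunk at this chunk size, which is consistent with every fs.chunks document in the query output further down having "n" : 0.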

--List files

# mongofiles list

connected to: 127.0.0.1

install.log     54876

foo.log 21

--Download a file

# rm -f foo.log

# mongofiles get foo.log

connected to: 127.0.0.1

done write to: foo.log

# ll foo.log

-rw-r--r--. 1 root root 21 Feb 22 15:36 foo.log

--Delete a file from GridFS

# mongofiles delete install.log

connected to: 127.0.0.1

done!

# mongofiles list

connected to: 127.0.0.1

foo.log 21
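According to the help text, delete removes all files with the given filename. In terms of the fs.files and fs.chunks collections covered in the next section, that means dropping every matching file document and every chunk whose files_id points at one of them. A Node.js sketch of that bookkeeping over plain arrays; the arrays, ids and the deleteByFilename name are hypothetical stand-ins, since mongofiles performs the deletion on the server:

```javascript
// Plain arrays stand in for fs.files and fs.chunks; ids are hypothetical.
function deleteByFilename(files, chunks, filename) {
  // Collect the _id of every file document with this filename.
  const ids = new Set(
    files.filter((f) => f.filename === filename).map((f) => f._id)
  );
  return {
    files: files.filter((f) => !ids.has(f._id)),
    // Drop the chunks that belong to any deleted file.
    chunks: chunks.filter((c) => !ids.has(c.files_id)),
  };
}

const files = [
  { _id: "a", filename: "foo.log", length: 21 },
  { _id: "b", filename: "install.log", length: 54876 },
];
const chunks = [
  { files_id: "a", n: 0 },
  { files_id: "b", n: 0 },
];

const after = deleteByFilename(files, chunks, "install.log");
console.log(after.files.map((f) => f.filename)); // remaining filenames
console.log(after.chunks.length);                // remaining chunk count
```

The function returns new arrays rather than mutating its inputs, and a filename that appears several times (for example two uploads of foo.log) has all of its copies removed.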

 

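How GridFS works internally

The basic idea of GridFS is to split a large file into many chunks and store each chunk as a separate document, which is what makes it possible to store large files. It is a lightweight file-storage specification built on top of ordinary MongoDB documents.

Because MongoDB supports storing binary data in documents, the storage overhead of each chunk is kept to a minimum. In addition to the chunks that hold the file itself, there is a separate document that stores information about the chunks along with the file's metadata.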
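The splitting step can be sketched in Node.js: cut a Buffer into pieces of at most chunkSize bytes and wrap each piece in a document with the files_id, n and data fields used by fs.chunks. This is an illustration under simplifying assumptions, not a driver's implementation: the chunk _id is omitted, toChunks is a hypothetical name, and files_id is a placeholder string supplied by the caller.

```javascript
// Split `content` into GridFS-style chunk documents.
// `filesId` is whatever _id the corresponding fs.files document uses.
function toChunks(content, chunkSize, filesId) {
  const chunks = [];
  for (let n = 0; n * chunkSize < content.length; n++) {
    chunks.push({
      files_id: filesId,
      n, // position of this chunk in the original file, starting at 0
      data: content.subarray(n * chunkSize, (n + 1) * chunkSize),
    });
  }
  return chunks;
}

// Hypothetical 10-byte file split with a tiny chunk size for readability.
const chunks = toChunks(Buffer.from("0123456789"), 4, "file-1");
for (const c of chunks) {
  console.log(c.n, c.data.toString()); // 0 0123, then 1 4567, then 2 89
}
```

Only the last chunk may be shorter than chunkSize, and an empty file produces no chunk documents at all.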

 

GridFS chunks are stored in a separate collection, fs.chunks by default. Documents in the chunks collection have the following structure:

{
    "_id" : ObjectId("..."),
    "n" : 0,
    "data" : BinData("..."),
    "files_id" : ObjectId("...")
}

 

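 _id: the unique ID of the chunk

 files_id: the _id of the file document that holds this chunk's file metadata

 n: the chunk number, i.e. the chunk's position in the original file

 data: the binary data that makes up this chunk of the file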
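Reading a file back reverses the split: select the chunks whose files_id equals the file's _id, order them by n, and concatenate their data. A Node.js sketch over in-memory documents; the sample data and the assemble name are hypothetical, and a real client would instead query fs.chunks sorted on n:

```javascript
// Rebuild a file's bytes from its chunk documents.
function assemble(chunkDocs, filesId) {
  const own = chunkDocs
    .filter((c) => c.files_id === filesId)
    .sort((a, b) => a.n - b.n); // chunks may arrive in any order
  // Every chunk number from 0 upward must be present exactly once.
  own.forEach((c, i) => {
    if (c.n !== i) throw new Error(`missing or duplicate chunk near n=${i}`);
  });
  return Buffer.concat(own.map((c) => c.data));
}

const docs = [
  { files_id: "f1", n: 1, data: Buffer.from("Gridfs\n") },
  { files_id: "f2", n: 0, data: Buffer.from("other file") },
  { files_id: "f1", n: 0, data: Buffer.from("Hello MongoDB ") },
];
console.log(JSON.stringify(assemble(docs, "f1").toString())); // "Hello MongoDB Gridfs\n"
```

Checking that n runs 0, 1, 2, ... without gaps catches a missing or duplicated chunk before corrupted bytes are returned.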

 

> db.fs.chunks.find()

{ "_id" :ObjectId("56caba48e0355316e5e4ab39"), "files_id" :ObjectId("56caba480ad7ef0aa8a76f0c"), "n" : 0,"data" : BinData(0,"SGVsbG8gTW9uZ29EQiBHcmlkZnMK") }

{ "_id" :ObjectId("56cabb85e0355316e5e4ab3a"), "files_id" :ObjectId("56cabb85d07cdd46e1f143a4"), "n" : 0,"data" : BinData(0,"SGVsbG8gTW9uZ29EQiBHcmlkZnMK") }

{ "_id" :ObjectId("56cabb89e0355316e5e4ab3b"), "files_id" :ObjectId("56cabb895c03f6feeb64bb6e"), "n" : 0,"data" :BinData(0,"5a6J6KOFIGxpYmdjYy00LjQuNy00LmVsNi54ODZfNjQKd2FybmluZzogbGliZ2NjLTQuNC43LTQuZWw2Lng4Nl82NDogSGVhZGVyIFYzIFJTQS9TSEEyNTYgU2lnbmF0dXJlLCBrZXkgSUQgZWM1NTFmMDM6IE5PS0VZCuWuieijhSBmb250cGFja2FnZXMtZmlsZXN5c3RlbS0xLjQxLTEuMS5lbDYu

......
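In the shell output, the second argument of BinData(0, "...") is the chunk payload encoded in base64 (subtype 0 is generic binary data). Decoding the first chunk in Node.js shows that it holds the whole 21-byte content of foo.log; the second document, from a second upload of foo.log, carries the identical payload.

```javascript
// Base64 payload copied from the first fs.chunks document above.
const payload = "SGVsbG8gTW9uZ29EQiBHcmlkZnMK";

const bytes = Buffer.from(payload, "base64");
console.log(bytes.length);                           // 21, matching "length" : 21
console.log(JSON.stringify(bytes.toString("utf8"))); // "Hello MongoDB Gridfs\n"
```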

--Query returning only selected fields

> db.fs.chunks.find({},{"files_id":1,"n":1})

{ "_id" :ObjectId("56caba48e0355316e5e4ab39"), "files_id" :ObjectId("56caba480ad7ef0aa8a76f0c"), "n" : 0 }

{ "_id" :ObjectId("56cabb85e0355316e5e4ab3a"), "files_id" :ObjectId("56cabb85d07cdd46e1f143a4"), "n" : 0 }

{ "_id" :ObjectId("56cabb89e0355316e5e4ab3b"), "files_id" : ObjectId("56cabb895c03f6feeb64bb6e"),"n" : 0 }

 

 

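GridFS file metadata is stored in the fs.files collection (by default). Each document there represents one file in GridFS, and custom metadata associated with the file can be stored in it as well.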

> db.fs.files.find()

{ "_id" :ObjectId("56caba480ad7ef0aa8a76f0c"), "filename" :"foo.log", "chunkSize" : 261120, "uploadDate" :ISODate("2016-02-22T07:35:36.618Z"), "md5" :"d1bfff5ab0cc6b652aaf08345b19b7e6", "length" : 21 }

{ "_id" :ObjectId("56cabb85d07cdd46e1f143a4"), "filename" :"foo.log", "chunkSize" : 261120, "uploadDate" :ISODate("2016-02-22T07:40:53.015Z"), "md5" :"d1bfff5ab0cc6b652aaf08345b19b7e6", "length" : 21 }

{ "_id" :ObjectId("56cabb895c03f6feeb64bb6e"), "filename" :"install.log", "chunkSize" : 261120, "uploadDate": ISODate("2016-02-22T07:40:57.387Z"), "md5" :"fbe1119cd9688d14475e2a84ccd8a7a6", "length" : 54876 }

 

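 _id: the file's unique id, stored in each chunk as the value of its files_id key

 length: the total number of bytes in the file's contents

 chunkSize: the size of each chunk in bytes; the default is 255 KB (261120 bytes, as in the output above) and it can be adjusted if needed

 uploadDate: the timestamp at which the file was stored in GridFS

 md5: the MD5 checksum of the file's contents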
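Since the fs.files document records length, chunkSize and md5, downloaded chunks can be checked against it. A Node.js sketch of such a check; checkFile is a hypothetical helper, and the sample pairs the foo.log document from the fs.files output (without its _id) with its single chunk from fs.chunks. It returns an empty array when every check passes, and the md5 comparison is skipped when a document has no md5 field.

```javascript
const crypto = require("crypto");

// Check chunk documents against the length, chunkSize and md5 fields of a
// fs.files document. Returns a list of problems (empty when all checks pass).
function checkFile(fileDoc, chunkDocs) {
  const ordered = [...chunkDocs].sort((a, b) => a.n - b.n);
  const problems = [];
  ordered.forEach((c, i) => {
    if (c.n !== i) problems.push(`expected chunk ${i}, found chunk ${c.n}`);
    // Every chunk except the last must hold exactly chunkSize bytes,
    // and no chunk may be larger than chunkSize.
    const isLast = i === ordered.length - 1;
    const size = c.data.length;
    if (size > fileDoc.chunkSize || (!isLast && size !== fileDoc.chunkSize)) {
      problems.push(`chunk ${c.n} has ${size} bytes`);
    }
  });
  const content = Buffer.concat(ordered.map((c) => c.data));
  if (content.length !== fileDoc.length) {
    problems.push(`length ${content.length} != ${fileDoc.length}`);
  }
  // md5 is only compared when the document carries one.
  if (fileDoc.md5 !== undefined) {
    const md5 = crypto.createHash("md5").update(content).digest("hex");
    if (md5 !== fileDoc.md5) problems.push(`md5 ${md5} != ${fileDoc.md5}`);
  }
  return problems;
}

// foo.log as recorded in fs.files, plus its single chunk from fs.chunks.
const fooDoc = {
  filename: "foo.log",
  length: 21,
  chunkSize: 261120,
  md5: "d1bfff5ab0cc6b652aaf08345b19b7e6",
};
const fooChunks = [
  { n: 0, data: Buffer.from("SGVsbG8gTW9uZ29EQiBHcmlkZnMK", "base64") },
];
console.log(checkFile(fooDoc, fooChunks));
```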
