Data Storage with Non-Relational Databases: MongoDB Storage
Posted by liyihua
MongoDB storage: a document-oriented database
-
Connecting to MongoDB with pymongo
import pymongo

client = pymongo.MongoClient(host='localhost', port=27017)
# Equivalent: client = pymongo.MongoClient('mongodb://localhost:27017/')
# The default port is 27017.
# Method used: pymongo.MongoClient()
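In PyMongo 3 and later, MongoClient connects lazily, so a wrong host or port usually shows up only on the first real operation. A minimal sketch of an early connectivity check follows. The helper name check_connection is hypothetical and not part of PyMongo; it relies on the server's lightweight ping command.

```python
def check_connection(client):
    """Return True if the MongoDB server behind `client` answers a ping.

    `client` is expected to be a pymongo.MongoClient. The helper name is
    hypothetical (an assumption for illustration), not a PyMongo API.
    """
    try:
        # 'ping' is a cheap server command, run here against the admin database.
        client.admin.command('ping')
        return True
    except Exception:
        # In real code, catching pymongo.errors.ConnectionFailure is more precise.
        return False
```

With a running server this might be called as check_connection(pymongo.MongoClient(host='localhost', port=27017, serverSelectionTimeoutMS=2000)); the serverSelectionTimeoutMS option shortens the wait, which otherwise defaults to 30 seconds.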
-
Specifying a database
# Select the test database to operate on
db = client.test  # or: db = client['test']
-
Specifying a collection
# Select the students collection to operate on
collection = db.students  # or: collection = db['students']
-
Inserting data
import pymongo

# Connect to MongoDB
client = pymongo.MongoClient(host='localhost', port=27017)
# Select the database
db = client.test
# Select the collection
collection = db.students

# The document to insert
student = {
    'id': '20180001',
    'name': 'Jordan',
    'age': 20,
    'gender': 'male'
}

# Insert a single document with insert_one()
result = collection.insert_one(student)
print(result)

# Output:
# <pymongo.results.InsertOneResult object at 0x11089b448>

# In MongoDB, every document has an _id field that uniquely identifies it.
# If _id is not given explicitly, one of type ObjectId is generated automatically.
# Use insert_one() to insert a single document and insert_many() to insert several.

import pymongo

client = pymongo.MongoClient(host='localhost', port=27017)
db = client.test
collection = db.students

student1 = {
    'id': '20180002',
    'name': 'Lee Hua',
    'age': 20,
    'gender': 'male'
}
student2 = {
    'id': '20180003',
    'name': 'Mike',
    'age': 21,
    'gender': 'male'
}

# Insert several documents at once by passing a list to insert_many()
result = collection.insert_many([student1, student2])
print(result)
# The inserted_ids attribute holds the list of _id values of the inserted documents
print(result.inserted_ids)

# Output:
# <pymongo.results.InsertManyResult object at 0x110826d88>
# [ObjectId('5d28b293e834575faf929428'), ObjectId('5d28b293e834575faf929429')]

# Methods used: insert_one() and insert_many()
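Because _id is what makes a document unique, running an insert script twice stores the same student twice under two different generated _id values. One way to avoid that, sketched here as an assumption rather than the author's method, is to reuse the student number as _id; a second insert of the same student then raises pymongo.errors.DuplicateKeyError. The helper name with_explicit_id is hypothetical.

```python
def with_explicit_id(student):
    """Return a copy of `student` whose _id is its student number.

    Hypothetical helper for illustration; not part of PyMongo.
    """
    doc = dict(student)      # shallow copy, the caller's dict is left unchanged
    doc['_id'] = doc['id']   # MongoDB enforces uniqueness on _id
    return doc


student = {'id': '20180001', 'name': 'Jordan', 'age': 20, 'gender': 'male'}
doc = with_explicit_id(student)
print(doc['_id'])
# With a live collection, the document would be stored with:
# collection.insert_one(doc)
```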
-
Querying
import pymongo

client = pymongo.MongoClient(host='localhost', port=27017)
db = client.test
collection = db.students

# find_one() returns a single matching document
result = collection.find_one({'name': 'Lee Hua'})
print(result)

# Output:
# {'_id': ObjectId('5d28b293e834575faf929428'), 'id': '20180002', 'name': 'Lee Hua', 'age': 20, 'gender': 'male'}

import pymongo

client = pymongo.MongoClient(host='localhost', port=27017)
db = client.test
collection = db.students

# find() returns a Cursor, which can be iterated with a for loop
result = collection.find()
print(result)
for r in result:
    print(r)

# Output:
# <pymongo.cursor.Cursor object at 0x10e0f7320>
# {'_id': ObjectId('5d28ae0360105a198d9d501a'), 'id': '20180001', 'name': 'Jordan', 'age': 20, 'gender': 'male'}
# {'_id': ObjectId('5d28ae2d8b3d004feb604874'), 'id': '20180001', 'name': 'Jordan', 'age': 20, 'gender': 'male'}
# {'_id': ObjectId('5d28b293e834575faf929428'), 'id': '20180002', 'name': 'Lee Hua', 'age': 20, 'gender': 'male'}
# {'_id': ObjectId('5d28b293e834575faf929429'), 'id': '20180003', 'name': 'Mike', 'age': 21, 'gender': 'male'}

# Methods used: find_one() and find()
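Both methods accept a query condition, for example:
collection.find({'name': {'$regex': '^M.*'}})
This finds the documents whose name starts with 'M'. $regex specifies a regular expression, and ^M.* is the regular expression itself.
For more operators (such as $regex) and the numeric comparison operators, see the official MongoDB documentation: https://docs.mongodb.com/?searchProperty=manual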
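The same nested-dictionary pattern applies to the other query operators. A few illustrative filters follow. $gt, $gte, $lte and $in are standard MongoDB query operators, while the variable names, the values and the starts_with helper are assumptions made only for this example.

```python
import re

# Documents whose age is greater than 20
older_than_20 = {'age': {'$gt': 20}}

# Documents whose age lies between 20 and 25, inclusive
age_in_range = {'age': {'$gte': 20, '$lte': 25}}

# Documents whose name is one of the listed values
name_in_list = {'name': {'$in': ['Mike', 'Jordan']}}


def starts_with(field, prefix):
    """Build a $regex filter matching values of `field` that start with `prefix`.

    Hypothetical helper for illustration; not part of PyMongo.
    """
    # re.escape keeps characters such as '.' or '(' in the prefix literal.
    return {field: {'$regex': '^' + re.escape(prefix)}}


print(starts_with('name', 'M'))  # {'name': {'$regex': '^M'}}
```

Each of these dictionaries would be passed as the first argument to find(), find_one() or count_documents().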
-
Counting
import pymongo

client = pymongo.MongoClient(host='localhost', port=27017)
db = client.test
collection = db.students

count = collection.count_documents({'id': {'$regex': '^(2018)'}})
print(count)

# Prints the number of documents whose id starts with 2018
# Method used: collection.count_documents(filter)
-
Sorting
import pymongo

client = pymongo.MongoClient(host='localhost', port=27017)
db = client.test
collection = db.students

result = collection.find().sort('id', pymongo.ASCENDING)
for r in result:
    print(r)

# Prints all documents in ascending order of id:
# {'_id': ObjectId('5d28ae0360105a198d9d501a'), 'id': '20180001', 'name': 'Jordan', 'age': 20, 'gender': 'male'}
# {'_id': ObjectId('5d28ae2d8b3d004feb604874'), 'id': '20180001', 'name': 'Jordan', 'age': 20, 'gender': 'male'}
# {'_id': ObjectId('5d28b293e834575faf929428'), 'id': '20180002', 'name': 'Lee Hua', 'age': 20, 'gender': 'male'}
# {'_id': ObjectId('5d28b293e834575faf929429'), 'id': '20180003', 'name': 'Mike', 'age': 21, 'gender': 'male'}

# sort() orders the results
# pymongo.ASCENDING specifies ascending order
# pymongo.DESCENDING specifies descending order
# Method used: sort()
-
Offsets
import pymongo

client = pymongo.MongoClient(host='localhost', port=27017)
db = client.test
collection = db.students

results = collection.find().sort('id', pymongo.DESCENDING).skip(1)
print([result['id'] for result in results])

# Output:
# ['20180002', '20180001', '20180001']
# skip(1) applies an offset of 1, i.e. the first result is skipped

import pymongo

client = pymongo.MongoClient(host='localhost', port=27017)
db = client.test
collection = db.students

results = collection.find().sort('id', pymongo.DESCENDING).skip(1).limit(2)
print([result['id'] for result in results])

# Output:
# ['20180002', '20180001']
# limit(2) caps the number of returned documents at two

# When the collection is very large, avoid querying with large offsets,
# because the server still has to walk past every skipped document.
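A common alternative to a large skip() is to remember the last _id of the previous page and ask only for documents after it, which lets the server use the _id index. The sketch below assumes results are paged in _id order. The helper names next_page_filter and fetch_page are hypothetical, and the collection argument is expected to be a pymongo collection.

```python
def next_page_filter(last_id=None):
    """Filter for the page that follows `last_id` (None means the first page)."""
    if last_id is None:
        return {}
    return {'_id': {'$gt': last_id}}


def fetch_page(collection, last_id=None, page_size=2):
    """Return up to `page_size` documents after `last_id`, ordered by _id.

    Hypothetical helper for illustration; `collection` is assumed to
    behave like a pymongo collection.
    """
    # 1 is the value of pymongo.ASCENDING
    cursor = collection.find(next_page_filter(last_id)).sort('_id', 1).limit(page_size)
    return list(cursor)
```

A caller would pass the _id of the last document of one page as last_id when requesting the next page, for example fetch_page(collection, last_id=page[-1]['_id']).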
-
Updating
import pymongo

client = pymongo.MongoClient(host='localhost', port=27017)
db = client['test']
collection = db['students']

# Query condition: age >= 20
query_condition = {'age': {'$gte': 20}}
# Update operation: increment age by 1
update_condition = {'$inc': {'age': 1}}

result = collection.update_one(query_condition, update_condition)
print(result)
print(result.matched_count, result.modified_count)

# Output:
# <pymongo.results.UpdateResult object at 0x110a11c88>
# 1 1

# The returned value is an UpdateResult
# matched_count gives the number of documents matched
# modified_count gives the number of documents modified
# $gte: greater than or equal to
# $inc: increments a field by the given value
# update_one() updates the first document that matches the filter

import pymongo

client = pymongo.MongoClient(host='localhost', port=27017)
db = client['test']
collection = db['students']

query_condition = {'age': {'$gte': 20}}
update_condition = {'$inc': {'age': 1}}

# update_many() updates every document that matches the filter
result = collection.update_many(query_condition, update_condition)
print(result)
print(result.matched_count, result.modified_count)

# Output:
# <pymongo.results.UpdateResult object at 0x111c84448>
# 4 4

# Methods used: update_one() and update_many()
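$inc is one of several update operators. To overwrite fields rather than increment them, $set is commonly used; update_one() and update_many() expect the update document to be built from such $ operators. A minimal sketch follows, with a hypothetical helper name set_fields and example values that are not taken from the article.

```python
def set_fields(**fields):
    """Build an update document that overwrites the given fields via $set.

    Hypothetical helper for illustration; not part of PyMongo.
    """
    if not fields:
        raise ValueError('at least one field is required')
    return {'$set': fields}


update = set_fields(age=26, gender='male')
print(update)
# With a live collection, the update would be applied with:
# collection.update_one({'name': 'Mike'}, update)
```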
-
Deleting
import pymongo

client = pymongo.MongoClient(host='localhost', port=27017)
db = client['test']
collection = db['students']

result = collection.delete_one({'age': 21})
print(result.deleted_count)

# delete_one(): deletes the first document that matches the condition
# deleted_count: the number of documents deleted

import pymongo

client = pymongo.MongoClient(host='localhost', port=27017)
db = client['test']
collection = db['students']

result = collection.delete_many({'age': 21})
print(result.deleted_count)

# delete_many(): deletes all documents that match the condition
-
For detailed PyMongo usage, see: http://api.mongodb.com/python/current/api/pymongo/collection.html