How to push CSV data to MongoDB using Python
【Title】How to push CSV data to MongoDB using Python
【Posted】2015-02-09 13:11:57
【Question】I am trying to push CSV data to MongoDB using Python. I am a beginner in both Python and MongoDB. I used the following code:
import csv
import json
import pandas as pd
import sys, getopt, pprint
from pymongo import MongoClient
#CSV to JSON Conversion
csvfile = open('C://test//final-current.csv', 'r')
jsonfile = open('C://test//6.json', 'a')
reader = csv.DictReader( csvfile )
header= [ "S.No", "Instrument Name", "Buy Price", "Buy Quantity", "Sell Price", "Sell Quantity", "Last Traded Price", "Total Traded Quantity", "Average Traded Price", "Open Price", "High Price", "Low Price", "Close Price", "V" ,"Time"]
#fieldnames=header
output=[]
for each in reader:
    row = {}
    for field in header:
        row[field] = each[field]
    output.append(row)
json.dump(output, jsonfile, indent=None, sort_keys=False , encoding="UTF-8")
mongo_client=MongoClient()
db=mongo_client.october_mug_talk
db.segment.drop()
data=pd.read_csv('C://test//6.json', error_bad_lines=0)
df = pd.DataFrame(data)
records = csv.DictReader(df)
db.segment.insert(records)
But the output comes out in this format:
/* 0 */
"_id" : ObjectId("54891c4ffb2a0303b0d43134"),
"[\"AverageTradedPrice\":\"0\"" : "BuyPrice:\"349.75\""
/* 1 */
"_id" : ObjectId("54891c4ffb2a0303b0d43135"),
"[\"AverageTradedPrice\":\"0\"" : "BuyQuantity:\"3000\""
/* 2 */
"_id" : ObjectId("54891c4ffb2a0303b0d43136"),
"[\"AverageTradedPrice\":\"0\"" : "ClosePrice:\"350\""
/* 3 */
"_id" : ObjectId("54891c4ffb2a0303b0d43137"),
"[\"AverageTradedPrice\":\"0\"" : "HighPrice:\"0\""
What I actually want is a single document per _id, with all the other fields shown as fields of that document, for example:
"_id" : ObjectId("54891c4ffb2a0303b0d43137")
AveragetradedPrice :0
HighPrice:0
ClosePrice:350
buyprice:350.75
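Please help. Thanks in advance.
【Comments】:
Change output.append(row) to db.segment.insert(row).
But if I push directly to MongoDB, it raises InvalidDocument: key 'S.No' must not contain '.'
Make the header a dict that maps S.No to S_No, so that it is accepted as a key.
Is there a particular reason for not using mongoimport?
I finally got it working. Thanks.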
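The key-renaming idea from the comments can be sketched as follows. This is only an illustration, not the asker's final code: the helper names `sanitize_key` and `sanitize_row` are made up here, and replacing '.' with '_' is just one possible convention.

```python
def sanitize_key(key):
    """Return a field name without the characters MongoDB treats specially."""
    # '.' separates nested fields in MongoDB queries; a leading '$' marks operators.
    cleaned = key.replace(".", "_")
    if cleaned.startswith("$"):
        cleaned = "_" + cleaned[1:]
    return cleaned


def sanitize_row(row):
    """Return a copy of a csv.DictReader row with sanitized field names."""
    return {sanitize_key(key): value for key, value in row.items()}


# Sample row using column names from the question.
print(sanitize_row({"S.No": "1", "Buy Price": "349.75"}))
```

Each row from csv.DictReader would be passed through `sanitize_row` before it is inserted.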
【Answer 1】:
Thanks for the suggestions. Here is the corrected code:
import csv
import json
import pandas as pd
import sys, getopt, pprint
from pymongo import MongoClient
#CSV to JSON Conversion
csvfile = open('C://test//final-current.csv', 'r')
reader = csv.DictReader( csvfile )
mongo_client=MongoClient()
db=mongo_client.october_mug_talk
db.segment.drop()
header= [ "S No", "Instrument Name", "Buy Price", "Buy Quantity", "Sell Price", "Sell Quantity", "Last Traded Price", "Total Traded Quantity", "Average Traded Price", "Open Price", "High Price", "Low Price", "Close Price", "V" ,"Time"]
for each in reader:
    row = {}
    for field in header:
        row[field] = each[field]
    # insert_one() replaces insert(), which is deprecated and was removed in PyMongo 4
    db.segment.insert_one(row)
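One thing to keep in mind with this loop: csv.DictReader returns every value as a string, so prices and quantities end up stored as text. If numeric fields are wanted, as in the desired output shown in the question, a conversion step could look roughly like this. The helpers `to_number` and `convert_row` are hypothetical names, and the "Instrument Name" value is made up for the example.

```python
def to_number(value):
    """Convert a CSV string to int or float when possible, else return it unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except (TypeError, ValueError):
            continue
    return value


def convert_row(row):
    """Return a copy of the row with numeric-looking strings converted."""
    return {field: to_number(value) for field, value in row.items()}


# "ABC" is a made-up instrument name; the prices come from the question.
sample = {"Instrument Name": "ABC", "Average Traded Price": "0", "Buy Price": "349.75"}
print(convert_row(sample))
```

In the loop above, `row` would go through `convert_row` before being inserted.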
【Comments】:
【Answer 2】: Assuming your CSV has a header row, there is a better way that needs fewer imports.
from pymongo import MongoClient
import csv
# DB connectivity
client = MongoClient('localhost', 27017)
db = client.db
collection = db.collection
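# Function to parse csv to dictionary
def csv_to_dict():
    result = {}
    # FILEPATH must be set to the path of your CSV file
    with open(FILEPATH, newline='') as f:
        for row in csv.DictReader(f):
            # the value in the 'First_value' column becomes the key for that row
            key = row.pop('First_value')
            result[key] = row
    return result
# Final insert statement
db.collection.insert_one(csv_to_dict())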
Hope this helps.
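Note that this approach stores the whole file as one nested document, keyed by the values of one column, rather than one document per row. The difference can be seen without a database. The sketch below uses a tiny in-memory CSV with made-up rows; `rows_keyed_by` and `rows_as_documents` are hypothetical helper names.

```python
import csv
import io

# Made-up sample data; 'First_value' is the key column used in the answer.
SAMPLE = "First_value,Buy Price,Close Price\nA,349.75,350\nB,10,11\n"


def rows_keyed_by(text, key_field):
    """Build one nested dict: {key: {remaining fields}} (a single document)."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = row.pop(key_field)
        result[key] = dict(row)
    return result


def rows_as_documents(text):
    """Build one flat dict per CSV row (a list suitable for insert_many)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


print(rows_keyed_by(SAMPLE, "First_value"))
print(rows_as_documents(SAMPLE))
```

The first shape fits insert_one; the second is what the question asks for and fits insert_many.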
【Comments】:
【Answer 3】: The simplest way is to use pandas. My code is:
import json
import pymongo
import pandas as pd
myclient = pymongo.MongoClient()
df = pd.read_csv('yourcsv.csv',encoding = 'ISO-8859-1') # loading csv file
df.to_json('yourjson.json') # saving to json file
jdf = open('yourjson.json').read() # loading the json file
data = json.loads(jdf) # reading json file
Now you can insert this JSON into your MongoDB database :-]
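The insertion step itself is not shown above. One detail matters for it: with its default settings, DataFrame.to_json writes column-oriented JSON of the form {"column": {"row index": value}}, so `data` is not yet a list of row documents. A rough sketch of the reshaping, using only the standard library (`columns_to_records` is a hypothetical helper and the sample values are made up):

```python
def columns_to_records(columns):
    """Turn {"column": {"row_index": value}} into a list with one dict per row."""
    records = {}
    for column, cells in columns.items():
        for index, value in cells.items():
            # Collect every column's value for the same row index into one dict.
            records.setdefault(index, {})[column] = value
    return list(records.values())


# Shape of the loaded JSON for a two-row file (made-up values).
data = {"Buy Price": {"0": 349.75, "1": 10.0}, "Close Price": {"0": 350, "1": 11}}
print(columns_to_records(data))
```

The resulting list could then be passed to insert_many on a collection, for example `myclient["your_db"]["your_collection"].insert_many(columns_to_records(data))`, where the database and collection names are placeholders.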
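【Comments】:
Er, the part you left out is exactly what the question is about.
【Answer 4】: Why insert the data one row at a time? Take a look at this.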
import pandas as pd
from pymongo import MongoClient
client = MongoClient(<your_credentials>)
database = client['YOUR_DB_NAME']
collection = database['your_collection']
def csv_to_json(filename, header=0):
    # header=0 uses the first row as column names; MongoDB field names must be strings
    data = pd.read_csv(filename, header=header)
    return data.to_dict('records')
collection.insert_many(csv_to_json('your_file_path'))
Note that the whole file is loaded into memory at once, so your application may crash if the file is too large.
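One way around the memory problem is to insert the rows in batches instead of all at once. The sketch below swaps pandas for the standard-library csv module so it stays self-contained; `iter_batches`, `insert_csv_in_batches` and the default batch size of 1000 are assumptions made for this example, and `collection` is expected to be a PyMongo collection (anything with an insert_many method works).

```python
import csv
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def insert_csv_in_batches(collection, filename, batch_size=1000):
    """Stream a CSV file into collection.insert_many, batch_size rows at a time."""
    inserted = 0
    with open(filename, newline="") as f:
        # DictReader reads lazily, so only one batch is held in memory at a time.
        for batch in iter_batches(csv.DictReader(f), batch_size):
            collection.insert_many(batch)
            inserted += len(batch)
    return inserted
```

It would be called as `insert_csv_in_batches(collection, 'your_file_path')`. With pandas, a similar effect is available through the chunksize parameter of read_csv, which returns an iterator of DataFrames instead of one large frame.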
【Comments】:
【Answer 5】:
from pymongo import MongoClient
import csv
import json
# DB connectivity
client = MongoClient('localhost', 27017)
db = client["database name"]
col = db["collection"]
# Function to parse csv to dictionary
def csv_to_dict():
    result = {}
    with open('File with path', 'r', newline='') as f:
        for row in csv.DictReader(f):
            # the value in the 'id' column becomes the key for that row
            key = row.pop('id')
            result[key] = row
    return result
# Final insert statement
x = col.insert_one(csv_to_dict())
print(x.inserted_id)
# The code above inserts the whole CSV as a single document.
# To insert one document per row, run the following code instead:
from pymongo import MongoClient
import csv
# read the csv file as a list of dicts, one per row
client = MongoClient('localhost', 27017)
db = client["data base name"]
col = db["Collection Name"]
with open('File with path', 'r', newline='') as read_obj:
    # pass the file object to DictReader() to get the reader object
    csv_reader = csv.DictReader(read_obj)
    # pass the reader object to list() to get a list of dicts
    mylist = list(csv_reader)
x = col.insert_many(mylist)
# print the list of _id values of the inserted documents
print(x.inserted_ids)
【Comments】: