Convert a Python pandas DataFrame to JSON, add its column names, and save it to a MongoDB database

Posted: 2017-10-24 08:48:10

Question:

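Convert the DataFrame to JSON, adding the field names skill and suggestions as shown in the desired output, and then save the result to a MongoDB collection.

Input: a Python pandas DataFrame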

        0       1       2       3       4       5       6       7
java    hadoop  java    hdfs    c       c++     php     python  html
c       c       c++     hdfs    python  hadoop  java    php     html
c++     c++     c       python  hdfs    hadoop  java    php     html
hadoop  hadoop  java    hdfs    c       c++     php     python  html
hdfs    hdfs    hadoop  java    c       c++     python  php     html
python  python  c++     html    c       php     hdfs    hadoop  java

Desired output, as saved in the MongoDB collection

"_id" : ObjectId("5922a781205a763b55e2e90e"), "skill" : "java", "suggestions" : [ "hadoop", "java", "hdfs", "c", "c++", "php ", "python", "html" ]

"_id" : ObjectId("5922a781205a763b55e2e91e"), "skill" : "c", "suggestions" : [ "c", "c++", "hdfs", "python", "hadoop", "java ", "php", "html" ]

"_id" : ObjectId("5922a781205a763b55e2e92e"), "skill" : "c++", "suggestions" : [ "c++", "c", "python", "hdfs", "hadoop", "java ", "php", "html" ]

"_id" : ObjectId("5922a781205a763b55e2e93e"), "skill" : "hadoop", "suggestions" : [ "hadoop", "java", "hdfs", "c", "c++", "php ", "python", "html" ]


Answer 1:

First, reshape the data into the required format.

import pandas as pd

strlist = [['java','hadoop','java','hdfs','c','c++','php','python','html'],
      ['c','c','c++','hdfs','python','hadoop','java','php','html'],
      ['c++','c++','c','python','hdfs','hadoop','java','php','html'],
      ['hadoop','hadoop','java','hdfs','c','c++','php','python','html'],
      ['hdfs','hdfs','hadoop','java','c','c++','python','php','html'],
      ['python','python','c++','html','c','php','hdfs','hadoop','java']]

df = pd.DataFrame(strlist)

# Column 0 is the skill; columns 1 onward are its suggestions.
# Build both fields from the original columns so that 'skill' itself
# is not appended to each suggestions list.
df = pd.DataFrame({
    'skill': df[0],
    'suggestions': df.iloc[:, 1:].values.tolist(),
})

print(df)
    skill                                      suggestions
0    java  [hadoop, java, hdfs, c, c++, php, python, html]
1       c  [c, c++, hdfs, python, hadoop, java, php, html]
2     c++  [c++, c, python, hdfs, hadoop, java, php, html]
3  hadoop  [hadoop, java, hdfs, c, c++, php, python, html]
4    hdfs  [hdfs, hadoop, java, c, c++, python, php, html]
5  python  [python, c++, html, c, php, hdfs, hadoop, java]
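The input table in the question appears to show the skill as the row label, with columns 0 to 7 holding the suggestions, rather than as a regular column. If that is the actual layout, a minimal sketch along the following lines may be closer to what is needed. The names `rows`, `indexed`, and `reshaped` are illustrative only, and the sample data is shortened to two rows.

```python
import pandas as pd

# Assumed layout: the skill is the row index, columns 0..7 hold the suggestions.
rows = {
    'java': ['hadoop', 'java', 'hdfs', 'c', 'c++', 'php', 'python', 'html'],
    'c': ['c', 'c++', 'hdfs', 'python', 'hadoop', 'java', 'php', 'html'],
}
indexed = pd.DataFrame.from_dict(rows, orient='index')

# Move the index into a 'skill' column and collect each row's values into a list.
reshaped = pd.DataFrame({
    'skill': indexed.index,
    'suggestions': indexed.values.tolist(),
})
print(reshaped)
```

Because the skill never sits among the data columns here, every list contains exactly the eight suggestions.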

Then insert the DataFrame into the MongoDB database.

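import json
# `collection` is assumed to be an existing pymongo Collection; insert_many takes a list of dicts.
records = list(json.loads(df.T.to_json()).values())
collection.insert_many(records)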
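As an alternative to the JSON round trip, pandas can produce one dict per row directly with `DataFrame.to_dict('records')`. The sketch below assumes only that `collection` exposes a pymongo-style `insert_many` method. The helper names `frame_to_documents` and `save_documents` are hypothetical and not part of pandas or pymongo.

```python
import pandas as pd


def frame_to_documents(frame):
    """Return one plain dict per row, suitable for insert_many."""
    return frame.to_dict('records')


def save_documents(collection, frame):
    """Insert the rows of `frame` into a pymongo-style collection."""
    documents = frame_to_documents(frame)
    if documents:  # pymongo's insert_many rejects an empty list
        collection.insert_many(documents)
    return documents


# Shortened sample data in the skill/suggestions layout.
example = pd.DataFrame({
    'skill': ['java', 'c'],
    'suggestions': [['hadoop', 'java', 'hdfs'], ['c', 'c++', 'hdfs']],
})
print(frame_to_documents(example))
```

With pymongo installed and a server running, `collection` could be obtained with something like `MongoClient()['mydb']['skills']`, where the database and collection names are placeholders. MongoDB assigns the `_id` field to each document on insertion, so it does not need to be present in the DataFrame.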
