How to convert each row/cell value of a DataFrame to a list of dictionaries in pandas?

【Title】: How to convert each row/cell values from a DataFrame to a list of dictionaries in pandas? 【Posted】: 2022-01-19 02:04:14 【Question】:

I have the following pandas DataFrame:

import pandas as pd

df_input = pd.DataFrame({
    'Domain': ['www.google.com', 'www.apple.com', 'www.amazon.com'],
    'Description': ['The company’s product portfolio includes Googl...', 'Apple is a multinational corporation that desi...', 'Amazon is an international e-commerce website ...'],
    'About Us': ['Google is a multinational corporation that spe...', 'Apple is a multinational corporation that desi...', 'Amazon is an e-commerce website for consumers,...'],
    'Founded': [1998, 1976, 1994],
    'Country': ['United States', 'United States', 'United States'],
})

I would like to convert it into the following structure:

[{"properties": [{"name": "Description", "value": "The company’s product portfolio includes Google Search, which provides users with access to information online; Knowledge Graph that allows users to search for things, people, or places as well as builds systems recognizing speech and understanding"},
                 {"name": "Domain", "value": "www.google.com"},
                 {"name": "About Us", "value": "Google is a multinational corporation that specializes in Internet-related services and products."},
                 {"name": "Founded", "value": 1998},
                 {"name": "Country", "value": "United States"}]},

 {"properties": [{"name": "Description", "value": "Apple is a multinational corporation that designs, manufactures, and markets mobile communication and media devices, personal computers, portable digital music players, and sells a variety of related software, services, peripherals, networking solutions, and third-party digital content and applications."},
                 {"name": "Domain", "value": "www.apple.com"},
                 {"name": "About Us", "value": "Apple is a multinational corporation that designs, manufactures, and markets consumer electronics, personal computers, and software."},
                 {"name": "Founded", "value": 1976},
                 {"name": "Country", "value": "United States"}]},

 {"properties": [{"name": "Description", "value": "Amazon is an international e-commerce website for consumers, sellers, and content creators. It offers users merchandise and content purchased for resale from vendors and those offered by third-party sellers."},
                 {"name": "Domain", "value": "www.amazon.com"},
                 {"name": "About Us", "value": "Amazon is an e-commerce website for consumers, sellers, and content creators."},
                 {"name": "Founded", "value": 1994},
                 {"name": "Country", "value": "United States"}]}]

How can I write a loop to do this?

【Question comments】:

【Answer 1】:

You just need to iterate over the rows.

import pandas as pd
from pprint import pprint

df_input = pd.DataFrame({
    'Domain': ['www.google.com', 'www.apple.com', 'www.amazon.com'],
    'Description': ['The companys product portfolio includes Googl...', 'Apple is a multinational corporation that desi...', 'Amazon is an international e-commerce website ...'],
    'About Us': ['Google is a multinational corporation that spe...', 'Apple is a multinational corporation that desi...', 'Amazon is an e-commerce website for consumers,...'],
    'Founded': [1998, 1976, 1994],
    'Country': ['United States', 'United States', 'United States'],
})

data = []
# iterrows() yields (index, row) pairs; each row is a Series
for _, row in df_input.iterrows():
    props = []
    # Pair every column name with the value in this row
    for key, val in zip(df_input.columns, row.values):
        props.append({'name': key, 'value': val})
    data.append({'properties': props})

pprint(data)

Output:

[{'properties': [{'name': 'Domain', 'value': 'www.google.com'},
                 {'name': 'Description',
                  'value': 'The companys product portfolio includes Googl...'},
                 {'name': 'About Us',
                  'value': 'Google is a multinational corporation that spe...'},
                 {'name': 'Founded', 'value': 1998},
                 {'name': 'Country', 'value': 'United States'}]},
 {'properties': [{'name': 'Domain', 'value': 'www.apple.com'},
                 {'name': 'Description',
                  'value': 'Apple is a multinational corporation that desi...'},
                 {'name': 'About Us',
                  'value': 'Apple is a multinational corporation that desi...'},
                 {'name': 'Founded', 'value': 1976},
                 {'name': 'Country', 'value': 'United States'}]},
 {'properties': [{'name': 'Domain', 'value': 'www.amazon.com'},
                 {'name': 'Description',
                  'value': 'Amazon is an international e-commerce website ...'},
                 {'name': 'About Us',
                  'value': 'Amazon is an e-commerce website for consumers,...'},
                 {'name': 'Founded', 'value': 1994},
                 {'name': 'Country', 'value': 'United States'}]}]
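
The target layout in the question uses double quotes, which suggests the result may be meant as JSON. The question does not say so, so treat this as an assumption. If that is the goal, a minimal sketch with the standard json module could look like the following. Values taken from a DataFrame can be NumPy scalars such as numpy.int64, which json.dumps cannot serialize on its own, so the hypothetical helper to_native converts them. The sample data here is a shortened stand-in for the list built by the loop.

```python
import json

import numpy as np

# Hypothetical, shortened sample in the same layout as the loop's result.
# The NumPy integer mimics a value pulled out of a DataFrame column.
data = [{'properties': [{'name': 'Domain', 'value': 'www.google.com'},
                        {'name': 'Founded', 'value': np.int64(1998)}]}]


def to_native(obj):
    """Fallback for json.dumps: turn NumPy scalars into plain Python values."""
    if isinstance(obj, np.generic):
        return obj.item()
    raise TypeError(f'Cannot serialize {type(obj).__name__}')


# default= is only called for objects json does not know how to encode.
json_text = json.dumps(data, default=to_native, indent=2)
print(json_text)
```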

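【Comments】:

Thanks Tim! This is exactly what I was hoping for. Good to know that iterrows() does the trick.

I do need to give you the obligatory warning that iterating over rows or columns in pandas is slow. If possible, it is better to find a bulk operation. In this case, I don't think there is an alternative.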
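
As a hedged sketch related to the performance remark above: DataFrame.to_dict(orient='records') returns one dictionary per row, and a list comprehension can reshape each one into the requested layout. This still loops in Python, so it is not a true bulk operation, but it avoids building a Series for every row as iterrows() does. The shortened sample data and the variable names are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Shortened, illustrative version of the question's data (assumption).
df_input = pd.DataFrame({
    'Domain': ['www.google.com', 'www.apple.com'],
    'Founded': [1998, 1976],
    'Country': ['United States', 'United States'],
})

# One {column: value} dict per row, in column order.
records = df_input.to_dict(orient='records')

# Reshape each record into {'properties': [{'name': ..., 'value': ...}, ...]}.
data = [
    {'properties': [{'name': key, 'value': val} for key, val in record.items()]}
    for record in records
]

print(data[0])
```

Whether this is noticeably faster depends on the size of the DataFrame, so it is worth timing both versions on real data before switching.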
