pandas: read an xlsx file into a dict with column1 as keys and column2 as values

Posted 2023-02-24

【Posted】: 2017-09-18 18:57:14 【Question】:

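I am new to pandas. I need to read an xlsx file with pandas and turn the first column into the keys of a dict and the second column into its values. I also need to skip/exclude the first row, which is the header.

Existing answers cover pymysql (here) and csv (here), but I need to use pandas.

Here is some sample Excel data: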

dict_key    dict_value  
key1        str_value1  
key2        str_value2  
key3         None  
key4         int_value3

My code so far is below.

import pandas as pd

excel_file = "file.xlsx"
xls = pd.ExcelFile(excel_file)
df = xls.parse(xls.sheet_names[0], skiprows=1, index_col=None, na_values=['None'])
data_dict = df.to_dict()

However, it gives me a dict whose keys are the row numbers and whose values are the column1 data followed by the column2 data.

>>> data_dict
{u'Chg_Parms': {0: u'  key1 ', 1: u'   key2 ', 2: u'   key3 ', 3: u'   key4 ', 4: u'   str_value1 ', 
                5: u'   str_value2 ', 6: u'   Nan ', 6: u'   int_value3 '}}
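The nesting comes from how `DataFrame.to_dict()` works by default: the outer keys are column names and the inner keys are row labels. A minimal sketch of the difference, using a small in-memory DataFrame as a hypothetical stand-in for the sheet (the two-row data is an assumption for illustration, not the asker's file):

```python
import pandas as pd

# Hypothetical in-memory stand-in for the sheet in the question.
df = pd.DataFrame({
    "dict_key": ["key1", "key2"],
    "dict_value": ["str_value1", "str_value2"],
})

# Default orient="dict": outer keys are column names, inner keys are row labels.
nested = df.to_dict()

# Making the first column the index turns the row labels into the desired keys,
# so the second column alone converts to a flat key -> value dict.
flat = df.set_index("dict_key")["dict_value"].to_dict()

print(nested)
print(flat)
```

Selecting a single column before calling `to_dict()` is what removes the outer layer keyed by column name.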

What I want is the column 1 data as keys and the column 2 data as values, with NaN replaced by None:

data_dict = {'key1': 'str_value1', 'key2': 'str_value2', 'key3': None, 'key4': int_value3}

Thanks for your help.

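【Answer 1】:

You can read Excel files more conveniently with the pandas read_excel function. It accepts an index_col argument that specifies which column of the xlsx file to use as the index.

How to change NaN to None is explained in this question.

Given an xlsx file named example.xlsx laid out as you described above, the code below should give the result you expect: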

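import pandas as pd

# Use the first column (dict_key) as the index; the header row supplies the column names.
df = pd.read_excel("example.xlsx", index_col=0)
# Replace NaN with None.
df = df.where(pd.notnull(df), None)

# Map each index label to its dict_value entry.
print(df.to_dict()["dict_value"])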
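As an alternative sketch, the NaN-to-None replacement can be done while building the dict, which avoids depending on how `where` treats a `None` replacement (this may vary with column dtype and pandas version). The in-memory DataFrame below is a hypothetical stand-in for the result of `read_excel`:

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by read_excel.
df = pd.DataFrame({
    "dict_key": ["key1", "key2", "key3"],
    "dict_value": ["str_value1", "str_value2", float("nan")],
})

# Pair the two columns row by row and swap missing values for None.
data_dict = {
    key: (None if pd.isna(value) else value)
    for key, value in zip(df["dict_key"], df["dict_value"])
}

print(data_dict)
```

Because Python 3.7+ dicts keep insertion order, the keys come out in sheet order.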

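【Comments】:

df = df.where(pd.notnull(df), None) is nice, +1

【Answer 2】:

You can use collections.OrderedDict to preserve the order of the keys. Note that pd.read_excel loads the first sheet by default. Edit: you then said you want to encode the items in the dictionary and have 'None' evaluated as None...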

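import collections as co
import pandas as pd

df = pd.read_excel('file.xlsx')  # loads the first sheet by default
df = df.where(pd.notnull(df), None)  # replace NaN with None
# Assumes every cell is a string: strip whitespace and encode to UTF-8.
# A non-string cell (e.g. None) would raise AttributeError on .strip().
od = co.OrderedDict((k.strip().encode('utf8'), v.strip().encode('utf8'))
                    for (k, v) in df.values)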

Result:

>>> od
OrderedDict([(u'key1', u'str_value1'), (u'key2', u'str_value2'), (u'key3', u'None'), (u'key4', u'int_value3')])

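General note: you should keep strings as Unicode inside your Python program.

【Comments】:

@bernie Thanks for your answer. This is definitely what I need. However, how do I convert each key and value to a non-unicode representation, strip the whitespace, and keep its type? E.g. str(u' 1') yields "1", while str(u'None') yields "None". I need int and boolean values.
@Anil_M: You're welcome. Please see the edited answer. I added .strip() next to encode('utf8') to handle the whitespace.
I believe this answers my question. Thanks.
@Anil_M: Anytime! Happy coding.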
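The type-preserving conversion asked about in the comments is not shown in the thread. One possible sketch, assuming the cells arrive as padded strings, uses the standard library's `ast.literal_eval` to recover `None`, booleans, and ints, and falls back to the stripped text otherwise. The helper name `parse_cell` and the sample `raw` dict are hypothetical:

```python
import ast

def parse_cell(text):
    """Strip whitespace, then turn text such as 'None', 'True' or '12' into a Python value."""
    stripped = text.strip()
    try:
        # literal_eval only accepts Python literals, so it does not execute code.
        return ast.literal_eval(stripped)
    except (ValueError, SyntaxError):
        # Not a literal (e.g. plain words): keep the stripped string.
        return stripped

# Hypothetical raw mapping, padded the way the cells in the question are.
raw = {u" key1 ": u" str_value1 ", u" key2 ": u" 1", u" key3 ": u"None", u" key4 ": u"True"}
typed = {key.strip(): parse_cell(value) for key, value in raw.items()}

print(typed)
```

A plain word like `str_value1` is not a valid literal, so it stays a string, while `' 1'` becomes the int `1`.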
