Nested Dictionary Errors -- Python Pandas

【Title】: Nested Dictionary Errors -- Python Pandas 【Posted】: 2018-08-01 13:59:24 【Question】:

I have the following code:

import os
import pandas as pd 
from pandas import ExcelWriter
from pandas import ExcelFile

fileName= input("Enter file name here (Case Sensitive) > ")
df = pd.read_excel(fileName +'.xlsx', sheetname=None, ignore_index=True)
xl = pd.ExcelFile(fileName +'.xlsx')
SystemCount= len(xl.sheet_names)
df1 = pd.DataFrame([])

for y in range(1, int(SystemCount)+ 1): 
    df = pd.read_excel(xl,'System ' + str(y))  #reads each sheet
    df['System {0}'.format(y)] = "1"  #adds a column for each system, sets the column = 1
    df1 = df1.append(df)  #appends all sheets together into a new df


df1 = df1.sort_values(['Email']) #sorts by email
df = df1['Email'].value_counts() #counts the amount each email shows
df1['Count'] = df1.groupby('Email')['Email'].transform('count') #adds the count to the end


df1 = df1.apply(lambda x : pd.to_numeric(x,errors='ignore')) #turns ints to floats
d = dict(zip(df1.columns[1:],['sum']*df1.columns[1:].str.contains('System').sum()+['first'])) #adds up each row
df1 = df1.fillna(0).groupby('Email').agg(d) #turns NAN into 0 and groups everything together
df1 = df1.reset_index() #email column was turned into an index with above line, this turns it back to a df column


SystemsList = []#creates empty list
for count in range(1, int(SystemCount)+1): #counts up to the system amount
    SystemsList.append(['System {0}'.format(count)]) #creates list of systems

SystemDict = {}
for item in SystemsList:
    SystemDict[item]=df1[df1[item]== 1]["Email"]

Which outputs something like this (a small snippet of the output):

    Email           System 1  System 2  System 3  System 4  Count
    test1@test.com  0         1         0         1         2
    test2@test.com  1         0         0         1         2
    test3@test.com  1         1         0         1         3
    test4@test.com  1         0         1         0         2

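I am trying to build a nested dictionary with one entry per system, holding the emails of every row where that system's column is 1, using this code: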

SystemDict = {}
for item in SystemsList:
    SystemDict[item]=df1[df1[item]== 1]["Email"]

But I get the following error - ValueError: Boolean array expected for the condition, not float64. Any ideas on how to fix this?

【Answer 1】:

Here is one way.

import pandas as pd

lst = [['test1@test.com', 0, 1, 0, 1, 2],
       ['test2@test.com', 1, 0, 0, 1, 2],
       ['test3@test.com', 1, 1, 0, 1, 3],
       ['test4@test.com', 1, 0, 1, 0, 2]]

df = pd.DataFrame(lst, columns=['Email', 'System 1', 'System 2',
                                'System 3', 'System 4', 'Count'])

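# An email string times 0 is '' and times 1 is the string itself;
# filter(None, ...) then drops the empty strings.
d = {'System'+str(i): list(filter(None, df['System '+str(i)]*df['Email']))
     for i in range(1, 5)}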

# {'System1': ['test2@test.com', 'test3@test.com', 'test4@test.com'],
#  'System2': ['test1@test.com', 'test3@test.com'],
#  'System3': ['test4@test.com'],
#  'System4': ['test1@test.com', 'test2@test.com', 'test3@test.com']}
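
As an alternative to multiplying strings, the same dictionary can probably be built with a boolean mask. The following is only a minimal sketch and not part of the original answer: it assumes every system column name starts with "System", and the names system_cols and emails_by_system are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [['test1@test.com', 0, 1, 0, 1, 2],
     ['test2@test.com', 1, 0, 0, 1, 2],
     ['test3@test.com', 1, 1, 0, 1, 3],
     ['test4@test.com', 1, 0, 1, 0, 2]],
    columns=['Email', 'System 1', 'System 2', 'System 3', 'System 4', 'Count'])

# Assumption: system columns are exactly those whose name starts with "System",
# so the number of systems does not have to be known in advance.
system_cols = [col for col in df.columns if col.startswith('System')]

# df[col] == 1 is a boolean Series; .loc keeps the emails of the matching rows.
emails_by_system = {col: df.loc[df[col] == 1, 'Email'].tolist()
                    for col in system_cols}

print(emails_by_system)
```

Here the keys are the column names as they appear in the frame ('System 1', with a space), not the 'System1' keys the answer builds.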

【Comments】:

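Thanks, this is helpful. But how would I implement this in my code? The number of systems depends on the user's file, so it can range from 4 to 50. When I try to modify the code to fit, I get this error: TypeError: can't multiply sequence by non-int of type 'float'

Are you not able to get the names of the systems before running the code?

Correct, SystemsList is where I put the names into list format. Would I have to turn my df1 dataframe into a list to make this work?

So just use this: d = {i: list(filter(None, df[i]*df['Email'])) for i in lst}, where lst is your list of systems

Now I get this error, ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size.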
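
The thread ends without a resolution, so what follows is only a hedged sketch of where the errors may come from, not a confirmed fix. The TypeError is consistent with the system columns being float64 after fillna(0) and the groupby sum, because a string cannot be multiplied by a float. The other errors may come from each entry of SystemsList being a one-element list, so that df1[item] selects a DataFrame rather than a Series. The sample frame and the names system_names and system_dict below are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the asker's aggregated df1: the system columns
# are floats, as they would be after fillna(0) and a groupby sum.
df1 = pd.DataFrame({
    'Email': ['test1@test.com', 'test2@test.com',
              'test3@test.com', 'test4@test.com'],
    'System 1': [0.0, 1.0, 1.0, 1.0],
    'System 2': [1.0, 0.0, 1.0, 0.0],
})

# Built the way the question builds SystemsList: every entry is a list.
systems_list = [['System {0}'.format(n)] for n in range(1, 3)]

# Flatten to plain strings so that df1[name] is a Series, not a DataFrame.
system_names = [name for entry in systems_list for name in entry]

# Comparing with == 1 works for float columns and needs no multiplication.
system_dict = {name: df1.loc[df1[name] == 1, 'Email'].tolist()
               for name in system_names}

print(system_dict)
```

Appending the plain string instead of a one-element list when building the list of names would make the flattening step unnecessary.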
