Nested Dictionary Errors -- Python Pandas
[Posted]: 2018-08-01 13:59:24
[Question]: I have the following code:
import os
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile

fileName = input("Enter file name here (Case Sensitive) > ")
df = pd.read_excel(fileName + '.xlsx', sheetname=None, ignore_index=True)
xl = pd.ExcelFile(fileName + '.xlsx')
SystemCount = len(xl.sheet_names)
df1 = pd.DataFrame([])
for y in range(1, int(SystemCount) + 1):
    df = pd.read_excel(xl, 'System ' + str(y))  # reads each sheet
    df['System {0}'.format(y)] = "1"  # adds a column for each system, sets the column = 1
    df1 = df1.append(df)  # appends all sheets together into a new df
df1 = df1.sort_values(['Email'])  # sorts by email
df = df1['Email'].value_counts()  # counts the amount each email shows
df1['Count'] = df1.groupby('Email')['Email'].transform('count')  # adds the count to the end
df1 = df1.apply(lambda x: pd.to_numeric(x, errors='ignore'))  # turns ints to floats
d = dict(zip(df1.columns[1:], ['sum'] * df1.columns[1:].str.contains('System').sum() + ['first']))  # adds up each row
df1 = df1.fillna(0).groupby('Email').agg(d)  # turns NaN into 0 and groups everything together
df1 = df1.reset_index()  # email column was turned into an index with above line, this turns it back to a df column
SystemsList = []  # creates empty list
for count in range(1, int(SystemCount) + 1):  # counts up to the system amount
    SystemsList.append(['System {0}'.format(count)])  # creates list of systems
SystemDict = {}
for item in SystemsList:
    SystemDict[item] = df1[df1[item] == 1]["Email"]
This outputs something like the following (a small snippet of the output):
Email           System 1  System 2  System 3  System 4  Count
test1@test.com         0         1         0         1      2
test2@test.com         1         0         0         1      2
test3@test.com         1         1         0         1      3
test4@test.com         1         0         1         0      2
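I am trying to build a nested dictionary with one entry per system, holding the emails of every row where that system's column is 1, using this code: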
SystemDict = {}
for item in SystemsList:
    SystemDict[item] = df1[df1[item] == 1]["Email"]
But I get the following error: ValueError: Boolean array expected for the condition, not float64. Any ideas on how to fix this?
[Answer 1]: Here is one way.
import pandas as pd

lst = [['test1@test.com', 0, 1, 0, 1, 2],
       ['test2@test.com', 1, 0, 0, 1, 2],
       ['test3@test.com', 1, 1, 0, 1, 3],
       ['test4@test.com', 1, 0, 1, 0, 1]]
df = pd.DataFrame(lst, columns=['Email', 'System 1', 'System 2',
                                'System 3', 'System 4', 'Count'])
# Multiplying a 0/1 flag by the email string yields '' or the email; filter(None, ...) drops the empty strings
d = {'System'+str(i): list(filter(None, df['System '+str(i)]*df['Email']))
     for i in range(1, 5)}
# {'System1': ['test2@test.com', 'test3@test.com', 'test4@test.com'],
#  'System2': ['test1@test.com', 'test3@test.com'],
#  'System3': ['test4@test.com'],
#  'System4': ['test1@test.com', 'test2@test.com', 'test3@test.com']}
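A variation that may be easier to adapt when the number of systems is not known in advance: the sketch below takes the system column names from the DataFrame itself and selects emails with a boolean mask instead of multiplication. It assumes the system columns all start with 'System'; the names system_cols and system_dict are made up for illustration. Note that it keeps each column name as a plain string. In the question, each entry of SystemsList is a one-element list, so df1[item] returns a DataFrame rather than a Series, which is a likely source of the ValueError there.

```python
import pandas as pd

df = pd.DataFrame(
    [['test1@test.com', 0, 1, 0, 1, 2],
     ['test2@test.com', 1, 0, 0, 1, 2],
     ['test3@test.com', 1, 1, 0, 1, 3],
     ['test4@test.com', 1, 0, 1, 0, 2]],
    columns=['Email', 'System 1', 'System 2', 'System 3', 'System 4', 'Count'])

# Assumption: every system column name starts with 'System'.
# Each name is kept as a plain string, not wrapped in a list.
system_cols = [c for c in df.columns if c.startswith('System')]

# df[col] with a string key is a Series, so df[col] == 1 is a 1-D boolean mask
# that .loc can use to pick the matching rows of the 'Email' column.
system_dict = {col: df.loc[df[col] == 1, 'Email'].tolist() for col in system_cols}

# For comparison: a one-element list key returns a DataFrame, not a Series.
as_frame = df[['System 1']]
as_series = df['System 1']
```

Because the column list is derived from df.columns, the same code should work whether the file has 4 systems or 50.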
[Comments]:
Thanks, this is helpful. But how would I implement this in my code? The number of systems depends on the user's file, so the range can be anywhere from 4 to 50. When I try to adapt the code, I get this error: TypeError: can't multiply sequence by non-int of type 'float'
Are you not able to get the names of the systems before running the code?
Correct, SystemsList is where I put the names into list format. Do I have to turn my df1 DataFrame into a list to make this work?
So just use this: d = {i: list(filter(None, df[i]*df['Email'])) for i in lst}, where lst is your list of systems.
Now I get this error: ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size.
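On the TypeError mentioned in the comments: Python can repeat a string by an int but not by a float, and in the question's pipeline the system columns are likely floats after fillna(0) and the summed groupby. The sketch below first reproduces the message in plain Python, then builds the dictionary from float flags with a boolean mask so no multiplication is needed. The small DataFrame and the systems list are hypothetical stand-ins for df1 and SystemsList, and this does not attempt to explain the later "array is too big" error.

```python
import pandas as pd

# Plain Python shows the root of the TypeError: str * float is not defined.
try:
    1.0 * 'test1@test.com'
except TypeError as exc:
    message = str(exc)

# Hypothetical stand-in for df1, with float flags as the question's pipeline likely produces.
df1 = pd.DataFrame({
    'Email': ['test1@test.com', 'test2@test.com', 'test3@test.com'],
    'System 1': [0.0, 1.0, 1.0],
    'System 2': [1.0, 0.0, 1.0],
})

# Hypothetical stand-in for SystemsList: plain strings, not one-element lists.
systems = ['System 1', 'System 2']

# Float flags convert cleanly to booleans (0.0 -> False, 1.0 -> True),
# so the mask selects emails without multiplying strings by numbers.
d = {name: df1.loc[df1[name].astype(bool), 'Email'].tolist() for name in systems}
```

If the multiplication approach from the answer is preferred, casting the flag column with .astype(int) before multiplying is one option to try, though the mask avoids the question entirely.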