How to derive a column from a pandas DataFrame filtered by a condition
import easygui as gui
import pandas as pd
filename = gui.fileopenbox(msg='Please choose the Excel workbook containing the bank data.') #select workbook containing FC and WF data
colnames=['1','2','3','4','5','6','7','8','9','10','11','12'] #define col names because variable number of col won't read unless max col# is defined
dfdata = pd.read_csv(filename,names=colnames) #set dataframe equal to csv file
key = dfdata["12"].isnull() #set criteria for splitting data equal to null value in column 12
dftopdata = dfdata.loc[key] #set new df equal to key criteria
dfbottomdata = dfdata.loc[~key] #set new df NOT equal to key criteria
dftopdata = dftopdata.dropna(axis=1, how='all') #drop any column with all values = NaN
dftopdata = dftopdata.dropna(axis=0, how='all') #drop any row with all values = NaN
header = dftopdata.iloc[1] #Creates a header variable at row index location 1
dftopdata = dftopdata[2:] #Resets dataframe equal to row 2 and beyond
dftopdata.rename(columns = header, inplace = True) #sets names of columns in the dataframe equal to header
header = dfbottomdata.iloc[0] #Creates a header variable at row index location 0
dfbottomdata = dfbottomdata[1:] #Resets dataframe equal to row 1 and beyond
dfbottomdata.rename(columns = header, inplace = True) #sets names of columns in the dataframe equal to header
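For anyone who wants to reproduce the split without the bank file, here is a minimal sketch on a made-up in-memory CSV. The sample text, the five-column limit, and the helper name promote_first_row are assumptions for illustration only; the real file has up to 12 columns and an extra title row in the top block.

```python
import io
import pandas as pd

# Made-up ragged CSV: a 3-column block followed by a 5-column block
raw = "a,b,c\n1,2,3\nd,e,f,g,h\n4,5,6,7,8\n"

# Naming the maximum number of columns lets read_csv pad the short rows with NaN
colnames = ['1', '2', '3', '4', '5']
df = pd.read_csv(io.StringIO(raw), names=colnames)

# Rows from the narrow block have no value in the last column
key = df['5'].isnull()
top = df.loc[key].dropna(axis=1, how='all')  # drop the padded, all-NaN columns
bottom = df.loc[~key]


def promote_first_row(frame):
    """Use the first row as the header and return the remaining rows."""
    out = frame.iloc[1:].copy()
    out.columns = list(frame.iloc[0])
    return out


top = promote_first_row(top)
bottom = promote_first_row(bottom)
print(top.columns.tolist())
print(bottom.columns.tolist())
```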
The code above produces two DataFrames.
Here is a sample of the data from the DataFrame called top data (dftopdata):
Routing    Currency  Account Number  Account Name  Opening Ledger  Credits Amt  Credits Num  Debits Amt  Debits Num  Closing Ledger
123456789  USD       1111111112      A             717.57          100.00       1            100.72      3           716.85
123456789  USD       1111111113      B             1,350.30        NaN          0            28.53       1           1,321.77
123456789  USD       1111111114      C             26,570.34       320.52       1            42.17       1           26,848.69
123456789  USD       1111111115      D             1,031.95        2,000.00     1            703.95      2           2,328.00
123456789  USD       1111111116      E             1,000.00        600.00       2            72.03       2           1,527.97
Here is a sample of the data from the DataFrame called bottom data (dfbottomdata):
Date        Routing    Currency  Account Number  Account Name  BAI Type            BAI Code  CR Amount  DB Amount  Serial Num  Ref Num   Description
12/10/2019  123456789  USD       1111111112      A             Miscellaneous Fees  7         NaN        28.69      NaN         69650977  MTHLY ANALYSIS CHARGE
12/20/2019  123456789  USD       1111111112      A             Misc Credit         1         100        NaN        NaN         70069250  XFR TO DDA FR DDA 001111085716122019RF#1452300...
12/24/2019  123456789  USD       1111111112      A             Misc Debit          4         NaN        69.08      NaN         70184768  ACCESSIBLEINSURA WEBPAYMENTPCOF PROPERTIES SERIES
12/24/2019  123456789  USD       1111111112      A             Misc Debit          5         NaN        2.95       NaN         70184769  SEP INSURANC ACH WEBPAYMENTPCOF PROPERTIES SERIES
12/10/2019  123456789  USD       1111111113      B             Miscellaneous Fees  6         NaN        28.53      NaN         69645166  MTHLY ANALYSIS CHARGE
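I want to add a new column called "Balance" to the bottom data df, holding the running balance of each bank account.
For the earliest transaction date of a given bank account in the bottom data df, the balance should equal that account's Opening Ledger value from the first DataFrame, plus any credit or minus any debit in that row of the bottom data df.
Each subsequent transaction for a given bank account should equal the balance as of the previous transaction, plus any credit or minus any debit in that row of the bottom data df.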
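To make the rule concrete, here is the arithmetic for account 1111111112 worked through in plain Python, using only the sample values shown in this post. It is a hand calculation rather than a solution: the amounts are typed in manually, and Decimal is used only to avoid floating-point noise.

```python
from decimal import Decimal
from itertools import accumulate

# Opening Ledger for account 1111111112, taken from the top data sample
opening = Decimal("717.57")

# Signed movements in date order: credits are positive, debits are negative
movements = [
    Decimal("-28.69"),  # 12/10/2019 debit
    Decimal("100"),     # 12/20/2019 credit
    Decimal("-69.08"),  # 12/24/2019 debit
    Decimal("-2.95"),   # 12/24/2019 debit
]

# accumulate() yields the opening value first, so drop it with [1:]
balances = list(accumulate(movements, initial=opening))[1:]
print([str(b) for b in balances])
```

The last value equals the Closing Ledger of 716.85 listed for this account in the top data, which is a handy check on the rule.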
Here is what I would like the bottom data df to look like after the analysis:
Date        Routing    Currency  Account Number  Account Name  BAI Type            BAI Code  CR Amount  DB Amount  Serial Num  Ref Num   Description                                        Balance
12/10/2019  123456789  USD       1111111112      A             Miscellaneous Fees  7         NaN        28.69      NaN         69650977  MTHLY ANALYSIS CHARGE                              688.88
12/20/2019  123456789  USD       1111111112      A             Misc Credit         1         100        NaN        NaN         70069250  XFR TO DDA FR DDA 001111085716122019RF#1452300...  788.88
12/24/2019  123456789  USD       1111111112      A             Misc Debit          4         NaN        69.08      NaN         70184768  ACCESSIBLEINSURA WEBPAYMENTPCOF PROPERTIES SERIES  719.80
12/24/2019  123456789  USD       1111111112      A             Misc Debit          5         NaN        2.95       NaN         70184769  SEP INSURANC ACH WEBPAYMENTPCOF PROPERTIES SERIES  716.85
12/10/2019  123456789  USD       1111111113      B             Miscellaneous Fees  6         NaN        28.53      NaN         69645166  MTHLY ANALYSIS CHARGE                              1321.77
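But I am stuck on what to do next.
I considered creating a separate DataFrame for each bank account, but that seems very inefficient.
Can anyone point me in the right direction?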
Answer
Assuming dfbottomdata is sorted in ascending order (smallest to largest) by Routing, Account Number, and Date, the code below should work:
# Add the Opening Ledger value from dftopdata; it becomes the starting Balance
dfbottomdata = dfbottomdata.merge(dftopdata[['Routing', 'Account Number', 'Opening Ledger']], on=['Routing', 'Account Number'])
dfbottomdata.rename(columns={'Opening Ledger': 'Balance'}, inplace=True)
# Replace NaN with 0 for calculations (assumes the amount columns are already numeric)
dfbottomdata['CR Amount'] = dfbottomdata['CR Amount'].fillna(0)
dfbottomdata['DB Amount'] = dfbottomdata['DB Amount'].fillna(0)
# Handle the first row: opening balance plus credit minus debit
dfbottomdata.loc[0, 'Balance'] = dfbottomdata.loc[0, 'Balance'] + dfbottomdata.loc[0, 'CR Amount'] - dfbottomdata.loc[0, 'DB Amount']
# Iterate through each row, carrying the previous balance forward only if the previous row's Routing/Account Number match
for i in range(1, len(dfbottomdata)):
    if (dfbottomdata.loc[i-1, 'Routing'] == dfbottomdata.loc[i, 'Routing']) and (dfbottomdata.loc[i-1, 'Account Number'] == dfbottomdata.loc[i, 'Account Number']):
        # Same account as the previous row: continue from the previous balance
        dfbottomdata.loc[i, 'Balance'] = dfbottomdata.loc[i-1, 'Balance'] + dfbottomdata.loc[i, 'CR Amount'] - dfbottomdata.loc[i, 'DB Amount']
    else:
        # First transaction of a new account: start from its opening balance
        dfbottomdata.loc[i, 'Balance'] = dfbottomdata.loc[i, 'Balance'] + dfbottomdata.loc[i, 'CR Amount'] - dfbottomdata.loc[i, 'DB Amount']
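The row-by-row loop can probably be avoided. As a rough sketch, not tested against the real bank file, the same running balance can be computed with groupby and cumsum. The helper names to_number and add_running_balance are made up for this example. The sketch assumes the amounts may arrive as strings such as '1,350.30', that dates use the MM/DD/YYYY form shown in the question, and that dftopdata has exactly one row per Routing/Account Number pair.

```python
import pandas as pd

KEYS = ["Routing", "Account Number"]


def to_number(col):
    """Convert strings such as '1,350.30' to floats; unparseable values become NaN."""
    cleaned = col.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")


def add_running_balance(bottom, top):
    """Return a sorted copy of `bottom` with a per-account running 'Balance' column."""
    # how="left" keeps every transaction even if its account is missing from `top`
    out = bottom.merge(top[KEYS + ["Opening Ledger"]], on=KEYS, how="left")
    out["_date"] = pd.to_datetime(out["Date"], format="%m/%d/%Y")
    out["_pos"] = range(len(out))  # tie-breaker: same-day rows keep their input order
    out = out.sort_values(KEYS + ["_date", "_pos"]).reset_index(drop=True)
    # Signed movement per row: credits add, debits subtract, NaN counts as 0
    net = to_number(out["CR Amount"]).fillna(0) - to_number(out["DB Amount"]).fillna(0)
    # Cumulative sum restarts for every (Routing, Account Number) group
    running = net.groupby([out[k] for k in KEYS]).cumsum()
    out["Balance"] = (to_number(out["Opening Ledger"]) + running).round(2)
    return out.drop(columns=["Opening Ledger", "_date", "_pos"])


# Small frames rebuilt by hand from the sample rows in the question
top = pd.DataFrame({
    "Routing": ["123456789", "123456789"],
    "Account Number": ["1111111112", "1111111113"],
    "Opening Ledger": ["717.57", "1,350.30"],
})
bottom = pd.DataFrame({
    "Date": ["12/10/2019", "12/20/2019", "12/24/2019", "12/24/2019", "12/10/2019"],
    "Routing": ["123456789"] * 5,
    "Account Number": ["1111111112"] * 4 + ["1111111113"],
    "CR Amount": [None, "100", None, None, None],
    "DB Amount": ["28.69", None, "69.08", "2.95", "28.53"],
})

result = add_running_balance(bottom, top)
print(result[["Date", "Account Number", "Balance"]])
```

Unlike the loop, this version sorts the rows itself, so it does not depend on dfbottomdata already being ordered. Transactions whose account is missing from the top data end up with a NaN balance instead of being dropped.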