Rearranging columns after getting dummies
【Title】: Rearranging columns after getting dummies
【Posted】: 2018-04-08 14:17:31
【Question】:
           A          B          C           D          E
0  165349.20  136897.80  471784.10    New York  192261.83
1  162597.70  151377.59  443898.53  California  191792.06
2  153441.51  101145.55  407934.54     Florida  191050.39
3  144372.41  118671.85  383199.62    New York  182901.99
4  142107.34   91391.77  366168.42     Florida  166187.94
After using df = pd.get_dummies(df, columns=['D']):
           A          B          C          E  D_New York  D_California  D_Florida
0  165349.20  136897.80  471784.10  192261.83           0             0          1
1  162597.70  151377.59  443898.53  191792.06           1             0          0
2  153441.51  101145.55  407934.54  191050.39           0             1          0
3  144372.41  118671.85  383199.62  182901.99           0             0          1
4  142107.34   91391.77  366168.42  166187.94           0             1          0
Is there a way to make the output look like this without using df[['A','B','C','D_California','D_New York','D_Florida','E']]?
           A          B          C  D_New York  D_California  D_Florida          E
0  165349.20  136897.80  471784.10           0             0          1  192261.83
1  162597.70  151377.59  443898.53           1             0          0  191792.06
2  153441.51  101145.55  407934.54           0             1          0  191050.39
3  144372.41  118671.85  383199.62           0             0          1  182901.99
4  142107.34   91391.77  366168.42           0             1          0  166187.94
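【Comments】:
It looks like you need df.sort_index(axis=1)
Possible duplicate of Python Pandas - Re-ordering columns in a dataframe based on column name
【Answer 1】:
A general solution for columns that may not be in sorted order: find the position of the column to encode, split the frame around it, and concatenate the dummies back in at that position.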
j = df.columns.get_loc('D')
left = df.iloc[:, :j]
dumb = pd.get_dummies(df[['D']])
rite = df.iloc[:, j+1:]
pd.concat([left, dumb, rite], axis=1)
           A          B          C  D_California  D_Florida  D_New York          E
0  165349.20  136897.80  471784.10             0          0           1  192261.83
1  162597.70  151377.59  443898.53             1          0           0  191792.06
2  153441.51  101145.55  407934.54             0          1           0  191050.39
3  144372.41  118671.85  383199.62             0          0           1  182901.99
4  142107.34   91391.77  366168.42             0          1           0  166187.94
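The same steps can be wrapped in a small helper. This is a minimal sketch rather than part of the original answer: the function name `dummies_in_place` is hypothetical, the sample frame is the first three rows of the question's data, and it assumes the column label is unique so that `get_loc` returns a single integer.

```python
import pandas as pd


def dummies_in_place(df, column):
    """Replace `column` with its dummy columns at the same position.

    Hypothetical helper; assumes `column` appears exactly once in df.columns.
    """
    j = df.columns.get_loc(column)             # integer position of the column
    left = df.iloc[:, :j]                      # everything before it
    dummies = pd.get_dummies(df[[column]])     # encoded columns, prefixed with the column name
    right = df.iloc[:, j + 1:]                 # everything after it
    return pd.concat([left, dummies, right], axis=1)


# First three rows of the question's sample data
df = pd.DataFrame({
    "A": [165349.20, 162597.70, 153441.51],
    "B": [136897.80, 151377.59, 101145.55],
    "C": [471784.10, 443898.53, 407934.54],
    "D": ["New York", "California", "Florida"],
    "E": [192261.83, 191792.06, 191050.39],
})

result = dummies_in_place(df, "D")
print(list(result.columns))
```

Because nothing is sorted, the other columns keep their original order whatever their names are. The dummy columns themselves come out in the order get_dummies produces them.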
【Comments】:
【Answer 2】: By using sort_index
df.sort_index(axis=1)
Out[813]:
           A          B          C  D_California  D_Florida  D_NewYork  \
0  165349.20  136897.80  471784.10             0          0          1
1  162597.70  151377.59  443898.53             1          0          0
2  153441.51  101145.55  407934.54             0          1          0
3  144372.41  118671.85  383199.62             0          0          1
4  142107.34   91391.77  366168.42             0          1          0

           E
0  192261.83
1  191792.06
2  191050.39
3  182901.99
4  166187.94
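Note that sort_index(axis=1) only gives the wanted layout here because the names A to E happen to sort into it. A small sketch of the limitation, assuming the original column names mentioned in the comments (R&D Spend, Administration, Marketing Spend, State, Profit(E)) and a two-row sample made up from the question's data:

```python
import pandas as pd

# Two rows of the question's data under the original column names
df = pd.DataFrame({
    "R&D Spend": [165349.20, 162597.70],
    "Administration": [136897.80, 151377.59],
    "Marketing Spend": [471784.10, 443898.53],
    "State": ["New York", "California"],
    "Profit(E)": [192261.83, 191792.06],
})

encoded = pd.get_dummies(df, columns=["State"])

# Sorting the column labels is purely lexicographic, so the
# original left-to-right order of the columns is lost.
sorted_cols = list(encoded.sort_index(axis=1).columns)
print(sorted_cols)
```

With these names the sort puts Administration first and Profit(E) before R&D Spend, which is not the original order.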
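Edit: use a dict and a lambda as the key for the list sort
# Map each original column name to its position in the original df
A = dict(zip(df.columns, range(df.shape[1])))
# Encode the categorical column; the dummy columns are appended at the end
df1 = pd.get_dummies(df, columns=['State'])
# Column names of the encoded frame, in their new (unwanted) order
youroder = list(df1)
# Sort by the position of the source column, i.e. the part of the name
# before the first '_' (assumes the original names contain no '_')
youroder.sort(key=lambda val: A[val.split(sep='_')[0]])
df1[youroder]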
Out[842]:
   R&D Spend  Administration  Marketing Spend  State_California  \
0  165349.20       136897.80        471784.10                 0
1  162597.70       151377.59        443898.53                 1
2  153441.51       101145.55        407934.54                 0
3  144372.41       118671.85        383199.62                 0
4  142107.34        91391.77        366168.42                 0

   State_Florida  State_NewYork  Profit(E)
0              0              1  192261.83
1              0              0  191792.06
2              1              0  191050.39
3              0              1  182901.99
4              1              0  166187.94
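【Comments】:
Suppose the column names are not in alphabetical order as they are in my example; is there another way? The original column names are, in order: R&D Spend, Administration, Marketing Spend, State, Profit(E). I want to arrange them as: R&D Spend, Administration, Marketing Spend, State_California, State_New York, State_Florida, Profit(E)
@ZaleGoldart All I can think of is splitting the original df and then concatenating the pieces back together
【Answer 3】:
Not sure if there is a better way, but this will work: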
col = ['R&D Spend', 'Administration', 'Marketing Spend', 'State_California', 'State_New York', 'State_Florida', 'Profit(E)']
df=df.loc[:, col]
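If typing the list by hand is the part to avoid, the same list can be built from the column names. This is only a sketch under assumptions: the helper name `expanded_column_order` is hypothetical, the sample frame is three rows of the question's data, and it assumes no other column name starts with the 'State_' prefix. The dummy columns appear in the order get_dummies returns them, not in a custom order.

```python
import pandas as pd


def expanded_column_order(original_columns, encoded_columns, column, prefix_sep="_"):
    """Return the original column order with `column` replaced by its dummies.

    Hypothetical helper; assumes no other column starts with column + prefix_sep.
    """
    prefix = f"{column}{prefix_sep}"
    order = []
    for name in original_columns:
        if name == column:
            # Put every dummy column where the source column used to be
            order.extend(c for c in encoded_columns if c.startswith(prefix))
        else:
            order.append(name)
    return order


# Three rows of the question's data under the original column names
df = pd.DataFrame({
    "R&D Spend": [165349.20, 162597.70, 153441.51],
    "Administration": [136897.80, 151377.59, 101145.55],
    "Marketing Spend": [471784.10, 443898.53, 407934.54],
    "State": ["New York", "California", "Florida"],
    "Profit(E)": [192261.83, 191792.06, 191050.39],
})

encoded = pd.get_dummies(df, columns=["State"])
col = expanded_column_order(df.columns, encoded.columns, "State")
reordered = encoded.loc[:, col]
print(col)
```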
【Comments】: