Split a string of category into specific Dataframe columns [duplicate]
【Posted】: 2020-12-14 10:54:37
【Question】:
I have a Dataframe column that contains the following categories:
import pandas as pd

data = {'People': ['John','Mary','Andy','April'],
        'Class': ['Math, Science','English, Math, Science','Math, Science','Science, English, Math']}
df = pd.DataFrame(data, columns=['People', 'Class'])
How can I create new columns and convert the Dataframe to:
> | People | Math | Science | English |
> -------------------------------------
> | John | Math | Science | |
> | Mary | Math | Science | English |
> | Andy | Math | Science | |
> | April | Math | Science | English |
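【Comments】:
Does this answer your question? How to split a column into two columns?
【Answer 1】:
Use .str.get_dummies to get a table of 1s and 0s for the Class column.
Use np.where to replace each 1 with the column name and each 0 with an empty string.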
df.Class.str.get_dummies(', ').apply(lambda x: np.where(x == 1, x.name, ''))
This creates a separate dataframe, which we combine back into df using .join.
Then .drop the Class column, as it is no longer needed.
import pandas as pd
import numpy as np
updated = df.join(df.Class.str.get_dummies(', ').apply(lambda x: np.where(x == 1, x.name, ''))).drop(columns=['Class'])
# display(updated)
  People English  Math  Science
0   John          Math  Science
1   Mary English  Math  Science
2   Andy          Math  Science
3  April English  Math  Science
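The steps in this answer can also be written out one at a time, which makes the intermediate 0/1 table visible. This is only a sketch: the names `dummies` and `labels` are introduced here for illustration, and the final reordering line is an optional extra to match the column order requested in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'People': ['John', 'Mary', 'Andy', 'April'],
    'Class': ['Math, Science', 'English, Math, Science',
              'Math, Science', 'Science, English, Math'],
})

# Step 1: one indicator column per class (1 if the row lists it, else 0).
# get_dummies returns the columns in sorted order: English, Math, Science.
dummies = df['Class'].str.get_dummies(', ')

# Step 2: swap each 1 for the column's own name and each 0 for ''.
labels = dummies.apply(lambda col: np.where(col == 1, col.name, ''))

# Step 3: attach the labels to the original frame and drop the raw strings.
updated = df.join(labels).drop(columns=['Class'])

# Optional: reorder to the column order asked for in the question.
updated = updated[['People', 'Math', 'Science', 'English']]
```

Because `get_dummies` sorts the class names, an explicit column selection like the last line is one way to get a specific order.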
【Comments】:
Nice use of dummies :)
【Answer 2】:
The following code may help you
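# Collect the distinct class names across all rows
columns = {x for lst in df['Class'] for x in lst.replace(" ", "").split(",")}
# Start every new column as empty strings
for col in columns:
    df[col] = ""
# Fill in the class name wherever a row lists that class
for i, val in enumerate(df["Class"]):
    cl = val.replace(" ", "").split(",")
    for value in cl:
        df.loc[i, value] = value  # a single .loc call avoids chained assignment
df.drop('Class', axis=1, inplace=True)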
Output:
  People  Science English  Math
0   John  Science          Math
1   Mary  Science English  Math
2   Andy  Science          Math
3  April  Science English  Math
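The splitting step in this answer removes every space before splitting on commas. That works for the single-word class names in the question, but it would also join the words of a multi-word name. The sketch below shows the difference; the class name 'Computer Science' is a hypothetical value that does not appear in the question's data.

```python
# The approach used in the answer: drop all spaces, then split on commas.
raw = 'Science, English, Math'
parts = raw.replace(" ", "").split(",")

# Hypothetical multi-word class name (not in the original data).
multi = 'Computer Science, Math'

# Dropping all spaces also removes the space inside the name.
mangled = multi.replace(" ", "").split(",")

# Splitting first and stripping each piece keeps inner spaces intact.
stripped = [piece.strip() for piece in multi.split(",")]
```

For data like the question's, both variants give the same result, so this only matters if class names can contain spaces.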
【Comments】:
【Answer 3】:
Here is a solution,
# Remove the whitespace after each comma, then use dummies to create indicator (0/1) columns
df = df.set_index('People')
dummies = (
    df.Class.str.replace(r',\s+', ",", regex=True)
    .str.get_dummies(sep=",")
)
   English  Math  Science
0        0     1        1
1        1     1        1
2        0     1        1
3        1     1        1
# Create a "hash map" from a 1-based position to each column name
replace_ = {i: j for i, j in enumerate(dummies.columns, 1)}
# Multiply each column by its key, then replace the keys with the column names.
dummies.mul(list(replace_.keys())).replace(replace_)
        English  Math  Science
People
John          0  Math  Science
Mary    English  Math  Science
Andy          0  Math  Science
April   English  Math  Science
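The output above still shows 0 where a class is missing, while the question asked for blank cells. One possible way to close that gap, sketched here as an assumption rather than as part of the original answer, is to add a `0 -> ''` entry to the mapping before calling `replace`. The names `codes`, `coded` and `labelled` are introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'People': ['John', 'Mary', 'Andy', 'April'],
    'Class': ['Math, Science', 'English, Math, Science',
              'Math, Science', 'Science, English, Math'],
}).set_index('People')

# Indicator (0/1) columns, one per class, after removing spaces after commas.
dummies = (
    df['Class'].str.replace(r',\s+', ',', regex=True)
    .str.get_dummies(sep=',')
)

# Map a 1-based position to each column name: 1 -> English, 2 -> Math, ...
codes = {i: name for i, name in enumerate(dummies.columns, 1)}

# Multiply column k by k, so every 1 becomes that column's own key.
coded = dummies.mul(list(codes.keys()))

# Replace keys with names; the extra 0 -> '' entry blanks the missing classes.
labelled = coded.replace({**codes, 0: ''})
```

The multiplication is what makes the values distinct per column, so a single flat mapping can be passed to `replace` for the whole frame.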
【Comments】: