Converting non-numeric features to numeric ones with pandas (one-hot encoding)
Posted by loubin
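import pandas as pd
import numpy as np

# One row per feature; each array has shape (1, 9).
name = np.array([['jack', 'ross', 'john', 'blues', 'frank', 'bitch', 'haha', 'asd', 'loubin']])
age = np.array([[12, 32, 23, 4, 32, 45, 65, 23, 65]])
married = np.array([[1, 0, 1, 1, 0, 1, 0, 0, 0]])
gender = np.array([[0, 0, 0, 0, 1, 1, 1, 1, 1]])

# Stack the features into a (4, 9) array, then transpose so that each row is one sample.
# Note: mixing strings and integers in one array turns every value into a string.
matrix = np.concatenate((name, age, married, gender), axis=0)
matrix = matrix.T

data = pd.DataFrame(data=matrix, columns=['name', 'age', 'married', 'gender'])
print(data)

# One-hot encode the 'name' column; each new column is named '<prefix>_<value>'.
print(pd.get_dummies(data=data['name'], prefix='name'))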
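The output is shown below. Each column of the new table is named after one value of the encoded column, and the prefix parameter sets the prefix of those names.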
C:softwareAnacondaenvsmlpython.exe C:/学习/python/科比生涯数据分析/venv/groupy.py
     name age married gender
0    jack  12       1      0
1    ross  32       0      0
2    john  23       1      0
3   blues   4       1      0
4   frank  32       0      1
5   bitch  45       1      1
6    haha  65       0      1
7     asd  23       0      1
8  loubin  65       0      1
   name_asd  name_bitch  name_blues  ...  name_john  name_loubin  name_ross
0         0           0           0  ...          0            0          0
1         0           0           0  ...          0            0          1
2         0           0           0  ...          1            0          0
3         0           0           1  ...          0            0          0
4         0           0           0  ...          0            0          0
5         0           1           0  ...          0            0          0
6         0           0           0  ...          0            0          0
7         1           0           0  ...          0            0          0
8         0           0           0  ...          0            1          0

[9 rows x 9 columns]

Process finished with exit code 0
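
The example above encodes a single column on its own. As a minimal sketch of a common variation, the snippet below encodes a column in place inside the DataFrame and keeps the other columns. The three-row sample data and the prefix 'n' are made up for illustration and are not from the original post. Passing dtype=int asks for 0/1 columns, since newer pandas versions may return booleans by default.

```python
import pandas as pd

# Hypothetical sample data (a shortened version of the table above).
df = pd.DataFrame({
    'name': ['jack', 'ross', 'john'],
    'age': [12, 32, 23],
})

# Encode only the 'name' column; 'age' is passed through unchanged.
# prefix='n' is an arbitrary choice to show that the prefix is configurable.
encoded = pd.get_dummies(df, columns=['name'], prefix='n', dtype=int)
print(encoded)
```

Building the DataFrame from a dict, rather than from a concatenated NumPy array, also keeps age as an integer column instead of converting it to strings.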