pandas: converting non-numeric features to numeric ones (one-hot encoding)

Posted loubin


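Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers how to use pandas to convert non-numeric features into numeric ones (one-hot encoding). We hope you find it useful.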
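To make the idea concrete before the pandas version: one-hot encoding replaces a categorical column with one 0/1 column per distinct value, and each row has a 1 only in the column for its own value. Below is a minimal NumPy sketch of that mapping. The `colors` array is hypothetical example data, not part of the original post, and this is an illustration of the technique rather than how pandas implements it internally.

```python
import numpy as np

# Hypothetical example data: one categorical feature with a repeated value
colors = np.array(["red", "green", "red", "blue"])

# np.unique returns the distinct categories in sorted order, plus, for each
# row, the index of its category in that sorted list
categories, codes = np.unique(colors, return_inverse=True)

# One column per category: row i gets a 1 in column codes[i], 0 elsewhere
one_hot = np.zeros((colors.size, categories.size), dtype=int)
one_hot[np.arange(colors.size), codes] = 1

print(categories)
print(one_hot)
```

Because the categories come out sorted, the column order matches what `pd.get_dummies` produces for the same values.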

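import pandas as pd
import numpy as np

# Each feature is a 1 x 9 row; the names must be string literals
name = np.array([['jack', 'ross', 'john', 'blues', 'frank', 'bitch', 'haha', 'asd', 'loubin']])
age = np.array([[12, 32, 23, 4, 32, 45, 65, 23, 65]])
married = np.array([[1, 0, 1, 1, 0, 1, 0, 0, 0]])
gender = np.array([[0, 0, 0, 0, 1, 1, 1, 1, 1]])

# Stack the rows into a 4 x 9 matrix, then transpose so each row is one sample.
# Note: mixing strings and integers makes NumPy store every value as a string.
matrix = np.concatenate((name, age, married, gender), axis=0)
matrix = matrix.T

# Column names must be strings as well
data = pd.DataFrame(data=matrix, columns=['name', 'age', 'married', 'gender'])
print(data)

# One-hot encode the 'name' column; each new column is named name_<value>.
# dtype=int keeps 0/1 output (pandas >= 2.0 returns True/False by default)
print(pd.get_dummies(data=data['name'], prefix='name', dtype=int))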

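Running the script prints the output below. The new columns are named after the values of the encoded column, and a prefix can be set with the prefix argument.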

     name age married gender
0    jack  12       1      0
1    ross  32       0      0
2    john  23       1      0
3   blues   4       1      0
4   frank  32       0      1
5   bitch  45       1      1
6    haha  65       0      1
7     asd  23       0      1
8  loubin  65       0      1
   name_asd  name_bitch  name_blues  ...  name_john  name_loubin  name_ross
0         0           0           0  ...          0            0          0
1         0           0           0  ...          0            0          1
2         0           0           0  ...          1            0          0
3         0           0           1  ...          0            0          0
4         0           0           0  ...          0            0          0
5         0           1           0  ...          0            0          0
6         0           0           0  ...          0            0          0
7         1           0           0  ...          0            0          0
8         0           0           0  ...          0            1          0

[9 rows x 9 columns]
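The call above encodes the `name` column on its own, so the indicator columns would still have to be joined back to the other features by hand. As a sketch, `pd.get_dummies` can instead be applied to the whole DataFrame with its `columns` argument, which replaces `name` with its indicator columns and leaves the other columns unchanged. The example rebuilds the same toy data from a dict (an assumption made for brevity, so that `age`, `married` and `gender` keep an integer dtype instead of becoming strings).

```python
import pandas as pd

# Same toy data as in the post, built column by column so the
# numeric features stay integers
data = pd.DataFrame({
    "name": ["jack", "ross", "john", "blues", "frank", "bitch", "haha", "asd", "loubin"],
    "age": [12, 32, 23, 4, 32, 45, 65, 23, 65],
    "married": [1, 0, 1, 1, 0, 1, 0, 0, 0],
    "gender": [0, 0, 0, 0, 1, 1, 1, 1, 1],
})

# columns= selects which columns to encode; the original 'name' column is
# dropped and replaced by one name_<value> column per distinct name.
# dtype=int requests 0/1 values instead of booleans.
encoded = pd.get_dummies(data, columns=["name"], prefix="name", dtype=int)
print(encoded.columns.tolist())
```

In the result the untouched columns come first, followed by the `name_<value>` columns in sorted order, giving 9 rows by 12 columns of purely numeric data.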

