How to encode only categorical data in a dataframe
Posted: 2018-10-01 11:21:24
Question:
How can I encode only the categorical data in a dataframe?
Income Length of Residence Median House Value Number of Vehicles Percentage Asian Percentage Black Percentage English Speaking Percentage Hispanic Percentage White MakeDescr SeriesDescr Msrp
1 90000 15.0 F 4 1 1 71 6 81 HYUNDAI Sonata-4 Cyl. 19395.0
2 125000 7.0 H 1 11 1 91 1 81 JEEP Grand Cherokee-V6 29135.0
3 90000 8.0 F 1 1 1 71 6 86 JEEP Liberty 20700.0
4 125000 8.0 F 3 1 1 86 6 86 VOLKSWAGEN Passat-V6 28750.0
5 90000 8.0 F 1 1 1 71 6 81 JEEP Wrangler 20210.0
6 110000 7.0 G 5 6 6 71 6 76 HYUNDAI Santa Fe-V6 25645.0
7 110000 7.0 G 3 11 6 71 6 71 HYUNDAI Sonata-4 Cyl. 15999.0
8 125000 8.0 G 1 1 11 81 6 76 HYUNDAI Santa Fe-V6 23645.0
9 125000 9.0 G 1 6 1 91 1 86 CHEVROLET TRUCK Trailblazer EXT 32040.0
10 110000 8.0 E 2 6 46 81 16 26 JEEP Wrangler-V6 18660.0
11 125000 11.0 G 3 6 1 76 1 86 CHEVROLET TRUCK Silverado 2500 HD 31775.0
12 125000 12.0 G 2 11 6 66 1 71 CHEVROLET Cobalt 13675.0
13 125000 13.0 G 2 1 16 95 6 71 HYUNDAI Veracruz-V6 28600.0
15 110000 11.0 F 5 6 41 61 11 41 HYUNDAI Santa Fe 22499.0
16 125000 9.0 F 2 1 6 91 1 81 HYUNDAI Santa Fe 22499.0
17 125000 8.0 G 2 11 11 66 1 66 MITSUBISHI Endeavor-V6 32602.0
18 110000 12.0 E 1 6 46 81 16 26 HYUNDAI Accent-4 Cyl. 10899.0
19 90000 9.0 F 4 1 6 71 6 81 JEEP Grand Cherokee-6 Cyl. 29080.0
21 125000 8.0 G 1 6 1 76 1 86 MITSUBISHI Endeavor-V6 29302.0
22 110000 12.0 F 2 6 26 66 11 51 HYUNDAI Santa Fe 22499.0
23 90000 9.0 F 1 6 6 66 6 76 HYUNDAI Santa Fe-V6 20995.0
24 125000 9.0 H 1 6 1 91 1 81 HYUNDAI Sonata-V6 18799.0
25 90000 14.0 F 2 1 6 71 11 81 HYUNDAI Elantra-4 Cyl. 13299.0
26 125000 9.0 G 3 1 11 81 6 76 JEEP Grand Cherokee-6 Cyl. 29080.0
27 125000 8.0 H 5 6 1 91 1 81 CHEVROLET TRUCK Trailblazer 29395.0
28 110000 12.0 E 4 6 41 61 11 36 HYUNDAI Sonata-4 Cyl. 15999.0
29 110000 10.0 E 1 6 41 61 11 36 HYUNDAI Santa Fe-V6 20995.0
30 125000 10.0 F 2 6 1 71 6 86 CHEVROLET TRUCK Tahoe 37000.0
32 90000 10.0 F 1 1 1 71 6 86 MITSUBISHI Galant-V6 19997.0
33 125000 12.0 F 1 1 1 86 6 86 CHEVROLET TRUCK Trailblazer 28175.0
... ... ... ... ... ... ... ... ... ... ... ... ...
4451 110000 9.0 F 3 6 41 61 11 36 NISSAN Sentra-4 Cyl. 17990.0
4452 125000 11.0 G 2 1 11 81 6 76 CHEVROLET TRUCK Tahoe 39515.0
4453 125000 8.0 H 1 6 1 91 1 81 HYUNDAI Elantra-4 Cyl. 15195.0
4454 110000 10.0 F 3 6 41 61 11 41 HYUNDAI Genesis-4 Cyl. 26750.0
4455 125000 7.0 H 4 11 1 76 1 76 HYUNDAI Sonata-4 Cyl. 19695.0
4456 125000 9.0 G 5 6 1 76 1 86 NISSAN Altima 22500.0
4457 110000 11.0 E 1 6 46 81 16 26 GMC LIGHT DUTY Denali 51935.0
4458 125000 6.0 H 1 11 1 76 1 76 JEEP Liberty-V6 24865.0
4459 125000 12.0 G 3 1 16 95 6 71 HONDA Accord-V6 26700.0
4460 125000 7.0 F 1 1 1 86 6 86 HYUNDAI Veloster-4 Cyl. 17300.0
4461 90000 10.0 F 2 6 11 66 6 71 CADILLAC SRX-V6 42210.0
4463 110000 8.0 F 3 6 26 61 11 56 GMC LIGHT DUTY Acadia 42390.0
4468 125000 8.0 G 1 1 1 91 1 86 HONDA Pilot-V6 40820.0
4469 125000 10.0 H 5 11 1 91 1 81 TOYOTA Highlander-V6 30695.0
4470 110000 12.0 F 1 6 41 61 11 41 HYUNDAI Elantra-4 Cyl. 15195.0
4473 110000 13.0 F 1 6 21 66 6 61 ACURA TSX 32910.0
4476 125000 9.0 G 1 6 1 76 1 86 BMW X3 36750.0
4482 125000 10.0 H 1 6 1 91 1 81 SUBARU Forester-4 Cyl. 21195.0
4486 125000 11.0 H 2 6 1 91 1 81 GMC LIGHT DUTY Yukon XL 44315.0
4492 125000 10.0 H 2 6 1 91 1 81 BMW 5 Series 53400.0
4493 110000 12.0 G 2 6 6 71 6 76 ACURA TL 33725.0
4494 125000 12.0 F 3 1 1 86 6 86 ACURA TL 33725.0
4495 125000 12.0 F 3 1 1 86 6 86 ACURA TL 33725.0
4496 125000 7.0 G 5 1 11 81 6 76 ACURA TL 33325.0
4497 125000 9.0 G 1 6 1 76 1 86 ACURA TL 33725.0
4498 125000 12.0 G 3 1 11 81 6 76 ACURA TL 33725.0
4499 110000 14.0 G 8 11 6 71 6 71 ACURA TL 33725.0
4501 125000 9.0 G 3 11 6 66 1 71 FORD Taurus-V6 20050.0
4502 110000 2.0 G 4 11 6 71 6 71 DODGE Stratus-4 Cyl. 15910.0
4503 125000 8.0 F 1 1 1 86 6 86 DODGE Stratus-4 Cyl. 19145.0
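Comments:
Do you have any code that you have tried?
In the dataframe, the categorical data is at index 2 and at index positions -2 and -3. How do I encode it?
Can you make the question clearer? I suggest looking at ***.com/help/how-to-ask
I have a dataframe made up of different features. Some features, at different index positions, hold categorical data, and I need to encode only the categorical data at those positions.
Answer 1:
# Using standard scikit-learn label encoder.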
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
# Encode all string columns. Assuming all categoricals are of type str.
for c in df.select_dtypes(['object']):
    print("Encoding column " + c)
    # fit_transform refits le on every column, so only the last mapping is kept.
    df[c] = le.fit_transform(df[c])
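A variation on the loop above, as a minimal sketch rather than the answerer's exact method: it fits one LabelEncoder per column and keeps them in a dict, so the codes can be mapped back to the original labels later. The sample frame, the `encoders` dict, and the `encoded` name are made up for illustration. It also selects columns with `exclude=["number"]` instead of `['object']`, on the assumption that every non-numeric column is categorical; that assumption would also pick up datetime or boolean columns if the real data has any.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Small made-up frame that mimics a few of the question's columns.
df = pd.DataFrame({
    "Income": [90000, 125000, 90000],
    "Median House Value": ["F", "H", "F"],
    "MakeDescr": ["HYUNDAI", "JEEP", "JEEP"],
    "Msrp": [19395.0, 29135.0, 20700.0],
})

# Select columns by dtype instead of by position (2, -2, -3),
# so the code keeps working if the column order changes.
# Assumption: every non-numeric column is categorical.
categorical_cols = df.select_dtypes(exclude=["number"]).columns

encoders = {}          # hypothetical name: column -> fitted LabelEncoder
encoded = df.copy()    # leave the original frame untouched
for col in categorical_cols:
    enc = LabelEncoder()
    encoded[col] = enc.fit_transform(df[col])
    encoders[col] = enc

print(encoded)

# The stored encoder can turn the integer codes back into labels.
print(encoders["MakeDescr"].inverse_transform(encoded["MakeDescr"]))
```

Numeric columns such as Income and Msrp pass through unchanged. Note that LabelEncoder assigns integer codes in sorted label order, which implies an ordering that a make or model name does not really have; whether that matters depends on the model trained afterwards.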
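Comments:
I get this error when I try to print the prediction results in JSON format. Can I get a solution for it? TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('
The error occurs in the following code: for i in range(len(result_array)): if i > 0: result_string += "," result_string += "[ " + "\"" + result_array [i] + "\"" + "]" # result_string+="\""+result1_array[i]+"\"" result_string += "]" return result_string
@AjayKumar You need to look up these errors or similar questions on Stack Overflow. If you do not find an answer, ask a new question that includes the part of the code causing the errors; only then can someone help you.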
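The error message in the comment is cut off, so its cause cannot be confirmed from the thread. One plausible reading is that result_array holds NumPy values rather than Python strings, in which case `"\"" + result_array[i]` is dispatched to NumPy's add ufunc instead of string concatenation. A minimal sketch that sidesteps the problem by converting each value with `str` and letting the standard `json` module build the output is shown here; the function name `to_json_rows` is hypothetical.

```python
import json


def to_json_rows(result_array):
    """Serialize each prediction as a one-element JSON array of strings.

    str() accepts Python and NumPy scalars alike, so no manual
    string concatenation with "+" is needed.
    """
    return json.dumps([[str(value)] for value in result_array])


# Plain Python values are used here; an array of predictions
# can be passed in the same way.
print(to_json_rows([3, 0, 1]))
print(to_json_rows(["HYUNDAI", "JEEP"]))
```

Using `json.dumps` also takes care of quoting and escaping, which the hand-built string does not.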