How can I convert regression data into classification data? [closed]

Posted 2023-03-12

【Title】how can I convert Regression data into Classification data? [closed] 【Posted】2021-10-24 22:48:09 【Question】

I have a dataset with the following columns:

   ['symboling', 'Company', 'fueltype', 'aspiration', 'doornumber',
   'carbody', 'drivewheel', 'enginelocation', 'carlength', 'carwidth',
   'curbweight', 'enginetype', 'cylindernumber', 'enginesize',
   'fuelsystem', 'horsepower', 'price', 'total_mpg']

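The goal is to predict the price of a car. Right now the price data is continuous. I would like to know how to convert it so that I can use a classification model.

After searching, I found that this can be done by defining ranges, but I could not understand how. Please help.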

【Answer 1】

Suppose we have a DataFrame with two continuous columns, named x1 and x2:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

x1 = np.random.rand(100)
x2 = np.random.rand(100)
df = pd.DataFrame({"x1": x1, "x2": x2})
df.head()

#        x1       x2
#0  0.049202    0.131046
#1  0.606525    0.756687
#2  0.910932    0.944692
#3  0.904655    0.439637
#4  0.565204    0.418432

# Plot values
sns.scatterplot(x=range(100),y=df["x1"])
sns.scatterplot(x=range(100),y=df["x2"])

Then we can split the values into buckets like this:

x1_cat = pd.cut(df['x1'], bins=[0.,0.2,0.4,0.6,0.8,np.inf], labels=[0,1,2,3,4])
x2_cat = pd.cut(df['x2'], bins=[0.,0.2,0.4,0.6,0.8,np.inf], labels=[0,1,2,3,4])
df_cat = pd.concat([x1_cat,x2_cat],axis=1)
df_cat.head()

#   x1  x2
#0  0   0
#1  3   3
#2  4   4
#3  4   2
#4  2   2

# Plot values
sns.scatterplot(x=range(100),y=df_cat["x1"])
sns.scatterplot(x=range(100),y=df_cat["x2"])
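
One detail of the bucketing above that may be worth checking: by default pd.cut treats each bin as open on the left and closed on the right, so a value exactly equal to the lowest edge falls outside every bin and comes back as NaN. The following is a minimal sketch of that behaviour on a small hand-picked series (the values are made up for illustration and are not from the answer):

```python
import numpy as np
import pandas as pd

# Illustrative values, chosen to sit on and between the bin edges.
values = pd.Series([0.0, 0.1, 0.2, 0.35, 0.8, 0.95])
edges = [0.0, 0.2, 0.4, 0.6, 0.8, np.inf]
labels = [0, 1, 2, 3, 4]

# Default: bins are (0, 0.2], (0.2, 0.4], ... so 0.0 is not in any bin.
default_codes = pd.cut(values, bins=edges, labels=labels)

# include_lowest=True closes the first bin on the left: [0, 0.2].
closed_codes = pd.cut(values, bins=edges, labels=labels, include_lowest=True)

print(default_codes.tolist())
print(closed_codes.tolist())

# How many rows ended up in each class.
print(closed_codes.value_counts().sort_index())
```

Looking at the per-class counts is a quick way to see whether the chosen ranges give reasonably balanced classes.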

【Comments】

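Here is the link to the data: kaggle.com/dronax/car-prices-dataset

So do I have to convert only the price into ranges, or all of the values, if I want to predict the price? I'm a bit lost.

You only need to convert the price. For example: df['price'] = pd.cut(df['price'], bins=[0.,0.2,0.4,0.6,0.8,np.inf], labels=[0,1,2,3,4])

OK, I think I get it. Thanks.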
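
For a real price column, the bin edges have to be on the same scale as the prices, so edges like 0.2 or 0.8 would put almost every car in the last bucket. One possible alternative, sketched here with made-up prices rather than the Kaggle data, is to let pd.qcut pick the edges so that each class holds roughly the same number of rows. The class names "low", "mid" and "high" are assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical prices; in practice this would be the dataset's 'price' column.
df = pd.DataFrame(
    {"price": [5000, 7500, 9000, 12000, 16000, 21000, 30000, 41000, 45000]}
)

# Equal-frequency buckets: qcut computes the edges from the data's quantiles.
# retbins=True also returns the edges so they can be inspected or reused.
df["price_class"], edges = pd.qcut(
    df["price"], q=3, labels=["low", "mid", "high"], retbins=True
)

print(df)
print(edges)
```

The new price_class column can then serve as the target for a classifier, with the remaining columns as features. Keeping the returned edges makes it possible to apply the same ranges to new data with pd.cut.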
