Error during data scaling in a pandas DataFrame

Posted: 2021-07-06 09:05:29

Question:

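I have a dataset in CSV format. I am trying to scale the dataset, but I get an error. As far as I understand, I need to convert it from 3D to 2D, but I am not sure how to do that.

A sample of my dataset: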

63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0
67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,3.0,3.0,2
67.0,1.0,4.0,120.0,229.0,0.0,2.0,129.0,1.0,2.6,2.0,2.0,7.0,1
37.0,1.0,3.0,130.0,250.0,0.0,0.0,187.0,0.0,3.5,3.0,0.0,3.0,0
41.0,0.0,2.0,130.0,204.0,0.0,2.0,172.0,0.0,1.4,1.0,0.0,3.0,0
56.0,1.0,2.0,120.0,236.0,0.0,0.0,178.0,0.0,0.8,1.0,0.0,3.0,0
62.0,0.0,4.0,140.0,268.0,0.0,2.0,160.0,0.0,3.6,3.0,2.0,3.0,3
57.0,0.0,4.0,120.0,354.0,0.0,0.0,163.0,1.0,0.6,1.0,0.0,3.0,0
63.0,1.0,4.0,130.0,254.0,0.0,2.0,147.0,0.0,1.4,2.0,1.0,7.0,2
53.0,1.0,4.0,140.0,203.0,1.0,2.0,155.0,1.0,3.1,3.0,0.0,7.0,1
57.0,1.0,4.0,140.0,192.0,0.0,0.0,148.0,0.0,0.4,2.0,0.0,6.0,0
56.0,0.0,2.0,140.0,294.0,0.0,2.0,153.0,0.0,1.3,2.0,0.0,3.0,0
56.0,1.0,3.0,130.0,256.0,1.0,2.0,142.0,1.0,0.6,2.0,1.0,6.0,2
44.0,1.0,2.0,120.0,263.0,0.0,0.0,173.0,0.0,0.0,1.0,0.0,7.0,0
52.0,1.0,3.0,172.0,199.0,1.0,0.0,162.0,0.0,0.5,1.0,0.0,7.0,0
57.0,1.0,3.0,150.0,168.0,0.0,0.0,174.0,0.0,1.6,1.0,0.0,3.0,0
48.0,1.0,2.0,110.0,229.0,0.0,0.0,168.0,0.0,1.0,3.0,0.0,7.0,1
54.0,1.0,4.0,140.0,239.0,0.0,0.0,160.0,0.0,1.2,1.0,0.0,3.0,0

My code:

import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('processed_cleveland_data.csv')
ss = StandardScaler()
df_scaled = pd.DataFrame(ss.fit_transform(df), columns=df.columns)

Error:

    ValueError                                Traceback (most recent call last)
    <ipython-input-5-6db223ceefcd> in <module>
          4 df = pd.read_csv('processed_cleveland_data.csv')
          5 ss = StandardScaler()
    ----> 6 df_scaled = pd.DataFrame(ss.fit_transform(df),columns = df.columns)
    
    ~\Miniconda3\lib\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
        697         if y is None:
        698             # fit method of arity 1 (unsupervised transformation)
    --> 699             return self.fit(X, **fit_params).transform(X)
        700         else:
        701             # fit method of arity 2 (supervised transformation)
    
    ~\Miniconda3\lib\site-packages\sklearn\preprocessing\_data.py in fit(self, X, y, sample_weight)
        728         # Reset internal state before fitting
        729         self._reset()
    --> 730         return self.partial_fit(X, y, sample_weight)
        731 
        732     def partial_fit(self, X, y=None, sample_weight=None):
    
    ~\Miniconda3\lib\site-packages\sklearn\preprocessing\_data.py in partial_fit(self, X, y, sample_weight)
        764         """
        765         first_call = not hasattr(self, "n_samples_seen_")
    --> 766         X = self._validate_data(X, accept_sparse=('csr', 'csc'),
        767                                 estimator=self, dtype=FLOAT_DTYPES,
        768                                 force_all_finite='allow-nan', reset=first_call)
    
    ~\Miniconda3\lib\site-packages\sklearn\base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
        419             out = X
        420         elif isinstance(y, str) and y == 'no_validation':
    --> 421             X = check_array(X, **check_params)
        422             out = X
        423         else:
    
    ~\Miniconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
         61             extra_args = len(args) - len(all_args)
         62             if extra_args <= 0:
    ---> 63                 return f(*args, **kwargs)
         64 
         65             # extra_args > 0
    
    ~\Miniconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
        614                     array = array.astype(dtype, casting="unsafe", copy=False)
        615                 else:
    --> 616                     array = np.asarray(array, order=order, dtype=dtype)
        617             except ComplexWarning as complex_warning:
        618                 raise ValueError("Complex data not supported\n"
    
    ~\Miniconda3\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, order)
         81 
         82     """
    ---> 83     return array(a, dtype, copy=False, order=order)
         84 
         85 
    
    ~\Miniconda3\lib\site-packages\pandas\core\generic.py in __array__(self, dtype)
       1897 
       1898     def __array__(self, dtype=None) -> np.ndarray:
    -> 1899         return np.asarray(self._values, dtype=dtype)
       1900 
       1901     def __array_wrap__(
    
 

    ~\Miniconda3\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, order)
         81 
         82     """
    ---> 83     return array(a, dtype, copy=False, order=order)
         84 
         85 

    ValueError: could not convert string to float: '?'

Comments:

The error is that your data contains the string '?'. Have you searched for this value?

Answer 1:

Use na_values so that ? is read as a missing value:

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Treat '?' as a missing value (NaN) while reading the file
df = pd.read_csv('processed_cleveland_data.csv', na_values='?')
# If the CSV has no header row, use this instead:
# df = pd.read_csv('processed_cleveland_data.csv', na_values='?', header=None)

ss = StandardScaler()
df_scaled = pd.DataFrame(ss.fit_transform(df), columns=df.columns)
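
Note that the error is not about array dimensions: the DataFrame is already 2D, and the failure comes from a text value that cannot be converted to float. The following is a minimal sketch of how one might confirm that and then apply the fix above. It assumes a headerless file and uses a small inline stand-in for processed_cleveland_data.csv, built from three rows of the sample with one value replaced by '?' for illustration (the actual position of '?' in the real file is not known from the question).

```python
import io

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for processed_cleveland_data.csv: three headerless
# rows from the sample, with one value replaced by '?' for illustration.
csv_text = (
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,?,3.0,2\n"
    "67.0,1.0,4.0,120.0,229.0,0.0,2.0,129.0,1.0,2.6,2.0,2.0,7.0,1\n"
)

# Without na_values, a column containing '?' is read as text, not numbers.
raw = pd.read_csv(io.StringIO(csv_text), header=None)
bad_columns = raw.select_dtypes(exclude="number").columns.tolist()
print("Non-numeric columns:", bad_columns)

# With na_values='?', the same cells become NaN and the column stays numeric.
clean = pd.read_csv(io.StringIO(csv_text), header=None, na_values="?")
print("Missing values per column:")
print(clean.isna().sum())

# StandardScaler disregards NaN when fitting and keeps it when transforming.
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(clean), columns=clean.columns)
print(scaled)
```

The NaN cells remain NaN after scaling. If a later step cannot handle missing values, dropping those rows with clean.dropna() or imputing them before scaling are two possible options, depending on the data.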
