KeyError: 'not found in index' - created dummy variables not found in index

Posted 2023-03-12

【Title】: KeyError: 'not found in index' -created dummy variables not found in index 【Posted】: 2021-01-29 11:45:21 【Question】:

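I am working on a logistic regression problem. I created 2 new variables based on information in another column (each takes a value such as 'gt40' depending on whether the source value is above or below a threshold). I then had to create dummy variables for the 'DIABETES', 'bmi_cat', and 'albumin_cat' variables. Everything worked, and I could print a head / export a csv that matched what I needed.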
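For context, here is a minimal sketch of the kind of preprocessing described above. The toy data and the threshold of 40 are assumptions (the threshold is inferred from the 'gte40' / 'lt40' labels); the original preprocessing code is not shown in the post. It illustrates that `pd.get_dummies` names each new column as the source column name, an underscore, and the category value.

```python
import pandas as pd

# Hypothetical toy data; the real dataset is not shown in the question.
df = pd.DataFrame({
    "bmi": [22.0, 41.5, 38.0],
    "DIABETES": ["NO", "INSULIN", "NON-INSULIN"],
})

# Assumed threshold of 40, inferred from the 'gte40' / 'lt40' labels.
df["bmi_cat"] = df["bmi"].apply(lambda v: "gte40" if v >= 40 else "lt40")

# get_dummies builds names as <column><prefix_sep><value>; prefix_sep defaults to "_".
dummies = pd.get_dummies(df[["DIABETES", "bmi_cat"]], columns=["DIABETES", "bmi_cat"])

# Keep the original columns alongside the dummy columns.
new_df = pd.concat([df, dummies], axis=1)
print(list(dummies.columns))
```

Because the separator is inserted automatically, the resulting names contain an underscore before the category value (for example 'bmi_cat_lt40').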

However, when I try to actually run the logistic regression model with the following code:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

#Create X and y
X = new_df[['Age','Sex','SMOKE', 'DIABETES_NO', 'DIABETES_INSULIN', 'DIABETES_NON-INSULIN', 'bmi_cat_gte40',
           'bmi_catlt40','bmi_cat0','albumin_catgt3.5', 'albumin_catlt3.5','ablumin_cat0']]

y = new_df['Mortality']
#Create train, test, split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)

#Logistic Regression Model
logreg = LogisticRegression()
logreg.fit(X_train, y_train)

I get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-22-88ad326a3d1e> in <module>
      1 #Create X and y
      2 X = new_df[['Age','Sex','SMOKE', 'DIABETES_NO', 'DIABETES_INSULIN', 'DIABETES_NON-INSULIN', 'bmi_cat_gte40',
----> 3            'bmi_catlt40','bmi_cat0','albumin_catgt3.5', 'albumin_catlt3.5','ablumin_cat0']]
      4 
      5 y = new_df['Mortality']

E:\Users\davidwool\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2680         if isinstance(key, (Series, np.ndarray, Index, list)):
   2681             # either boolean or fancy integer index
-> 2682             return self._getitem_array(key)
   2683         elif isinstance(key, DataFrame):
   2684             return self._getitem_frame(key)

E:\Users\davidwool\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_array(self, key)
   2724             return self._take(indexer, axis=0)
   2725         else:
-> 2726             indexer = self.loc._convert_to_indexer(key, axis=1)
   2727             return self._take(indexer, axis=1)
   2728 

E:\Users\davidwool\Anaconda3\lib\site-packages\pandas\core\indexing.py in _convert_to_indexer(self, obj, axis, is_setter)
   1325                 if mask.any():
   1326                     raise KeyError('{mask} not in index'
-> 1327                                    .format(mask=objarr[mask]))
   1328 
   1329                 return com._values_from_object(indexer)

KeyError: "['bmi_catlt40' 'bmi_cat0' 'albumin_catgt3.5' 'albumin_catlt3.5'\n 'ablumin_cat0'] not in index"

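The csv looks fine and contains all the values, but when I try to run the model, it cannot find the dummy-variable columns I created in the index.

Any suggestions would be greatly appreciated. Thanks!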

#columns
new_df.columns

Index(['Age', 'Sex', 'DIABETES', 'bmi', 'SMOKE', 'DPRALBUM', 'Readmission',
       'Infection', 'bmi_cat', 'albumin_cat', 'Optimized', 'Mortality',
       'DIABETES_INSULIN', 'DIABETES_NO', 'DIABETES_NON-INSULIN', 'bmi_cat_0',
       'bmi_cat_gte40', 'bmi_cat_lt40', 'albumin_cat_0', 'albumin_cat_gt3.5',
       'albumin_cat_lt3.5'],
      dtype='object')

【Comments】:

Can you add the output of new_df.columns? This may be a typo in one of the column names.

Yes, I will add it to the original post now.

【Answer 1】:

You wrote:

['Age','Sex','SMOKE', 'DIABETES_NO', 'DIABETES_INSULIN', 'DIABETES_NON-INSULIN', 'bmi_cat_gte40',
    'bmi_catlt40',  'bmi_cat0',  'albumin_catgt3.5',  'albumin_catlt3.5',  'ablumin_cat0']

The actual names of the columns you want to select:

['Age', 'Sex', 'SMOKE', 'DIABETES_NO', 'DIABETES_INSULIN', 'DIABETES_NON-INSULIN', 'bmi_cat_gte40',
    'bmi_cat_lt40', 'bmi_cat_0', 'albumin_cat_gt3.5', 'albumin_cat_lt3.5', 'albumin_cat_0']

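Compare the second line of each list and you will see the differences: the names you wrote are missing the underscore before the suffix (for example 'bmi_catlt40' instead of 'bmi_cat_lt40'), and 'ablumin_cat0' also misspells 'albumin'.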
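Rather than comparing the lists by eye, the mismatches can also be found programmatically. The following is a minimal sketch, not part of the original answer: it rebuilds an empty DataFrame with the column names from the `new_df.columns` output in the question, lists the requested names that do not exist, and uses the standard-library `difflib.get_close_matches` to suggest the closest real column. The variable names `requested`, `missing`, and `suggestions` are illustrative.

```python
import difflib

import pandas as pd

# Column names copied from the new_df.columns output in the question.
actual = ['Age', 'Sex', 'DIABETES', 'bmi', 'SMOKE', 'DPRALBUM', 'Readmission',
          'Infection', 'bmi_cat', 'albumin_cat', 'Optimized', 'Mortality',
          'DIABETES_INSULIN', 'DIABETES_NO', 'DIABETES_NON-INSULIN', 'bmi_cat_0',
          'bmi_cat_gte40', 'bmi_cat_lt40', 'albumin_cat_0', 'albumin_cat_gt3.5',
          'albumin_cat_lt3.5']
new_df = pd.DataFrame(columns=actual)  # empty stand-in for the real data

# The list exactly as typed in the question.
requested = ['Age', 'Sex', 'SMOKE', 'DIABETES_NO', 'DIABETES_INSULIN',
             'DIABETES_NON-INSULIN', 'bmi_cat_gte40', 'bmi_catlt40', 'bmi_cat0',
             'albumin_catgt3.5', 'albumin_catlt3.5', 'ablumin_cat0']

# Requested names that are not columns of new_df.
missing = [c for c in requested if c not in new_df.columns]
print("Missing:", missing)

# For each missing name, look up the most similar existing column name.
suggestions = {}
for name in missing:
    matches = difflib.get_close_matches(name, list(new_df.columns), n=1)
    suggestions[name] = matches[0] if matches else None
    print(f"{name!r} -> did you mean {suggestions[name]!r}?")
```

Running a check like this before indexing turns the KeyError into a readable list of which names are wrong and what they were probably meant to be.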

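【Comments】:

That did it! Oh my gosh... I spent hours staring at this code trying to figure out what was wrong... Thanks a lot for this, and I am glad it was a simple fix!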
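One way to avoid this class of typo is to build the feature list from the DataFrame itself instead of retyping each dummy column. The sketch below is an illustration, not something proposed in the thread: it assumes the dummy columns can be identified by the prefixes 'DIABETES_', 'bmi_cat_', and 'albumin_cat_', and the names `base_features`, `dummy_prefixes`, and `feature_cols` are hypothetical.

```python
import pandas as pd

# Column names copied from the new_df.columns output in the question.
actual = ['Age', 'Sex', 'DIABETES', 'bmi', 'SMOKE', 'DPRALBUM', 'Readmission',
          'Infection', 'bmi_cat', 'albumin_cat', 'Optimized', 'Mortality',
          'DIABETES_INSULIN', 'DIABETES_NO', 'DIABETES_NON-INSULIN', 'bmi_cat_0',
          'bmi_cat_gte40', 'bmi_cat_lt40', 'albumin_cat_0', 'albumin_cat_gt3.5',
          'albumin_cat_lt3.5']
new_df = pd.DataFrame(columns=actual)  # empty stand-in for the real data

# Non-dummy predictors used in the question.
base_features = ['Age', 'Sex', 'SMOKE']

# Assumed prefixes; the trailing underscore excludes the source columns
# 'DIABETES', 'bmi_cat', and 'albumin_cat' themselves.
dummy_prefixes = ('DIABETES_', 'bmi_cat_', 'albumin_cat_')

# str.startswith accepts a tuple and matches any of the prefixes.
dummy_features = [c for c in new_df.columns if c.startswith(dummy_prefixes)]

feature_cols = base_features + dummy_features
X = new_df[feature_cols]
print(feature_cols)
```

Since every name in `feature_cols` is taken from `new_df.columns`, the selection cannot raise a "not in index" KeyError for the dummy columns.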
