Who is Scott? - ValueError in Seaborn pairplot: could not convert string to float: 'scott'
Asked: 2020-08-09 21:42:26
Who is Scott?
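Problem
I get the following error when I try to add the Education attribute from the loan prediction dataset to a pairplot using seaborn:
ValueError Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/statsmodels/nonparametric/kde.py in kdensityfft(X, kernel, bw, weights, gridsize, adjust, clip, cut, retgrid)
450 try:
--> 451 bw = float(bw)
452 except:
ValueError: could not convert string to float: 'scott'
I looked through the raw data and could not find 'scott' anywhere, so my question is: where does this come from, and how do I fix it?
I also get a runtime error, "RuntimeError: Selected KDE bandwidth is 0. Cannot estiamte density." I am not sure whether this is caused by the first error or is a separate problem entirely. I would appreciate it if anyone could shed some light on this.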
Dataset
I am using the loan prediction dataset found here. The attributes are as follows:
Loan_ID Gender Married Dependents Education Self_Employed ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History Property_Area Loan_Status
0 LP001002 Male No 0 Graduate No 5849 0.0 NaN 360.0 1.0 Urban Y
1 LP001003 Male Yes 1 Graduate No 4583 1508.0 128.0 360.0 1.0 Rural N
2 LP001005 Male Yes 0 Graduate Yes 3000 0.0 66.0 360.0 1.0 Urban Y
3 LP001006 Male Yes 0 Not Graduate No 2583 2358.0 120.0 360.0 1.0 Urban Y
4 LP001008 Male No 0 Graduate No 6000 0.0 141.0 360.0 1.0 Urban Y
Code
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline # I'm using ipython notebook
train_data = pd.read_csv("train_ctrUa4K.csv")
bad_credit = train_data[train_data["Credit_History"] == 0]
bad_credit["Education"] = bad_credit["Education"].map({"Graduate": 1, "Not Graduate": 0})
sns.pairplot(bad_credit,vars=["ApplicantIncome","Education","LoanAmount"],hue="Loan_Status")
Error
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/statsmodels/nonparametric/kde.py in kdensityfft(X, kernel, bw, weights, gridsize, adjust, clip, cut, retgrid)
450 try:
--> 451 bw = float(bw)
452 except:
ValueError: could not convert string to float: 'scott'
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
<ipython-input-25-0cd48ab0d803> in <module>
2 bad_credit = train_data[train_data["Credit_History"] == 0]
3 bad_credit["Education"] = bad_credit["Education"].map({"Graduate": 1, "Not Graduate": 0})
----> 4 sns.pairplot(bad_credit,vars=["ApplicantIncome","Education","LoanAmount"],hue="Loan_Status")
~/anaconda3/lib/python3.7/site-packages/seaborn/axisgrid.py in pairplot(data, hue, hue_order, palette, vars, x_vars, y_vars, kind, diag_kind, markers, height, aspect, corner, dropna, plot_kws, diag_kws, grid_kws, size)
2119 diag_kws.setdefault("shade", True)
2120 diag_kws["legend"] = False
-> 2121 grid.map_diag(kdeplot, **diag_kws)
2122
2123 # Maybe plot on the off-diagonals
~/anaconda3/lib/python3.7/site-packages/seaborn/axisgrid.py in map_diag(self, func, **kwargs)
1488 data_k = utils.remove_na(data_k)
1489
-> 1490 func(data_k, label=label_k, color=color, **kwargs)
1491
1492 self._clean_axis(ax)
~/anaconda3/lib/python3.7/site-packages/seaborn/distributions.py in kdeplot(data, data2, shade, vertical, kernel, bw, gridsize, cut, clip, legend, cumulative, shade_lowest, cbar, cbar_ax, cbar_kws, ax, **kwargs)
703 ax = _univariate_kdeplot(data, shade, vertical, kernel, bw,
704 gridsize, cut, clip, legend, ax,
--> 705 cumulative=cumulative, **kwargs)
706
707 return ax
~/anaconda3/lib/python3.7/site-packages/seaborn/distributions.py in _univariate_kdeplot(data, shade, vertical, kernel, bw, gridsize, cut, clip, legend, ax, cumulative, **kwargs)
293 x, y = _statsmodels_univariate_kde(data, kernel, bw,
294 gridsize, cut, clip,
--> 295 cumulative=cumulative)
296 else:
297 # Fall back to scipy if missing statsmodels
~/anaconda3/lib/python3.7/site-packages/seaborn/distributions.py in _statsmodels_univariate_kde(data, kernel, bw, gridsize, cut, clip, cumulative)
365 fft = kernel == "gau"
366 kde = smnp.KDEUnivariate(data)
--> 367 kde.fit(kernel, bw, fft, gridsize=gridsize, cut=cut, clip=clip)
368 if cumulative:
369 grid, y = kde.support, kde.cdf
~/anaconda3/lib/python3.7/site-packages/statsmodels/nonparametric/kde.py in fit(self, kernel, bw, fft, weights, gridsize, adjust, cut, clip)
138 density, grid, bw = kdensityfft(endog, kernel=kernel, bw=bw,
139 adjust=adjust, weights=weights, gridsize=gridsize,
--> 140 clip=clip, cut=cut)
141 else:
142 density, grid, bw = kdensity(endog, kernel=kernel, bw=bw,
~/anaconda3/lib/python3.7/site-packages/statsmodels/nonparametric/kde.py in kdensityfft(X, kernel, bw, weights, gridsize, adjust, clip, cut, retgrid)
451 bw = float(bw)
452 except:
--> 453 bw = bandwidths.select_bandwidth(X, bw, kern) # will cross-val fit this pattern?
454 bw *= adjust
455
~/anaconda3/lib/python3.7/site-packages/statsmodels/nonparametric/bandwidths.py in select_bandwidth(x, bw, kernel)
172 # eventually this can fall back on another selection criterion.
173 err = "Selected KDE bandwidth is 0. Cannot estiamte density."
--> 174 raise RuntimeError(err)
175 else:
176 return bandwidth
RuntimeError: Selected KDE bandwidth is 0. Cannot estiamte density.
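Comments:
David Scott's homepage tells you more.
@JohanC Thanks, I had not heard of him, but his CV is impressive.
Answer 1:
scott is the name of a method for choosing the bandwidth when plotting a kernel density estimate (KDE). It is named after D. W. Scott (1).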
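I cannot see your data, but my guess is that one of the variables in a pair looks odd for one of the hue levels, which prevents seaborn from computing a usable bandwidth.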
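One way a bandwidth can come out as zero is sketched below in plain Python. It assumes the selector follows the rule of thumb given in the statsmodels documentation, 1.059 * A * n^(-1/5) with A = min(std, IQR / 1.349). The helper names and the sample lists are made up for illustration and are not taken from the question's data.

```python
import statistics


def percentile(values, q):
    """Linear-interpolation percentile, with q given in the range 0..100."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def scott_bandwidth(values):
    """Rule of thumb: 1.059 * A * n**(-1/5), where A = min(std, IQR / 1.349)."""
    iqr = percentile(values, 75) - percentile(values, 25)
    spread = min(statistics.stdev(values), iqr / 1.349)
    return 1.059 * spread * len(values) ** (-0.2)


# Hypothetical 0/1 column in which most rows share one value.
mostly_ones = [1] * 9 + [0]
# Hypothetical continuous column.
varied = [2.5, 3.1, 4.7, 5.0, 6.2, 7.9, 8.4, 9.6]

print(scott_bandwidth(mostly_ones))  # the IQR is 0, so the bandwidth is 0.0
print(scott_bandwidth(varied))       # a positive bandwidth
```

Under this assumption, any column where more than three quarters of the values in a hue level are identical gives a zero bandwidth, even though its standard deviation is not zero. Silverman's rule of thumb uses the same A with a constant of 0.9, so such a column would give zero under either rule.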
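You can use diag_kws to pass arguments to sns.kdeplot(), which pairplot uses to draw the univariate distributions on the diagonal.
For example:
sns.pairplot(..., diag_kws={'bw': 'silverman'})
would force sns.kdeplot() to use the 'silverman' method to choose the bandwidth, which may work better than Scott's method in your case.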
(1) Scott, D. W., "Multivariate Density Estimation: Theory, Practice, and Visualization", John Wiley & Sons, New York, Chichester, 1992.
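The traceback in the question suggests that the ValueError is only an internal step: the code first tries float(bw) and falls back to a named selector when that fails. The sketch below imitates that control flow in plain Python. The function names, the constants table and the crude spread measure are hypothetical stand-ins, not the statsmodels implementation.

```python
# Hypothetical constants for two named rules (illustration only).
RULE_CONSTANTS = {"scott": 1.059, "silverman": 0.9}


def select_bandwidth(data, name):
    """Stand-in selector: scales a crude spread measure by a rule constant."""
    spread = max(data) - min(data)
    bandwidth = RULE_CONSTANTS[name] * spread * len(data) ** (-0.2)
    if bandwidth == 0:
        raise RuntimeError("Selected bandwidth is 0.")
    return bandwidth


def resolve_bandwidth(bw, data):
    """Use bw directly if it is numeric, otherwise treat it as a rule name."""
    try:
        return float(bw)
    except ValueError:
        # "scott" is not a number, so the named selector is consulted.
        return select_bandwidth(data, bw)


constant = [1, 1, 1, 1]
print(resolve_bandwidth(1.0, constant))  # numeric value: the selector is never called

try:
    resolve_bandwidth("scott", constant)
except RuntimeError as err:
    # The RuntimeError was raised while the ValueError was being handled,
    # which is why both appear in a chained traceback.
    print(type(err.__context__).__name__)
```

Read this way, the 'scott' ValueError is expected and harmless on its own; the failure that matters is the zero bandwidth. It would also be consistent with a numeric bandwidth such as 1.0 working, since a number never reaches the selector.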
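Edit
To try to find the culprit, you have to use PairGrid instead of pairplot(). PairGrid lets you use a custom function to draw the diagonal. If you include print statements in that function, you can see what data is being passed to sns.kdeplot(). Execution should stop at the point where the data is "incorrect", and you may then be able to work out how to handle it.
For example: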
import seaborn as sns

def test_func(*data, **kwargs):
    # Show exactly what PairGrid hands to the diagonal plotting function
    print("data received:", data)
    print("hue name + other params:", kwargs)
    sns.kdeplot(*data, **kwargs)

iris = sns.load_dataset('iris')
g = sns.PairGrid(iris, hue="species")
g = g.map_diag(test_func)
For each variable (column) and each hue level, you get output like the following:
data received: (array([5.1, 4.9, 4.7, 4.6, 5. , 5.4, 4.6, 5. , 4.4, 4.9, 5.4, 4.8, 4.8,
4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5. ,
5. , 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5. , 5.5, 4.9, 4.4,
5.1, 5. , 4.5, 4.4, 5. , 5.1, 4.8, 5.1, 4.6, 5.3, 5. ]),)
hue name + other params: {'label': 'setosa', 'color': (0.12156862745098039, 0.4666666666666667, 0.7058823529411765)}
data received: (array([7. , 6.4, 6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2, 5. , 5.9, 6. ,
6.1, 5.6, 6.7, 5.6, 5.8, 6.2, 5.6, 5.9, 6.1, 6.3, 6.1, 6.4, 6.6,
6.8, 6.7, 6. , 5.7, 5.5, 5.5, 5.8, 6. , 5.4, 6. , 6.7, 6.3, 5.6,
5.5, 5.5, 6.1, 5.8, 5. , 5.6, 5.7, 5.7, 6.2, 5.1, 5.7]),)
hue name + other params: {'label': 'versicolor', 'color': (1.0, 0.4980392156862745, 0.054901960784313725)}
(...)
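As a complement to reading the printed arrays, the check can be automated. The sketch below flags every (column, hue level) pair whose interquartile range is zero, on the assumption that rule-of-thumb selectors scale with min(std, IQR / 1.349) and therefore return zero for such groups. The function name and the demo frame are made up for illustration; the demo data is not the loan dataset.

```python
import pandas as pd


def zero_iqr_groups(df, columns, hue):
    """List (column, hue level) pairs whose interquartile range is zero."""
    suspects = []
    for level, group in df.groupby(hue):
        for column in columns:
            values = group[column].dropna()
            iqr = values.quantile(0.75) - values.quantile(0.25)
            # Groups with fewer than two values cannot support a KDE either.
            if len(values) < 2 or iqr == 0:
                suspects.append((column, level))
    return suspects


# Made-up data for illustration only.
demo = pd.DataFrame({
    "income": [2500, 3100, 4700, 5000, 6200, 7900, 8400, 9600],
    "flag": [1, 1, 1, 1, 0, 1, 0, 1],
    "status": ["Y", "Y", "Y", "Y", "N", "N", "N", "N"],
})

print(zero_iqr_groups(demo, ["income", "flag"], hue="status"))
```

For the question's data, the call would presumably be zero_iqr_groups(bad_credit, ["ApplicantIncome", "Education", "LoanAmount"], hue="Loan_Status"). Any pair it reports is a candidate for a histogram on the diagonal or an explicit numeric bandwidth.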
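Comments:
Thanks for the explanation! I tried passing 'silverman' as you suggested, but got the same ValueError, now with 'silverman' in place of 'scott'. I have now replaced it with '1.0' and, to my surprise, this did produce a usable result, though I am not sure why. Is there a way to trace or find the pair for which the bandwidth cannot be computed?
I have added a way to try to track down where the problem occurs.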