在 Jupyter Notebook 中执行高斯朴素贝叶斯时出错

Posted 2023-03-12

技术标签:

【中文标题】在 Jupyter Notebook 中执行高斯朴素贝叶斯时出错【英文标题】：Error while doing Gaussian Naive Bayes in Jupyter Notebook 【发布时间】：2021-11-14 05:23:01 【问题描述】：

我目前正在 udacity 学习“机器学习入门”免费课程，其中有一个关于高斯朴素贝叶斯的测验。在 udacity 环境中运行时，代码给出了所需的输出（如下图所示） Code output in udacity environment

但是当我在 jupyter notebook 中运行它时显示错误，对于模块 class_vis.py 它显示错误'NoneType' object has no attribute 'predict'（如下图所示） Error in jupyter notebook

这是所有模块的代码：-

studentMain.py

    Naive Bayes classifier to classify the terrain data.
    
    The objective of this exercise is to recreate the decision 
    boundary found in the lesson video, and make a plot that
    visually shows the decision boundary """


from prep_terrain_data import makeTerrainData
from class_vis import prettyPicture, output_image
from ClassifyNB import classify

import numpy as np
import pylab as pl


features_train, labels_train, features_test, labels_test = makeTerrainData()

### the training data (features_train, labels_train) have both "fast" and "slow" points mixed
### in together--separate them so we can give them different colors in the scatterplot,
### and visually identify them
grade_fast = [features_train[ii][0] for ii in range(0, len(features_train)) if labels_train[ii]==0]
bumpy_fast = [features_train[ii][1] for ii in range(0, len(features_train)) if labels_train[ii]==0]
grade_slow = [features_train[ii][0] for ii in range(0, len(features_train)) if labels_train[ii]==1]
bumpy_slow = [features_train[ii][1] for ii in range(0, len(features_train)) if labels_train[ii]==1]


# You will need to complete this function imported from the ClassifyNB script.
# Be sure to change to that code tab to complete this quiz.
clf = classify(features_train, labels_train)
### draw the decision boundary with the text points overlaid
prettyPicture(clf, features_test, labels_test)
output_image("test.png", "png", open("test.png", "rb").read())

class_vis.py

#from udacityplots import *
import warnings
warnings.filterwarnings("ignore")

import matplotlib 
matplotlib.use('agg')

import matplotlib.pyplot as plt
import pylab as pl
import numpy as np

#import numpy as np
#import matplotlib.pyplot as plt
#plt.ioff()

def prettyPicture(clf, X_test, y_test):
    x_min = 0.0; x_max = 1.0
    y_min = 0.0; y_max = 1.0

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, m_max]x[y_min, y_max].
    h = .01  # step size in the mesh
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())

    plt.pcolormesh(xx, yy, Z, cmap=pl.cm.seismic)

    # Plot also the test points
    grade_sig = [X_test[ii][0] for ii in range(0, len(X_test)) if y_test[ii]==0]
    bumpy_sig = [X_test[ii][1] for ii in range(0, len(X_test)) if y_test[ii]==0]
    grade_bkg = [X_test[ii][0] for ii in range(0, len(X_test)) if y_test[ii]==1]
    bumpy_bkg = [X_test[ii][1] for ii in range(0, len(X_test)) if y_test[ii]==1]

    plt.scatter(grade_sig, bumpy_sig, color = "b", label="fast")
    plt.scatter(grade_bkg, bumpy_bkg, color = "r", label="slow")
    plt.legend()
    plt.xlabel("bumpiness")
    plt.ylabel("grade")

    plt.savefig("test.png")
    
import base64
import json
import subprocess

def output_image(name, format, bytes):
    image_start = "BEGIN_IMAGE_f9825uweof8jw9fj4r8"
    image_end = "END_IMAGE_0238jfw08fjsiufhw8frs"
    data = 
    data['name'] = name
    data['format'] = format
    data['bytes'] = base64.encodestring(bytes)
    print (image_start+json.dumps(data)+image_end)

prep_terrain_data.py

#!/usr/bin/python
import random


def makeTerrainData(n_points=1000):
###############################################################################
### make the toy dataset
    random.seed(42)
    grade = [random.random() for ii in range(0,n_points)]
    bumpy = [random.random() for ii in range(0,n_points)]
    error = [random.random() for ii in range(0,n_points)]
    y = [round(grade[ii]*bumpy[ii]+0.3+0.1*error[ii]) for ii in range(0,n_points)]
    for ii in range(0, len(y)):
        if grade[ii]>0.8 or bumpy[ii]>0.8:
            y[ii] = 1.0

### split into train/test sets
    X = [[gg, ss] for gg, ss in zip(grade, bumpy)]
    split = int(0.75*n_points)
    X_train = X[0:split]
    X_test  = X[split:]
    y_train = y[0:split]
    y_test  = y[split:]

    grade_sig = [X_train[ii][0] for ii in range(0, len(X_train)) if y_train[ii]==0]
    bumpy_sig = [X_train[ii][1] for ii in range(0, len(X_train)) if y_train[ii]==0]
    grade_bkg = [X_train[ii][0] for ii in range(0, len(X_train)) if y_train[ii]==1]
    bumpy_bkg = [X_train[ii][1] for ii in range(0, len(X_train)) if y_train[ii]==1]

#    training_data = "fast":"grade":grade_sig, "bumpiness":bumpy_sig
#            , "slow":"grade":grade_bkg, "bumpiness":bumpy_bkg


    grade_sig = [X_test[ii][0] for ii in range(0, len(X_test)) if y_test[ii]==0]
    bumpy_sig = [X_test[ii][1] for ii in range(0, len(X_test)) if y_test[ii]==0]
    grade_bkg = [X_test[ii][0] for ii in range(0, len(X_test)) if y_test[ii]==1]
    bumpy_bkg = [X_test[ii][1] for ii in range(0, len(X_test)) if y_test[ii]==1]

    test_data = "fast":"grade":grade_sig, "bumpiness":bumpy_sig
            , "slow":"grade":grade_bkg, "bumpiness":bumpy_bkg

    return X_train, y_train, X_test, y_test
#    return training_data, test_data

分类NB.py

#ClassifyNB.py
def classify(features_train, labels_train):   
    ### import the sklearn module for GaussianNB
    ### create classifier
    ### fit the classifier on the training features and labels
    ### return the fit classifier
    
    
    ### your code goes here!
    from sklearn.naive_bayes import GaussianNB
    clf = GaussianNB()
    clf.fit(features_train,labels_train)

请帮我看看是什么错误

【问题讨论】：

【参考方案1】：

据我所知，您的分类函数没有返回任何内容，但是您将它的返回值分配给一个变量，该变量将根据 python 标准将其设置为None。要解决此问题，请在分类函数处插入一个 return 语句：

def classify(features_train, labels_train):   
    ### import the sklearn module for GaussianNB
    ### create classifier
    ### fit the classifier on the training features and labels
    ### return the fit classifier
    
    
    ### your code goes here!
    from sklearn.naive_bayes import GaussianNB
    clf = GaussianNB()
    clf.fit(features_train,labels_train)
    return clf

【讨论】：

即使在编写插入函数后它仍然显示相同的错误，根据错误声明（i.stack.imgur.com/J8k2p.png），我认为它显示了 class_vis.py 模块的错误再次检查，除了 output_image 函数外，它对我有用。如果你想在 notebook 中看到结果，不要忘记将 %matplotlib inline 放在 jupyter 的开头。你也应该从prep_terrain_data.py中删除shebang 非常感谢，我得到了输出。 class_vis.py 中的一些语句是作为注释编写的，在将它们更改为代码并删除 shebang 之后，我得到了输出，虽然我也得到了一些错误以及如图所示的输出（i.stack.imgur.com/Q7epK.png），你能告诉我吗我为什么会收到这些错误我不确定你在 outputImage 中做了什么。 json.dumps() 部分似乎引发了错误。你可以把这个函数全部删除，代码就会按预期工作。

以上是关于在 Jupyter Notebook 中执行高斯朴素贝叶斯时出错的主要内容，如果未能解决你的问题，请参考以下文章