NBC Naive Bayes Classifier: Machine Learning in Action, Python Code

Foreword: this post, compiled by the cha138.com editors, walks through a Python implementation of the NBC (Naive Bayes classifier) example from Machine Learning in Action; hopefully it serves as a useful reference.

# -*- coding: utf-8 -*-
"""
Created on Mon Aug 07 23:40:13 2017

@author: mdz
"""
import numpy as np
def loadData():
    vocabList=[['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
               ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
               ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
               ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
               ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
               ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
    classList=[0,1,0,1,0,1]  # labels: 1 = abusive post, 0 = normal post
    return vocabList,classList

# Build the vocabulary: take the already-tokenized posts in vocabList, drop
# duplicate words, and return the unique words as a list. The length of this
# list is the number of features (attributes).
def filterVocabList(vocabList):
    vocabSet=set([])
    for document in vocabList:
        vocabSet=vocabSet|set(document)
    return list(vocabSet)    
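
# For the six toy posts above, this should yield the 32 distinct words of the
# corpus; the exact order depends on set iteration, so word indices can vary
# between runs.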

# Convert a tokenized sample into a 0-1 (set-of-words) vector over the vocabulary
def zero_one(vocabList,inputSet):
    returnVec=[0]*len(vocabList)
    for word in inputSet:
        if word in vocabList:
            returnVec[vocabList.index(word)]=1
        else:
            print("the word: %s is not in my Vocabulary!" % word)
    return returnVec
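
# Example: with vocabSet = filterVocabList(loadData()[0]), the call
#   zero_one(vocabSet, ['stupid', 'garbage'])
# returns a list of len(vocabSet) zeros with a 1 at the positions of 'stupid'
# and 'garbage'. This is the set-of-words model: it records word presence,
# not word frequency.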

def trainNbc(trainSamples,trainCategory):
    numTrainSamp=len(trainSamples)
    numWords=len(trainSamples[0])
    pAbusive=sum(trainCategory)/float(numTrainSamp)
    # Per-word counts for each class, initialized to 1 (Laplace/add-one smoothing)
    p0Num=np.ones(numWords)
    p1Num=np.ones(numWords)
    # Total word counts for each class; start at numWords to match the add-one numerators
    p0NumTotal=numWords
    p1NumTotal=numWords
    for i in range(numTrainSamp):
        if trainCategory[i]==1:
            p1Num+=trainSamples[i]
            p1NumTotal+=sum(trainSamples[i])
        else:
            p0Num+=trainSamples[i]
            p0NumTotal+=sum(trainSamples[i])
    # Log of the smoothed conditional probabilities P(word | class), to avoid underflow
    p1Vec=np.log(p1Num/p1NumTotal)
    p0Vec=np.log(p0Num/p0NumTotal)
    return p1Vec,p0Vec,pAbusive
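
# Because the training vectors are 0/1 (set-of-words), p1Num[w] ends up as
# 1 + (number of class-1 posts containing word w), and p1NumTotal is
# numWords + (total number of 1-entries across all class-1 vectors). So
#   p1Vec[w] = log( p1Num[w] / p1NumTotal )
# is a Laplace (add-one) smoothed log estimate of P(w | class=1); p0Vec is the
# same for class 0. Working in log space keeps the product of many small
# probabilities from underflowing to zero, and pAbusive is the prior P(class=1),
# the fraction of training posts labeled 1.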

def classifyOfNbc(testSamples,p1Vec,p0Vec,pAbusive):
    p1=sum(testSamples*p1Vec)+np.log(pAbusive)
    p0=sum(testSamples*p0Vec)+np.log(1-pAbusive)
    if p1>p0:
        return 1
    else:
        return 0
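
# classifyOfNbc applies the naive Bayes decision rule in log space: it compares
#   log P(class=1) + sum over words w of x_w * log P(w | class=1)
# with the corresponding score for class 0 and returns whichever class scores
# higher. Because testSamples is a 0/1 vector, the element-wise product keeps
# only the words that actually occur in the test sample.
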
def testingNbc():
    vocabList,classList=loadData()
    vocabSet=filterVocabList(vocabList)
    trainList=[]
    for term in vocabList:
        trainList.append(zero_one(vocabSet,term))
    p1Vec,p0Vec,pAbusive=trainNbc(np.array(trainList),np.array(classList))
    testEntry=['love','my','daughter']
    testSamples=np.array(zero_one(vocabSet,testEntry))
    print(testEntry,'classified as:',classifyOfNbc(testSamples,p1Vec,p0Vec,pAbusive))
    testEntry=['stupid','garbage']
    testSamples=np.array(zero_one(vocabSet,testEntry))
    print(testEntry,'classified as:',classifyOfNbc(testSamples,p1Vec,p0Vec,pAbusive))
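
The listing above only defines testingNbc() and never calls it. A minimal way to run it as a standalone script, sketched below, is to append the usual Python entry-point guard:

if __name__ == '__main__':
    testingNbc()

Running it should first warn that the word 'daughter' is not in the vocabulary (it never appears in the training posts), then classify ['love', 'my', 'daughter'] as 0 (normal) and ['stupid', 'garbage'] as 1 (abusive).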
