Spark MLlib Quick Guide, Models 02: Logistic Regression (Python)
Posted by 黎明程序员
Contents
Logistic Regression Principles
Logistic Regression Code (Spark Python)
Logistic Regression Principles
For a detailed explanation of the principles, see: http://www.cnblogs.com/itmorn/p/7890468.html
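As a brief recap (the standard formulation, not taken from the linked post): logistic regression models the probability of the positive class with the sigmoid function, and the LogisticRegressionWithLBFGS trainer used below fits the weight vector by minimizing the (optionally regularized) logistic loss with the L-BFGS optimizer:

P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^{\top}\mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w}^{\top}\mathbf{x} + b)}}

\min_{\mathbf{w},\, b} \; \frac{1}{n} \sum_{i=1}^{n} \log\!\left(1 + e^{-y_i (\mathbf{w}^{\top}\mathbf{x}_i + b)}\right) + \lambda R(\mathbf{w}), \qquad y_i \in \{-1, +1\}

A prediction is assigned to class 1 when the estimated probability exceeds the model's threshold, which defaults to 0.5 in MLlib.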
Logistic Regression Code (Spark Python)
Data used in the code: https://pan.baidu.com/s/1jHWKG4I (extraction password: acq1)
# -*- coding=utf-8 -*-
from pyspark import SparkConf, SparkContext
sc = SparkContext('local')

from pyspark.mllib.classification import LogisticRegressionWithLBFGS, LogisticRegressionModel
from pyspark.mllib.regression import LabeledPoint

# Load and parse the data: convert every field to a float.
# The first value on each line is the label, the remaining values are the features.
def parsePoint(line):
    values = [float(x) for x in line.split(' ')]
    return LabeledPoint(values[0], values[1:])

data = sc.textFile("data/mllib/sample_svm_data.txt")
print(data.collect()[0])        # 1 0 2.52078447201548 0 0 0 2.004684436494304 2.00034729926846.....

parsedData = data.map(parsePoint)
print(parsedData.collect()[0])  # (1.0,[0.0,2.52078447202,0.0,0.0,0.0,2.00468....

# Build the model
model = LogisticRegressionWithLBFGS.train(parsedData)

# Evaluate the model on the training data
labelsAndPreds = parsedData.map(lambda p: (p.label, model.predict(p.features)))
trainErr = labelsAndPreds.filter(lambda lp: lp[0] != lp[1]).count() / float(parsedData.count())
print("Training Error = " + str(trainErr))  # Training Error = 0.366459627329

# Save the model, then load it back
model.save(sc, "pythonLogisticRegressionWithLBFGSModel")
sameModel = LogisticRegressionModel.load(sc, "pythonLogisticRegressionWithLBFGSModel")
print(sameModel.predict(parsedData.collect()[0].features))  # 1
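Training error alone is a fairly crude measure of model quality. Below is a minimal sketch of a more complete evaluation, assuming the parsedData RDD from the script above: split the data into training and test sets, clear the model's classification threshold so predict returns raw probabilities, and compute AUC with pyspark.mllib.evaluation.BinaryClassificationMetrics. The split ratio and seed are arbitrary choices for illustration, not from the original post.

from pyspark.mllib.classification import LogisticRegressionWithLBFGS
from pyspark.mllib.evaluation import BinaryClassificationMetrics

# Hold out 30% of the data for testing (ratio and seed are arbitrary here).
trainData, testData = parsedData.randomSplit([0.7, 0.3], seed=42)
trainData.cache()

model = LogisticRegressionWithLBFGS.train(trainData)

# With the threshold cleared, predict() returns the raw probability of the
# positive class instead of a 0/1 label.
model.clearThreshold()

# Pair each test example's predicted score with its true label.
scoreAndLabels = testData.map(lambda p: (float(model.predict(p.features)), p.label))

metrics = BinaryClassificationMetrics(scoreAndLabels)
print("Test AUC (ROC) = " + str(metrics.areaUnderROC))
print("Test AUC (PR)  = " + str(metrics.areaUnderPR))

Evaluating on a held-out test set rather than the training set gives a less optimistic estimate of how the classifier will behave on unseen data.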
The above covers the main content of Spark MLlib Quick Guide, Models 02: Logistic Regression (Python). Other posts in this series:
Spark MLlib Quick Guide, Models 05: Decision Tree (Python)
Spark MLlib Quick Guide, Models 04: Naive Bayes (Python)
Spark MLlib Quick Guide, Models 07: Gradient-Boosted Trees (Python)
Spark MLlib Quick Guide, Basics 01: Setting up a Spark Development Environment on Windows (Scala)