决策树分类器

Posted 2021-06-16 Plum_Brilliant

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了决策树分类器相关的知识，希望对你有一定的参考价值。

from pyspark.ml.classification import DecisionTreeClassificationModel
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml import Pipeline,PipelineModel
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.linalg import Vector,Vectors
from pyspark.sql import Row
from pyspark.ml.feature import IndexToString,StringIndexer,VectorIndexer

def f(x):
rel = {}
rel[\'features\']=Vectors. \\
dense(str(x[2]),str(x[24]),str(x[28]),str(x[29]))
rel[\'label\'] = str(x[22])
return rel
data = spark.sparkContext.textFile("file:///home/hw17685187119/student2.txt").map(lambda line: line.split(\';\')).map(lambda p: Row(**f(p))).toDF()

labelIndexer = StringIndexer().setInputCol("label").setOutputCol("indexedLabel").fit(data)
featureIndexer = VectorIndexer().setInputCol("features").setOutputCol("indexedFeatures").setMaxCategories(4).fit(data)
labelConverter = IndexToString().setInputCol("prediction").setOutputCol("predictedLabel").setLabels(labelIndexer.labels)
trainingData, testData = data.randomSplit([0.7, 0.3])

dtClassifier = DecisionTreeClassifier().setLabelCol("indexedLabel").setFeaturesCol("indexedFeatures")

dtPipeline = Pipeline().setStages([labelIndexer, featureIndexer, dtClassifier, labelConverter])
dtPipelineModel = dtPipeline.fit(trainingData)
dtPredictions = dtPipelineModel.transform(testData)
dtPredictions.select("predictedLabel", "label", "features").show(20)

evaluator = MulticlassClassificationEvaluator().setLabelCol("indexedLabel").setPredictionCol("prediction")
dtAccuracy = evaluator.evaluate(dtPredictions)
dtAccuracy

以上是关于决策树分类器的主要内容，如果未能解决你的问题，请参考以下文章