Unable to retrieve bestModel using Grid search
Posted: 2020-04-16 03:52:59
Question: I am using the code below to find the best fit for a regression model, and I get an error:
from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit, CrossValidator
from pyspark.ml.evaluation import BinaryClassificationEvaluator
# Creating parameter grid
params = ParamGridBuilder()
# Adding grids for two parameters
params = params.addGrid(regression.regParam, [0.01, 0.1, 1.0, 10.0]) \
    .addGrid(regression.elasticNetParam, [0.0, 0.5, 1.0])
# Building the parameter grid
params = params.build()
print('Number of models to be tested: ', len(params))
# Creating cross-validator
cv = CrossValidator(estimator=pipeline, estimatorParamMaps=params, evaluator=evaluator, numFolds=5)
# Get the best model from cross validation
best_model = cv.bestModel
The error is:
AttributeError                            Traceback (most recent call last)
<ipython-input-449-f7d43e2cf76b> in <module>
3
4 # Get the best model from cross validation
----> 5 best_model = cv.bestModel
6
7 # Look at the stages in the best model
AttributeError: 'CrossValidator' object has no attribute 'bestModel'
The CrossValidator, which is supposed to find the best model parameters, is not returning a trained model!
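Answer 1: You have to fit the cross-validator and assign the resulting CV model before you can access the bestModel attribute. Adapting the example from the docs: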
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.linalg import Vectors
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
dataset = spark.createDataFrame(
    [(Vectors.dense([0.0]), 0.0),
     (Vectors.dense([0.4]), 1.0),
     (Vectors.dense([0.5]), 0.0),
     (Vectors.dense([0.6]), 1.0),
     (Vectors.dense([1.0]), 1.0)] * 10,
    ["features", "label"])
lr = LogisticRegression()
grid = ParamGridBuilder().addGrid(lr.maxIter, [0, 1]).build()
evaluator = BinaryClassificationEvaluator()
cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, evaluator=evaluator,
                    parallelism=2)
Asking for cv.bestModel at this stage gives
AttributeError Traceback (most recent call last)
<command-388275196191991> in <module>
----> 1 cv.bestModel
AttributeError: 'CrossValidator' object has no attribute 'bestModel'
just as in your case.
Fit first and assign the result:
cvModel = cv.fit(dataset)
cvModel.bestModel
# result:
LogisticRegressionModel: uid = LogisticRegression_f9c9ea282e32, numClasses = 2, numFeatures = 1
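Why the unfitted object has no bestModel can be sketched without Spark. The classes below are hypothetical toy stand-ins, not PySpark's real CrossValidator or CrossValidatorModel; they only imitate the pattern in which fit() returns a new model object and leaves the estimator itself unchanged. The candidate values and the scoring function are likewise made up for illustration.

```python
# Toy, Spark-free imitation of the estimator/model split.
# ToyCrossValidator and ToyCrossValidatorModel are hypothetical names,
# not part of PySpark.


class ToyCrossValidatorModel:
    """Result of fitting: this is the object that carries bestModel."""

    def __init__(self, best_model):
        self.bestModel = best_model


class ToyCrossValidator:
    """Configuration only: stores candidates and a scoring function."""

    def __init__(self, candidates, score):
        self.candidates = candidates
        self.score = score

    def fit(self, data):
        # Pick the highest-scoring candidate and return a NEW object;
        # self is not modified, so it never gains a bestModel attribute.
        best = max(self.candidates, key=lambda c: self.score(c, data))
        return ToyCrossValidatorModel(best)


def neg_squared_error(candidate, data):
    # Higher is better: negative squared error of a constant prediction.
    return -sum((x - candidate) ** 2 for x in data)


data = [1.0, 2.0, 3.0]
cv = ToyCrossValidator(candidates=[0.0, 2.0, 5.0], score=neg_squared_error)

cv.fit(data)                     # return value discarded
print(hasattr(cv, "bestModel"))  # False: the estimator is unchanged

cv_model = cv.fit(data)          # keep the returned model
print(cv_model.bestModel)        # 2.0
```

The same reasoning applies to the question's code: calling cv.fit(...) on the training data and keeping the returned object is what makes bestModel available.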