Unable to retrieve bestModel using Grid search

Posted 2023-04-15


【Asked】: 2020-04-16 03:52:59

【Question】:

I am using the code below to find the best fit for a regression model, and I get an error:

from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit, CrossValidator
from pyspark.ml.evaluation import BinaryClassificationEvaluator

# Note: regression, pipeline and evaluator are not defined in this snippet

# Creating parameter grid
params = ParamGridBuilder()

# Adding grids for two parameters
params = params.addGrid(regression.regParam, [0.01, 0.1, 1.0, 10.0]) \
               .addGrid(regression.elasticNetParam, [0.0, 0.5, 1.0])

# Building the parameter grid
params = params.build()
print('Number of models to be tested: ', len(params))

# Creating cross-validator
cv = CrossValidator(estimator=pipeline, estimatorParamMaps=params, evaluator=evaluator, numFolds=5)

# Get the best model from cross validation
best_model = cv.bestModel

The error is:

AttributeError                            Traceback (most recent call last)
<ipython-input-449-f7d43e2cf76b> in <module>
  3 
  4 # Get the best model from cross validation
 ----> 5 best_model = cv.bestModel
  6 
  7 # Look at the stages in the best model

AttributeError: 'CrossValidator' object has no attribute 'bestModel'

The CrossValidator I am using to get the best model parameters does not return a trained model!


【Answer 1】:

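Before accessing the bestModel attribute, you must first fit the cross-validator and assign the fitted model that fit() returns; adapting the example from the docs: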

from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.linalg import Vectors
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

dataset = spark.createDataFrame(
    [(Vectors.dense([0.0]), 0.0),
     (Vectors.dense([0.4]), 1.0),
     (Vectors.dense([0.5]), 0.0),
     (Vectors.dense([0.6]), 1.0),
     (Vectors.dense([1.0]), 1.0)] * 10,
    ["features", "label"])

lr = LogisticRegression()
grid = ParamGridBuilder().addGrid(lr.maxIter, [0, 1]).build()
evaluator = BinaryClassificationEvaluator()
cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, evaluator=evaluator,
    parallelism=2)

Asking for cv.bestModel at this stage gives

AttributeError                            Traceback (most recent call last)
<command-388275196191991> in <module>
----> 1 cv.bestModel

AttributeError: 'CrossValidator' object has no attribute 'bestModel'

just as in your case.
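The attribute is missing because fit() does not modify the object it is called on; it returns a separate fitted object, and only that object carries bestModel. A toy sketch in plain Python can make the split visible. ToyEstimator and ToyModel are hypothetical stand-ins invented for this illustration, not Spark classes, and the "selection" logic is deliberately trivial:

```python
class ToyModel:
    """Hypothetical stand-in for a fitted model (plays the role of CrossValidatorModel)."""

    def __init__(self, best_model):
        self.bestModel = best_model


class ToyEstimator:
    """Hypothetical stand-in for an unfitted estimator (plays the role of CrossValidator)."""

    def __init__(self, candidates):
        self.candidates = candidates

    def fit(self, data):
        # Return a NEW object; the estimator itself is left unchanged.
        target = sum(data) / len(data)
        best = min(self.candidates, key=lambda c: abs(c - target))
        return ToyModel(best)


est = ToyEstimator(candidates=[0.0, 0.5, 1.0])
has_before = hasattr(est, "bestModel")  # the estimator has no bestModel yet
model = est.fit([0.4, 0.6, 0.8])
has_after = hasattr(est, "bestModel")   # still none: fit() did not mutate est
best = model.bestModel                  # only the returned object has it
```

Calling est.fit(...) without assigning the result therefore discards the only object that holds bestModel.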

Fit first and assign the result:

cvModel = cv.fit(dataset)
cvModel.bestModel
# result:
LogisticRegressionModel: uid = LogisticRegression_f9c9ea282e32, numClasses = 2, numFeatures = 1
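Since the question is ultimately about the best parameters, here is a minimal sketch of one way to find which grid point won once a fitted cvModel is available. The helper pick_best is a hypothetical name introduced for this illustration, and the numbers in the demo are made up, not real results:

```python
def pick_best(param_maps, avg_metrics, larger_is_better=True):
    """Return (param_map, metric) for the best-scoring grid point.

    param_maps and avg_metrics are parallel sequences: entry i of
    avg_metrics is the cross-validated score of entry i of param_maps.
    """
    if len(param_maps) != len(avg_metrics):
        raise ValueError("param_maps and avg_metrics must have the same length")
    if not param_maps:
        raise ValueError("the parameter grid is empty")
    chooser = max if larger_is_better else min
    idx = chooser(range(len(avg_metrics)), key=avg_metrics.__getitem__)
    return param_maps[idx], avg_metrics[idx]


# Plain-Python demo with made-up numbers (an error metric, so smaller is better).
demo_maps = [{"regParam": 0.01}, {"regParam": 0.1}, {"regParam": 1.0}]
demo_scores = [0.92, 0.85, 1.10]
best_map, best_metric = pick_best(demo_maps, demo_scores, larger_is_better=False)
print(best_map, best_metric)
```

With a fitted Spark model, the same helper could be called as pick_best(cvModel.getEstimatorParamMaps(), cvModel.avgMetrics, evaluator.isLargerBetter()). When the estimator is a Pipeline, as in the question, cvModel.bestModel is a PipelineModel, and its fitted stages are available through cvModel.bestModel.stages.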
