Unable to get pipeline.fit() to work using Sklearn and Keras Wrappers
Posted: 2018-11-03 23:12:16

I am getting a parameter value error (not enough values to unpack: expected 2, got 1). I have a network I want to train:
def build(self):
    numpy.random.seed(self.seed)
    self.estimators.append(('standardize', StandardScaler()))
    self.estimators.append(('mlp', KerasClassifier(build_fn=self.build_fn, epochs=50, batch_size=5, verbose=0)))
    self.pipeline = Pipeline(self.estimators)
Now, if I want to fit the pipeline to some data, say self.X and self.Y:
self.model = self.pipeline.fit(self.X, self.Y, verbose=1)
I get:
Traceback (most recent call last):
  File "C:/Users/jaehan/PycharmProjects/cerebro/cerebro.py", line 257, in <module>
    model.run()
  File "C:/Users/jaehan/PycharmProjects/cerebro/cerebro.py", line 138, in run
    self.model = self.pipeline.fit(self.X, self.Y, verbose=1)
  File "C:\Users\jaehan\AppData\Local\Continuum\anaconda3\envs\py36\lib\site-packages\sklearn\pipeline.py", line 248, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)
  File "C:\Users\jaehan\AppData\Local\Continuum\anaconda3\envs\py36\lib\site-packages\sklearn\pipeline.py", line 197, in _fit
    step, param = pname.split('__', 1)
ValueError: not enough values to unpack (expected 2, got 1)
Am I doing something wrong here? I was under the impression I could run fit once and it would return a history object that I could save and load at any time.
I even tried...
self.pipeline.fit(self.X, self.Y)
Which throws...
AttributeError: 'numpy.ndarray' object has no attribute 'fit'
I have no idea what is going on here.

Full code:
class Cerebro:
    def __init__(self):
        self.model = None
        self.build_fn = None
        self.data = None
        self.X = None
        self.Y = None
        # these three are for encoding string values to integer_encodings / one hot encodings
        self.encoder = LabelEncoder()
        self.encodings = {}
        self.one_hot_encodings = {}
        self.seed = numpy.random.seed(7)  # this is to ensure we have reproducible results.
        self.estimators = []
        self.pipeline = None
        self.kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=self.seed)
        self.cross_validation_score = 0.0
    def preprocess(self):
        """
        This method will preprocess the dataset we want to train our network on.
        Example:
            import preprocessing
            ...
            dataset, X, Y = preprocessing.main()
        """
        self.data = pandas.read_csv('src_examples/hwtxn_final_for_influx.txt', sep='\t').values
        self.X = numpy.delete(self.data, 13, axis=1)
        self.Y = self.data[:, 13].astype(numpy.float16)
    def build(self):
        self.build_fn = self.base_model()
        self.preprocess()
        numpy.random.seed(self.seed)
        self.estimators.append(('standardize', StandardScaler()))
        self.estimators.append(('mlp', KerasClassifier(build_fn=self.build_fn, epochs=50, batch_size=5, verbose=0)))
        self.pipeline = Pipeline(self.estimators)
    def run(self):
        """This will actually take the pipeline (preprocessing standardization, model)
        and fit it to our dataset (X, Y) (We don't need test/train since we are using stratified k fold cross val.)
        Args:
            None
        Returns:
            None
        """
        # this is the 'model'
        # self.pipeline
        print(type(self.pipeline))
        print(self.X.shape)
        self.model = self.pipeline.fit(self.X, self.Y)
    def load(self, fn):
        """This will load a saved model (history object)
        Args:
            fn (filename): represents saved model file
        Returns:
            model (pkl object): represents model
        """
        return pickle.load(open(fn, 'rb'))

    def save(self, fn):
        """This will save a model (history object)
        Args:
            fn (filename): represents a filename to save the model as
        Returns:
            None
        """
        pickle.dump(self.model, open(fn, 'wb'))
    def encode(self, vals, key):
        """This method will encode a list of values and take a key (representing column name, or index) to save
        in the class object (self.encodings)
        This will help us keep track of encodings we have for values we need to translate/decipher.
        Args:
            vals (np.array): array of values to encode
            key (str): str representing the key used to encode this particular set of values
        Returns:
            transformed values (np.array) representing the encoded versions of values
        """
        # int encoding for non int values
        self.encodings[key] = self.encoder.fit_transform(vals)
        return self.encoder.fit_transform(vals)

    def decoder(self, vals, key):
        """This method will decode the integer_encodings for class variables. It will take vals which
        represents a list of values to decode (i.e. [1, 2, 3] -- [apple, pear, orange])
        It will also take a key (since every decoding has a corresponding encoding) to find which encoding
        scheme to map to
        Args:
            vals (np.array): array of values to decode
            key (str): string representing the key used for encoding the values (for decoding it)
        Returns:
            inverse transform of encoded values (np.array)
        """
        # translate int encodings to original values (encoder._classes)
        return self.encodings[key].inverse_transform(vals)
    def cross_validate(self):
        """
        This will perform a cross validation score using a stratified kfold method. (Think traditional Kfold but
        with the values evenly distributed for each subsample)
        Args:
            None
        Returns:
            None
        """
        self.cross_validation_score = cross_val_score(self.pipeline, self.X, self.Y, cv=self.kfold)
        return self.cross_validation_score
    @staticmethod
    def base_model():
        """
        This will return a base model for us to try. The good thing about this implementation is that
        when we decide we want something more complex then all we have to do is define a class function and replace
        the values in the build f(x)
        Args:
            None
        Returns:
            model (keras.models.Sequential): Keras based DNN Model
        """
        # create model
        model = Sequential()
        model.add(Dense(60, input_dim=60, kernel_initializer='normal', activation='relu'))
        model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
        # Compile model
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model
    @staticmethod
    def one_hot_encoder(int_encoding):
        """
        This will take an integer encoding of string variables (traditional preprocessing step; will probably
        move this to the preprocessing package).
        Essentially it returns a binary 'one hot' encoding of the values we wish to encode
        Example
            # Dataset Values
            [apple, orange, pear]
            # Integer Encoding
            [1, 2, 3]
            # One Hot Encoding
            [[1, 0, 0]
             [0, 1, 0]
             [0, 0, 1]]
        Args:
            None
        Returns:
            Matrix (np.array): matrix representing one hot vectors for a class of values
        """
        # we might not need this... so for now we will keep it static
        return OneHotEncoder(sparse=False).fit_transform(int_encoding.reshape(len(int_encoding), 1))
if __name__ == '__main__':
    # Step 1 is to initialize class (with seed == 7)
    model = Cerebro()
    model.build()
    model.cross_validate()
    print("Here are our estimators:\n {}".format(model.estimators))
    print("Here is our pipeline:\n {}".format(model.pipeline))
    model.run()
EDIT: The answer was that the build_fn parameter expects a function reference, not the model itself.

IMHO, I feel an error should be thrown for this specific case.
Comments:

Add some data so that we can reproduce the error.

self should be used exclusively as a keyword in classes. This is really confusing. Can you provide the attributes of self? Apart from that, what does this have to do with sklearn-pandas?

The last error "'numpy.ndarray' object has no attribute 'fit'" indicates that somewhere you changed the pipeline object and assigned a data array to it. Show the full code.

Just added the full code @seralouk

Sorry, I was trying to be concise. I should have mentioned this is a class implementation @Quickbeam2k1
Answer 1:
This is happening because of the following line:

self.build_fn = self.base_model()

It should actually be:

self.build_fn = self.base_model

KerasClassifier requires a reference to the function that creates the model, but by appending () at the end you assign build_fn the actual model object, which is wrong.
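For reference, a minimal sketch of the corrected build(), assuming the rest of the class stays exactly as posted in the question:

    def build(self):
        self.build_fn = self.base_model   # pass the function itself; KerasClassifier calls it internally
        self.preprocess()
        numpy.random.seed(self.seed)
        self.estimators.append(('standardize', StandardScaler()))
        self.estimators.append(('mlp', KerasClassifier(build_fn=self.build_fn, epochs=50, batch_size=5, verbose=0)))
        self.pipeline = Pipeline(self.estimators)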
Apart from the error above, I would recommend checking the following lines in your code; if they are not corrected, they will produce errors later when you use this code.
1) self.encodings[key] = self.encoder.fit_transform(vals)

Here you assign the transformed data to encodings[key], not the fitted transformer. So when you then do:

self.encodings[key].inverse_transform(vals)

it makes no sense to call inverse_transform() on the transformed data. inverse_transform() is a method of scikit-learn transformers, but self.encodings[key] is an ndarray, because you stored the output array of fit_transform().
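One way to make the decode path work is to store a fitted LabelEncoder per key instead of the transformed array. A sketch reusing the names from the question (note it replaces the single shared self.encoder with one encoder per key, which is an assumption about the intended design):

    def encode(self, vals, key):
        encoder = LabelEncoder()           # a fresh encoder per key, so each encoding stays decodable
        transformed = encoder.fit_transform(vals)
        self.encodings[key] = encoder      # store the fitted transformer, not its output array
        return transformed

    def decoder(self, vals, key):
        # inverse_transform() must be called on the fitted encoder object
        return self.encodings[key].inverse_transform(vals)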
2) Something similar to point 1 happens with one_hot_encoder(): the fitted OneHotEncoder is thrown away and only the transformed array is returned, as shown in the sketch below.
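A hedged sketch of one_hot_encoder() that keeps the fitted encoder around (returning it is just one option; storing it on the instance in self.one_hot_encodings would work too):

    @staticmethod
    def one_hot_encoder(int_encoding):
        encoder = OneHotEncoder(sparse=False)  # note: newer scikit-learn versions renamed this argument to sparse_output
        one_hot = encoder.fit_transform(int_encoding.reshape(-1, 1))
        return encoder, one_hot                # keep the fitted encoder if inverse_transform is needed later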
The error "AttributeError: 'numpy.ndarray' object has no attribute 'fit'" appears to be related to points 1 and 2.
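As a side note on the original ValueError, grounded in the traceback: Pipeline._fit splits every extra fit keyword on '__' into a (step, param) pair, so a bare verbose=1 cannot be unpacked. If you want to forward verbose to the Keras step, scikit-learn's convention is to prefix it with the step name, along the lines of:

    self.model = self.pipeline.fit(self.X, self.Y, mlp__verbose=1)  # '<step>__<param>' routes the kwarg to the 'mlp' step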
Comments:
I'd like to upvote this answer, but points 2) and 3) completely miss the scope of the question; I will upvote as soon as the answer is edited. The issue with .fit() was that the build_fn argument pointed to a model rather than to the function itself (as you said). The other two points you mention are never even called in this process and are irrelevant to the problem presented. (I know we need to preprocess before fitting; this was just to make sure the model compiles correctly.)

@codebrotherone I rejected your edit and edited the answer myself to make it clear. Take a look.

Cool, sounds good. I had already fixed the encoding issues when making those edits, but I appreciate the generosity! I'd say those errors don't conflict with, or relate to, the question originally asked about the fit() method, which doesn't depend on the encodings (until the data actually gets processed). The whole point here was to make sure it compiles. I'll go ahead and mark this as the accepted answer nonetheless. Thanks for your help! :)