How to add epochs to a Keras network in a scikit-learn pipeline
Posted: 2020-07-11 14:08:33
Question: I am using the code from this website to help me analyze tweets. It uses a pipeline: https://www.dataquest.io/blog/tutorial-text-classification-in-python-using-spacy/
import string
import spacy
from spacy.lang.en import English

# Create our list of punctuation marks
punctuations = string.punctuation
# Create our list of stopwords
# ('en' is the spaCy 2.x shortcut; newer releases need a full model name such as 'en_core_web_sm')
nlp = spacy.load('en')
stop_words = spacy.lang.en.stop_words.STOP_WORDS
# Load English tokenizer, tagger, parser, NER and word vectors
parser = English()

# Creating our tokenizer function
def spacy_tokenizer(sentence):
    # Creating our token object, which is used to create documents with linguistic annotations.
    mytokens = parser(sentence)
    # Lemmatizing each token and converting each token into lowercase
    mytokens = [ word.lemma_.lower().strip() if word.lemma_ != "-PRON-" else word.lower_ for word in mytokens ]
    # Removing stop words
    mytokens = [ word for word in mytokens if word not in stop_words and word not in punctuations ]
    # return preprocessed list of tokens
    return mytokens
from sklearn.base import TransformerMixin

# Custom transformer using spaCy
class predictors(TransformerMixin):
    def transform(self, X, **transform_params):
        # Cleaning Text
        return [clean_text(text) for text in X]

    def fit(self, X, y=None, **fit_params):
        return self

    def get_params(self, deep=True):
        # No tunable parameters; scikit-learn expects a dict here
        return {}

# Basic function to clean the text
def clean_text(text):
    # Removing spaces and converting text into lowercase
    return text.strip().lower()
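from sklearn import model_selection
from sklearn.feature_extraction.text import CountVectorizer

bow_vector = CountVectorizer(tokenizer = spacy_tokenizer, ngram_range=(1,1))
# tweets holds the tweet data loaded earlier (not shown in the question)
x = tweets['text']
Y = tweets['target']
x_train, x_test, Y_train, Y_test = model_selection.train_test_split(x, Y, test_size = 0.2)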
#This part I figured out on my own:
from keras import Sequential
from keras.layers import Dense
classifier = Sequential()
#First Hidden Layer
classifier.add(Dense(500, activation='relu', kernel_initializer='random_normal', input_dim=19080))
#Second Hidden Layer
classifier.add(Dense(500, activation='relu', kernel_initializer='random_normal'))
#Output Layer
classifier.add(Dense(1, activation='sigmoid', kernel_initializer='random_normal'))
classifier.compile(optimizer ='adam',loss='binary_crossentropy', metrics =['accuracy'])
# Create pipeline using Bag of Words
from sklearn.pipeline import Pipeline
pipe = Pipeline([("cleaner", predictors()),
                 ('vectorizer', bow_vector),
                 ('classifier', classifier)])
# model generation
pipe.fit(x_train, Y_train)
My problem is that I want to do this:
classifier.fit(X_train,y_train, batch_size=5, epochs=200)
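But I can't seem to make it work with the pipeline. I can run the pipeline without it, and it runs fine with just a single epoch. But I'm fairly sure I would get better accuracy with more epochs.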
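For context, here is a rough sketch of how fit-time arguments can travel through a pipeline. scikit-learn's Pipeline.fit forwards keyword arguments written as stepname__parameter to the fit method of the named step. EpochRecorder below is a hypothetical stand-in for the Keras model (it only records what it receives), so the example runs without Keras. The toy texts and labels are made up for illustration.

```python
from sklearn.base import BaseEstimator
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline


class EpochRecorder(BaseEstimator):
    """Hypothetical stand-in for the Keras model: records its fit kwargs."""

    def fit(self, X, y, epochs=1, batch_size=32):
        # A real Keras model would train here for `epochs` passes over X.
        self.epochs_ = epochs
        self.batch_size_ = batch_size
        return self


demo_pipe = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("classifier", EpochRecorder()),
])

# Toy data, invented for this sketch only.
texts = ["good day", "bad storm", "good news", "bad flood"]
labels = [0, 1, 0, 1]

# "classifier__epochs" is routed to the fit method of the "classifier" step.
demo_pipe.fit(texts, labels, classifier__epochs=200, classifier__batch_size=5)
```

Assuming the Keras model's fit accepts the same keywords, the equivalent call on the pipeline from the question would be pipe.fit(x_train, Y_train, classifier__epochs=200, classifier__batch_size=5). I have not checked this against every Keras version.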
Answer 1: You should use the scikit-learn wrapper:
from keras import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier

def create_network():
    # Same architecture as in the question, built fresh on each call
    network = Sequential()
    network.add(Dense(500, activation='relu', kernel_initializer='random_normal', input_dim=19080))
    network.add(Dense(500, activation='relu', kernel_initializer='random_normal'))
    network.add(Dense(1, activation='sigmoid', kernel_initializer='random_normal'))
    network.compile(loss='binary_crossentropy',
                    optimizer='adam',
                    metrics=['accuracy'])
    return network

classifier = KerasClassifier(build_fn=create_network,
                             epochs=10,
                             batch_size=100,
                             verbose=0)
Then use the classifier shown above in your pipeline. Its definition is where you set epochs and batch_size.
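To illustrate why a wrapper fits a pipeline well, here is a minimal sketch, not KerasClassifier itself. A scikit-learn style estimator keeps epochs and batch_size as constructor parameters, so the pipeline can read and change them through set_params. EpochWrapper is a hypothetical stand-in that needs no Keras; in real code the classifier from the answer takes its place. As far as I know, recent TensorFlow/Keras releases no longer ship keras.wrappers.scikit_learn, and the separate SciKeras package offers a replacement KerasClassifier.

```python
from sklearn.base import BaseEstimator
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline


class EpochWrapper(BaseEstimator):
    """Hypothetical stand-in for KerasClassifier (no Keras required)."""

    def __init__(self, epochs=1, batch_size=32):
        # scikit-learn convention: store constructor arguments unchanged.
        self.epochs = epochs
        self.batch_size = batch_size

    def fit(self, X, y):
        # A real wrapper would build the network here and call
        # network.fit(X, y, epochs=self.epochs, batch_size=self.batch_size).
        self.epochs_run_ = self.epochs
        return self


wrapped_pipe = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("classifier", EpochWrapper(epochs=10, batch_size=100)),
])

# The training settings stay adjustable after the pipeline is built.
wrapped_pipe.set_params(classifier__epochs=200, classifier__batch_size=5)

# Toy data, invented for this sketch only.
wrapped_pipe.fit(["good day", "bad storm", "good news", "bad flood"], [0, 1, 0, 1])
```

Because the settings are ordinary estimator parameters, the same classifier__epochs naming should also work in a parameter grid for scikit-learn's search utilities.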