itertools.combinations() seems to be interfering with my training loop
I'm trying to put together a loop that does feature selection on the following data:
pid ms_subclass lot_frontage lot_area overall_qual
2126 907135180 20 60.0 0.992559 4
192 903206120 75 NaN 0.965733 7
2406 528181040 120 40.0 0.977838 7
45 528175010 120 44.0 0.974905 7
2477 531379030 60 70.0 0.972883 6
overall_cond gr_liv_area full_bath half_bath bedroom_abvgr
2126 5 0.121764 1 0 3
192 7 0.256273 1 1 3
2406 5 0.196950 2 0 2
45 5 0.207804 2 0 2
2477 5 0.215220 2 1 3
kitchen_abvgr fireplaces garage_area wood_deck_sf open_porch_sf
2126 1 0 0.000000 0.000000 0.000000
192 1 1 0.039036 0.000000 0.000000
2406 1 1 0.068241 0.019004 0.005039
45 1 1 0.074063 0.029380 0.005356
2477 1 0 0.080605 0.017574 0.019331
enclosed_porch 3ssn_porch screen_porch pool_area d_a_(agr)
2126 0.000000 0 0 0 0
192 0.007435 0 0 0 0
2406 0.000000 0 0 0 0
45 0.000000 0 0 0 0
2477 0.000000 0 0 0 0
d_c_(all) d_fv d_i_(all) d_rh d_rl d_rm d_ir1 d_ir2 d_ir3
2126 0 0 0 0 1 0 0 0 0
192 0 0 0 0 1 0 1 0 0
2406 0 0 0 0 1 0 1 0 0
45 0 0 0 0 1 0 1 0 0
2477 0 0 0 0 1 0 1 0 0
The idea is to take every possible combination of features of size j, then run the following training loop on each combination:
import numpy as np
import tensorflow as tf

# data pre-processing
train = data.sample(frac=0.8, axis=0, random_state=1)
test = data.drop(train.index)
n_epochs = 15
batch_size = 50
X_train = transform_features(train)
X_train_pandas = transform_features(train)
X_test = transform_features(test)
y_train = X_train.pop("sale_price")
y_train_pandas = X_train_pandas.pop("sale_price")
y_test = X_test.pop("sale_price")

def hypothesis(X, W, b):
    return tf.tensordot(X, W, axes=1) + b

def mean_squared_error(y, y_pred):
    return tf.reduce_mean(tf.square(y_pred - y))

def d_mean_squared_error(y, y_pred):
    return tf.reshape(tf.reduce_mean(2 * (y_pred - y)), [1, 1])

def training_loop(X_, y_, n_epochs_=15, batch_size_=100, learning_rate_=0.001):
    n = X_.shape[0]
    n_features = X_.shape[1]
    W = tf.random.normal((n_features, 1))
    b = 0
    epochs = []
    training_losses = []
    # build model components
    X_train = tf.constant(X_.values, dtype=tf.float32)
    y_train = tf.constant(y_.values, dtype=tf.float32)
    # initialize TF dataset object
    d = tf.data.Dataset.from_tensor_slices((X_train, y_train))
    d.shuffle(len(X_train)).repeat(n_epochs).batch(batch_size)
    iterator = tf.compat.v1.data.make_one_shot_iterator(d)
    for i in range(n_epochs):
        epoch_losses = []
        for batch in range(n // batch_size):
            X_batch, y_batch = iterator.get_next()
            y_pred = hypothesis(X_batch, W, b)
            batch_loss = mean_squared_error(y_batch, y_pred)
            epoch_losses.append(batch_loss.numpy())
            dL_dH = d_mean_squared_error(y_batch, y_pred)
            dH_dW = X_batch
            dL_dW = tf.reduce_mean(dL_dH * dH_dW)
            dL_dB = tf.reduce_mean(dL_dH)
            W -= (learning_rate_ * dL_dW)
            b -= (learning_rate_ * dL_dB)
        loss = np.mean(epoch_losses)
        epochs.append(i)
        training_losses.append(loss)
    # give final error score as RMSE
    return np.sqrt(np.float32(loss))
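One thing worth ruling out with a hand-written update rule like this is plain divergence: if the inputs are large relative to the learning rate, each step can overshoot, the weights overflow to inf, and the next subtraction yields nan. The sketch below is a simplified one-weight version in plain Python, not the loop from the question; the function name `fit_one_weight` and all numeric values are illustrative assumptions.

```python
import math

def fit_one_weight(x, y, learning_rate=0.001, steps=1000):
    """Gradient descent on the squared error (w * x - y) ** 2 for one sample."""
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w * x - y) * x  # d/dw of (w * x - y) ** 2
        w -= learning_rate * grad
    return w

# Illustrative values only: 60.0 resembles a raw ms_subclass code,
# 0.2 resembles a scaled column, and 180000.0 is a made-up sale price.
w_raw = fit_one_weight(x=60.0, y=180000.0)
w_scaled = fit_one_weight(x=0.2, y=180000.0)

print(math.isnan(w_raw))        # True: updates overflow to inf, then inf - inf is nan
print(math.isfinite(w_scaled))  # True: the same loop stays finite on the small input
```

If something like this is happening, printing the loss after the first few steps for one failing feature set would show it growing rapidly before it turns into nan.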
Here is the code for the feature selection process I'm running:
from itertools import combinations
import pandas as pd

# number of features desired in model
k = 20
# get all usable features
allFeatures = [f for f in X_train_pandas.columns if f != "sale_price" and f != "pid"]
# record best mse score among each feature-set size j
j_losses = []
for j in range(2, k):
    print(f"CURRENTLY ON SIZE {j}")
    # generate list of all possible combinations of features
    possible_fsets = list(combinations(allFeatures, j))
    # record losses for each feature-set for j
    fset_losses = []
    # generate mse for each possible combination of features
    for fset in possible_fsets:
        fset = list(fset)
        fset_info = []
        fset_loss = training_loop(X_train[fset], y_train)
        print(f"fset: {fset}")
        print(f"loss: {fset_loss}")
        fset_info.append(fset)
        fset_info.append(fset_loss)
        fset_losses.append(fset_info)
    # rank the feature sets of size j by loss and keep the best one
    f_losses = pd.DataFrame.from_records(fset_losses, columns=["feature_set", "mse_loss"])
    f_losses.sort_values("mse_loss", inplace=True)
    print(f_losses.head())
    best_loss = []
    best_loss.append(f_losses["feature_set"].iloc[0])
    best_loss.append(f_losses["mse_loss"].iloc[0])
    j_losses.append(best_loss)
print(j_losses)
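For a sense of scale, the number of size-j subsets grows very quickly, so the loop above would call training_loop a very large number of times. A minimal sketch, assuming 28 usable features (the columns in the sample above minus pid; the real count may differ):

```python
from math import comb

# Assumption: 28 usable features, counted from the sample columns minus "pid".
n_features = 28
k = 20  # same upper bound as the loop in the question

# Number of feature sets the loop would train for each size j in 2..k-1.
counts = {j: comb(n_features, j) for j in range(2, k)}
total = sum(counts.values())

print(counts[2])  # 378 pairs for j = 2
print(total)      # total number of training runs across all sizes
```

Under that assumption, even the j = 2 pass alone is 378 training runs, which is worth keeping in mind when debugging: testing a handful of feature sets by hand is much faster than rerunning the whole search.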
The training loop seems to work fine on its own; when I manually pass a list of column names to select the X_ input, it gives me a number as output:
example = training_loop(X_train[["lot_area", "gr_liv_area"]], y_train)
print(example)
72282.47
But it never works inside my loop. Running it gives the following output:
CURRENTLY ON SIZE 2
fset: ['ms_subclass', 'lot_frontage']
loss: nan
fset: ['ms_subclass', 'lot_area']
loss: nan
fset: ['ms_subclass', 'overall_qual']
loss: nan
fset: ['ms_subclass', 'overall_cond']
loss: nan
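A loss of nan generally means a NaN entered the arithmetic somewhere, and the sample rows above do show a NaN in lot_frontage. The sketch below uses plain Python with hypothetical helper names (`columns_with_nan`, and a stand-in `mean_squared_error`) to show how a single missing value propagates into an averaged loss, and how to list the columns that contain one. It may not be the only cause here, since feature sets without lot_frontage also report nan.

```python
import math

def mean_squared_error(y, y_pred):
    """Plain-Python stand-in for the MSE in the question."""
    return sum((p - t) ** 2 for t, p in zip(y, y_pred)) / len(y)

def columns_with_nan(table):
    """Return the names of columns that contain at least one NaN."""
    return [name for name, values in table.items()
            if any(math.isnan(v) for v in values)]

# Values copied from the first three sample rows in the question.
sample = {
    "ms_subclass": [20.0, 75.0, 120.0],
    "lot_frontage": [60.0, math.nan, 40.0],
}

print(columns_with_nan(sample))  # ['lot_frontage']

# One NaN prediction is enough to make the averaged loss NaN.
loss = mean_squared_error([1.0, 2.0, 3.0], [1.0, math.nan, 3.0])
print(math.isnan(loss))  # True
```

On a pandas DataFrame, `X_train.isna().sum()` gives the same per-column information directly.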
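Could itertools.combinations() be the problem? I made sure to cast the output (and each individual element in it) with list() so that I can use it to index the pandas object, but I'm still getting nowhere. Yet somehow it works fine when I pass a list manually. What could be the problem?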
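Answer
Your code is correct. Have you confirmed that the same feature set actually produces a different result when passed by hand? (That is, have you tried ['ms_subclass', 'lot_frontage'] manually?)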
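To make that check concrete: a tuple from combinations() converted with list() is indistinguishable from a hand-written list, so any difference in results would have to come from which columns are selected rather than from itertools. A minimal sketch, using a shortened, hypothetical feature list built from column names in the question:

```python
from itertools import combinations

# Hypothetical short feature list, using column names from the question.
all_features = ["ms_subclass", "lot_frontage", "lot_area"]

# combinations() yields tuples, in the order of the input sequence.
pairs = [list(fset) for fset in combinations(all_features, 2)]

manual = ["ms_subclass", "lot_frontage"]
print(pairs[0] == manual)  # True: same type and contents as a hand-written list
print(len(pairs))          # 3 pairs from 3 features
```

Note that the manual example in the question used ["lot_area", "gr_liv_area"], which is a different feature set from the ones that printed nan, so rerunning the failing sets by hand is the direct comparison.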