itertools.combinations() seems to be interfering with my training loop


I'm trying to put together a loop that does feature selection on the following data:

```
            pid  ms_subclass  lot_frontage  lot_area  overall_qual
2126  907135180           20          60.0  0.992559             4
192   903206120           75           NaN  0.965733             7
2406  528181040          120          40.0  0.977838             7
45    528175010          120          44.0  0.974905             7
2477  531379030           60          70.0  0.972883             6

      overall_cond  gr_liv_area  full_bath  half_bath  bedroom_abvgr
2126             5     0.121764          1          0              3
192              7     0.256273          1          1              3
2406             5     0.196950          2          0              2
45               5     0.207804          2          0              2
2477             5     0.215220          2          1              3

      kitchen_abvgr  fireplaces  garage_area  wood_deck_sf  open_porch_sf
2126              1           0     0.000000      0.000000       0.000000
192               1           1     0.039036      0.000000       0.000000
2406              1           1     0.068241      0.019004       0.005039
45                1           1     0.074063      0.029380       0.005356
2477              1           0     0.080605      0.017574       0.019331

      enclosed_porch  3ssn_porch  screen_porch  pool_area  d_a_(agr)
2126        0.000000           0             0          0          0
192         0.007435           0             0          0          0
2406        0.000000           0             0          0          0
45          0.000000           0             0          0          0
2477        0.000000           0             0          0          0

      d_c_(all)  d_fv  d_i_(all)  d_rh  d_rl  d_rm  d_ir1  d_ir2  d_ir3
2126          0     0          0     0     1     0      0      0      0
192           0     0          0     0     1     0      1      0      0
2406          0     0          0     0     1     0      1      0      0
45            0     0          0     0     1     0      1      0      0
2477          0     0          0     0     1     0      1      0      0
```

The idea is to take every possible combination of j features and run the following training loop on each one:

```python
import numpy as np
import tensorflow as tf

# data pre-processing
# (`data` and `transform_features()` are defined elsewhere in my code)
train = data.sample(frac=0.8, axis=0, random_state=1)
test = data.drop(train.index)
n_epochs = 15
batch_size = 50

X_train = transform_features(train)
X_train_pandas = transform_features(train)
X_test = transform_features(test)

y_train = X_train.pop("sale_price")
y_train_pandas = X_train_pandas.pop("sale_price")
y_test = X_test.pop("sale_price")

def hypothesis(X, W, b):
    # linear model: (batch, n_features) x (n_features, 1) + b -> (batch, 1)
    return tf.tensordot(X, W, axes=1) + b

def mean_squared_error(y, y_pred):
    return tf.reduce_mean(tf.square(y_pred - y))

def d_mean_squared_error(y, y_pred):
    # dL/dy_pred for each example in the batch, shape (batch, 1)
    m = tf.cast(tf.shape(y)[0], tf.float32)
    return 2.0 * (y_pred - y) / m

def training_loop(X_, y_, n_epochs_=15, batch_size_=100, learning_rate_=0.001):
    n = X_.shape[0]
    n_features = X_.shape[1]
    W = tf.random.normal((n_features, 1))
    b = 0.0

    epochs = []
    training_losses = []

    # build model components; y is reshaped to a column so it lines up with y_pred
    X_train = tf.constant(X_.values, dtype=tf.float32)
    y_train = tf.constant(y_.values.reshape(-1, 1), dtype=tf.float32)

    # initialize TF dataset object; Dataset methods return a new dataset,
    # so the result has to be reassigned
    d = tf.data.Dataset.from_tensor_slices((X_train, y_train))
    d = d.shuffle(n).repeat(n_epochs_).batch(batch_size_)
    iterator = iter(d)

    for i in range(n_epochs_):
        epoch_losses = []
        for batch in range(n // batch_size_):
            X_batch, y_batch = iterator.get_next()

            y_pred = hypothesis(X_batch, W, b)
            batch_loss = mean_squared_error(y_batch, y_pred)
            epoch_losses.append(batch_loss.numpy())

            # chain rule: dL/dW = X^T (dL/dH), one entry per feature
            dL_dH = d_mean_squared_error(y_batch, y_pred)
            dL_dW = tf.matmul(X_batch, dL_dH, transpose_a=True)
            dL_dB = tf.reduce_sum(dL_dH)

            W -= learning_rate_ * dL_dW
            b -= learning_rate_ * dL_dB

        loss = np.mean(epoch_losses)
        epochs.append(i)
        training_losses.append(loss)

    # give final error score as RMSE
    return np.sqrt(np.float32(loss))
```
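For reference, the weight update in a loop like this is meant to follow the gradient of the batch MSE, L = (1/m) * sum((X W + b - y)^2), which is dL/dW = (2/m) * X^T (X W + b - y) and dL/db = (2/m) * sum(X W + b - y). The gradient with respect to W has one entry per feature rather than being a single scalar. The sketch below is a plain-Python finite-difference check of that formula; the tiny batch and the helper names (`mse`, `analytic_grad`, `numeric_grad`) are hypothetical and not part of the question's code.

```python
def mse(X, y, w, b):
    """Mean squared error of the linear model X @ w + b (plain lists)."""
    m = len(X)
    return sum((sum(xj * wj for xj, wj in zip(row, w)) + b - yi) ** 2
               for row, yi in zip(X, y)) / m

def analytic_grad(X, y, w, b):
    """dL/dw = 2/m * X^T (Xw + b - y), dL/db = 2/m * sum(Xw + b - y)."""
    m = len(X)
    residuals = [sum(xj * wj for xj, wj in zip(row, w)) + b - yi
                 for row, yi in zip(X, y)]
    dw = [2.0 / m * sum(r * row[j] for r, row in zip(residuals, X))
          for j in range(len(w))]
    db = 2.0 / m * sum(residuals)
    return dw, db

def numeric_grad(X, y, w, b, eps=1e-6):
    """Central finite differences of mse() with respect to each parameter."""
    dw = []
    for j in range(len(w)):
        w_plus = list(w)
        w_plus[j] += eps
        w_minus = list(w)
        w_minus[j] -= eps
        dw.append((mse(X, y, w_plus, b) - mse(X, y, w_minus, b)) / (2 * eps))
    db = (mse(X, y, w, b + eps) - mse(X, y, w, b - eps)) / (2 * eps)
    return dw, db

# Tiny made-up batch: 3 examples, 2 features (hypothetical values)
X = [[0.5, 1.0], [1.5, -0.5], [2.0, 0.25]]
y = [1.0, 2.0, 0.5]
w = [0.3, -0.2]
b = 0.1

dw_a, db_a = analytic_grad(X, y, w, b)
dw_n, db_n = numeric_grad(X, y, w, b)
print("analytic:", dw_a, db_a)
print("numeric: ", dw_n, db_n)
```

The two results should agree to several decimal places; a vector-valued dL/dW is what `tf.matmul(X_batch, dL_dH, transpose_a=True)` computes in TensorFlow.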

Here is the code for the feature-selection process I'm working on:

```python
from itertools import combinations
import pandas as pd

# number of features desired in model
k = 20
# get all usable features
allFeatures = [f for f in X_train_pandas.columns if f not in ("sale_price", "pid")]
# record the best RMSE among each feature-set size j
j_losses = []

for j in range(2, k):
    print(f"CURRENTLY ON SIZE {j}")

    # generate list of all possible combinations of features
    possible_fsets = list(combinations(allFeatures, j))

    # record losses for each feature-set for j
    fset_losses = []

    # compute the RMSE (returned by training_loop) for each combination of features
    for fset in possible_fsets:
        fset = list(fset)
        fset_loss = training_loop(X_train[fset], y_train)
        print(f"fset: {fset}")
        print(f"loss: {fset_loss}")

        fset_losses.append([fset, fset_loss])

    f_losses = pd.DataFrame.from_records(fset_losses, columns=["feature_set", "rmse_loss"])
    f_losses.sort_values("rmse_loss", inplace=True)
    print(f_losses.head())

    best_loss = [f_losses["feature_set"].iloc[0], f_losses["rmse_loss"].iloc[0]]

    j_losses.append(best_loss)

print(j_losses)
```
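As a side note, `combinations()` returns a lazy iterator, so the best feature set for each size can be tracked with `min()` without first materializing every combination into a list and a DataFrame. A minimal sketch, where `best_subset_per_size` is a hypothetical helper, `score_fn` stands in for a call to `training_loop` (lower is better), and the toy per-feature costs are made up purely for illustration:

```python
import math
from itertools import combinations

def best_subset_per_size(features, score_fn, sizes):
    """For each size j, return (best_feature_list, best_score); lower score is better."""
    results = []
    for j in sizes:
        # combinations() yields tuples lazily; map(list, ...) turns each into a column list
        candidates = map(list, combinations(features, j))
        scored = ((fset, score_fn(fset)) for fset in candidates)
        # drop NaN scores: NaN compares False with everything, so min() could keep one
        finite = ((fset, score) for fset, score in scored if not math.isnan(score))
        results.append(min(finite, key=lambda pair: pair[1], default=(None, math.nan)))
    return results

# Hypothetical per-feature "costs" standing in for a real model's loss
costs = {"ms_subclass": 5.0, "lot_area": 2.0, "overall_qual": 1.0, "gr_liv_area": 1.5}

def toy_score(fset):
    return sum(costs[f] for f in fset)

print(best_subset_per_size(list(costs), toy_score, range(2, 4)))
# [(['overall_qual', 'gr_liv_area'], 2.5), (['lot_area', 'overall_qual', 'gr_liv_area'], 4.5)]
```

In the question's setting, `score_fn` would be something like `lambda fset: training_loop(X_train[fset], y_train)`. Filtering out NaN also makes it visible when every set of a given size failed, because that size then reports `None`.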

The training loop seems to work fine on its own; when I manually pass a list of column names as the X_ input, it gives me a number as output:

```python
example = training_loop(X_train[["lot_area", "gr_liv_area"]], y_train)
print(example)
```

```
72282.47
```

But it never works inside my loop. Running it gives the following output:

```
CURRENTLY ON SIZE 2
fset: ['ms_subclass', 'lot_frontage']
loss: nan
fset: ['ms_subclass', 'lot_area']
loss: nan
fset: ['ms_subclass', 'overall_qual']
loss: nan
fset: ['ms_subclass', 'overall_cond']
loss: nan
```

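Could itertools.combinations() be the problem? I made sure to convert its output (and each individual element of it) with list() so that I can use it to index pandas objects, but I still get nowhere. Yet somehow it works fine when I pass the list manually. What could be the problem?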
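`combinations()` itself can be checked in isolation: it yields tuples in the order of the input list, and `list()` turns each one into exactly the kind of list that is passed by hand. A small sketch using a shortened, hypothetical feature list made of the first five columns of the printout (pid excluded):

```python
from itertools import combinations
from math import comb

# A shortened, hypothetical feature list: the first five columns of the printout (pid excluded)
features = ["ms_subclass", "lot_frontage", "lot_area", "overall_qual", "overall_cond"]

# Convert each tuple to a list, as the feature-selection loop does
pairs = [list(fset) for fset in combinations(features, 2)]

# Pairs come out in input order, matching the "fset:" lines printed by the loop
print(pairs[:4])
# [['ms_subclass', 'lot_frontage'], ['ms_subclass', 'lot_area'],
#  ['ms_subclass', 'overall_qual'], ['ms_subclass', 'overall_cond']]

# A converted combination is an ordinary list, equal to one written by hand
print(pairs[0] == ["ms_subclass", "lot_frontage"])  # True

# The number of subsets of size j is C(n, j), so it grows quickly with j
print(len(pairs), comb(len(features), 2))  # 10 10
print(comb(28, 10))  # 13123110 (28 = feature columns visible in the printout)
```

Since each set is a plain list of strings, `X_train[fset]` selects the same columns as a hand-written list would. Note also that every set in the failing output contains `ms_subclass`, because `combinations()` pairs the first feature with all the others before moving on.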

Answer

Your code is correct. Have you confirmed that the same feature set actually gives different results?

(That is, have you tried ['ms_subclass', 'lot_frontage'] manually?)
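Following that suggestion, a quick way to narrow it down is to look at the data behind the failing sets directly. A minimal plain-Python sketch of two things worth ruling out, assuming the columns reach `training_loop` unmodified: (1) `lot_frontage` contains a NaN in the printout (row 192), and a single NaN in any input makes the MSE NaN; (2) `ms_subclass` holds raw codes (20 to 120 in the rows shown), whereas `lot_area`, `gr_liv_area`, and `garage_area` look rescaled to [0, 1], and with a fixed learning rate of 0.001 plain gradient descent on a feature of that magnitude can overshoot and overflow to inf/NaN. The manual example that worked used only `lot_area` and `gr_liv_area`. The helper `gd_final_mse` is hypothetical, uses full-batch updates for simplicity, and the sale prices are made up because they are not shown in the question.

```python
import math

# Feature values copied from the five rows printed in the question
lot_frontage = [60.0, math.nan, 40.0, 44.0, 70.0]
ms_subclass = [20.0, 75.0, 120.0, 120.0, 60.0]
lot_area = [0.992559, 0.965733, 0.977838, 0.974905, 0.972883]

# Hypothetical sale prices for those rows (not shown in the question)
y = [150000.0, 180000.0, 200000.0, 210000.0, 190000.0]

def gd_final_mse(xs, ys, lr=0.001, steps=200):
    """Full-batch gradient descent on y ~ w*x + b; returns the last MSE computed."""
    w, b = 0.0, 0.0
    m = len(xs)
    mse = math.nan
    for _ in range(steps):
        residuals = [w * x + b - yi for x, yi in zip(xs, ys)]
        mse = sum(r * r for r in residuals) / m
        # gradient of the MSE: dL/dw = 2/m * sum(r * x), dL/db = 2/m * sum(r)
        dw = 2.0 / m * sum(r * x for r, x in zip(residuals, xs))
        db = 2.0 / m * sum(residuals)
        w -= lr * dw
        b -= lr * db
    return mse

# (1) A single NaN input poisons the loss from the first step on
print(any(math.isnan(v) for v in lot_frontage))  # True
print(gd_final_mse(lot_frontage, y))              # nan

# (2) Unscaled feature: the updates overshoot and the loss is no longer finite
print(gd_final_mse(ms_subclass, y))

# Scaled feature with the same learning rate: the loss stays finite
print(gd_final_mse(lot_area, y))
```

If the real columns behave the same way, dropping or imputing the NaNs and standardizing every feature (or lowering the learning rate) before the combinations loop would be the first things to try.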
