Keras reports wrong accuracy

Posted 2018-11-07 21:35:44

【Question】:

I am training a generative adversarial network (GAN) in Keras.

My logs report that both networks (the discriminator and the combined model) reach 100% accuracy, which suggests something is wrong.

I tried running inference and found that the discriminator is indeed 100% accurate, but the generator only produces noise and does not fool the discriminator at all.

My question: why does Keras report 100% accuracy for my combined model?

Code:

# Imports added here for completeness; create_generator, create_discriminator,
# write_log, sample_images, save_model, the data loaders and the hyperparameters
# (batch_size, epochs, d_loss_thres, g_loss_thres, sample_interval) are defined elsewhere.
import datetime

import numpy as np
import keras
from keras.layers import Input
from keras.models import Model
from keras.callbacks import TensorBoard

generator = create_generator(input_shape=(374,))
in_vector = Input(shape=(374,))
fake_images = generator(in_vector)

discriminator = create_discriminator()
disc_optimizer = keras.optimizers.SGD(lr=1e-4)
discriminator.compile(optimizer=disc_optimizer, loss='binary_crossentropy', metrics=['accuracy'])

# Freeze the discriminator inside the combined model so that only the
# generator is updated by gan.train_on_batch (the standalone discriminator
# was compiled above, so it still trains on its own).
discriminator.trainable = False
for l in discriminator.layers:
    l.trainable = False

gan_output = discriminator(fake_images)
gan = Model(in_vector, gan_output)
gan_optimizer = keras.optimizers.RMSprop(lr=1e-5)
gan.compile(optimizer=gan_optimizer, loss='binary_crossentropy', metrics=['accuracy'])
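# Note: metrics=['accuracy'] on the combined model is binary accuracy
# measured at the discriminator's output against whatever labels are
# passed to train_on_batch; it is not a property of the generator itself.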

start_time = datetime.datetime.now()
tensorboard = TensorBoard(log_dir=f'data/logs/gawwn/{start_time}')
tensorboard.set_model(gan)

d_train_logs = ['train_discriminator_loss',
                'train_discriminator_accuracy']
g_train_logs = ['train_generator_loss',
                'train_generator_accuracy']
val_logs = ['val_discriminator_loss',
            'val_discriminator_accuracy',
            'val_generator_loss',
            'val_generator_accuracy']

d_train_step, g_train_step, val_step = 0, 0, 0

valid = np.ones((batch_size, 1))
fake = np.zeros((batch_size, 1))

noise_sigma = 0.00
noise_decay = 0.95

for epoch in range(1, 1 + epochs):
    d_loss = [1]
    while d_loss[0] > d_loss_thres:
        for i, (x_vectors, x_images, y) in enumerate(train_loader.load_batch(batch_size)):
            # ---------------------
            #  Train Discriminator
            # ---------------------

            # Generate a batch of new images
            gen_imgs = generator.predict(x_vectors)

            # Train the discriminator
            data = np.concatenate([y, gen_imgs], axis=0)
            labels = np.concatenate([valid[:len(y)], fake[:len(y)]])
            train_batch = list(zip(data, labels))
            np.random.shuffle(train_batch)
            data, labels = zip(*train_batch)
            data, labels = np.array(data), np.array(labels)
            d_loss = discriminator.train_on_batch(data, labels)

#             d_loss_real = discriminator.train_on_batch(y, valid[:len(y)])
#             d_loss_fake = discriminator.train_on_batch(gen_imgs, fake[:len(y)])
#             d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
            write_log(tensorboard, d_train_logs, d_loss, d_train_step)
            d_train_step += 1
            time_elapsed = datetime.datetime.now() - start_time
            print(f'D step {d_train_step}: loss={d_loss[0]}; acc={d_loss[1]}; time={time_elapsed}')

    g_loss = [1]
    while g_loss[0] > g_loss_thres:
        for i, (x_vectors, x_images, y) in enumerate(train_loader.load_batch(batch_size)):
            # ---------------------
            #  Train Generator
            # ---------------------

            # Train the generator (to have the discriminator label samples as valid)
            g_loss = gan.train_on_batch(x_vectors, valid[:len(y)])

            # Plot the progress
            write_log(tensorboard, g_train_logs, g_loss, g_train_step)
            g_train_step += 1

            time_elapsed = datetime.datetime.now() - start_time
            print(f'G step {g_train_step}: loss={g_loss[0]}; acc={g_loss[1]}; time={time_elapsed}')

    # If at save interval => save generated image samples
    if epoch % sample_interval == 0:
        d_losses = []
        g_losses = []
        for x_vectors, x_images, y in val_loader.load_batch(batch_size):
            gen_imgs = generator.predict(x_vectors)
            d_loss_real = discriminator.test_on_batch(y, valid[:len(y)])
            d_loss_fake = discriminator.test_on_batch(gen_imgs, fake[:len(y)])
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
            d_losses.append(d_loss)

            g_loss = gan.test_on_batch(x_vectors, valid[:len(y)])
            g_losses.append(g_loss)

        d_loss = np.average(d_losses, axis=0)
        g_loss = np.average(g_losses, axis=0)
        write_log(tensorboard, val_logs, [d_loss[0], d_loss[1], g_loss[0], g_loss[1]], val_step)
        val_step += 1
        sample_images(val_loader, generator, epoch)
        save_model(generator, epoch, 'generator')
        save_model(discriminator, epoch, 'discriminator')

Results from the last few steps:

D step 349: loss=0.09932675957679749; acc=1.0; time=0:05:58.468997
D step 350: loss=0.10563915222883224; acc=0.9900000095367432; time=0:05:59.088657
D step 351: loss=0.09658461064100266; acc=1.0; time=0:05:59.533442
G step 214: loss=0.167491614818573; acc=0.9800000190734863; time=0:06:00.196747
G step 215: loss=0.13409791886806488; acc=1.0; time=0:06:00.891946
G step 216: loss=0.1523411124944687; acc=0.9722222089767456; time=0:06:01.402974
D step 352: loss=0.10553492605686188; acc=0.9900000095367432; time=0:06:02.015083
D step 353: loss=0.10318870842456818; acc=0.9900000095367432; time=0:06:02.654599
D step 354: loss=0.07871382683515549; acc=1.0; time=0:06:03.131933
G step 217: loss=0.1493617743253708; acc=0.9800000190734863; time=0:06:03.827815
G step 218: loss=0.12147567421197891; acc=0.9599999785423279; time=0:06:04.537494
G step 219: loss=0.17327196896076202; acc=1.0; time=0:06:05.099841
D step 355: loss=0.10441411286592484; acc=0.9900000095367432; time=0:06:05.768096
D step 356: loss=0.09612423181533813; acc=1.0; time=0:06:06.451947
D step 357: loss=0.1072489321231842; acc=0.9861111044883728; time=0:06:06.937882

Inference:

>>> np.reshape(discriminator.predict(ground_truth), (5, 10))

array([[0.5296475 , 0.52787906, 0.5270807 , 0.5260455 , 0.528732  ,
        0.52820367, 0.53157693, 0.52730876, 0.5244186 , 0.52673554],
       [0.5229454 , 0.5239704 , 0.53051734, 0.52862865, 0.52718925,
        0.52680767, 0.52621156, 0.5308223 , 0.52489233, 0.5297055 ],
       [0.53033316, 0.5260847 , 0.5300899 , 0.52788675, 0.529595  ,
        0.52183014, 0.5321261 , 0.5251559 , 0.52876014, 0.52384466],
       [0.528658  , 0.52737784, 0.53003156, 0.52685475, 0.53047454,
        0.52759105, 0.52710444, 0.52546424, 0.52709824, 0.52520245],
       [0.5283209 , 0.52810913, 0.52451426, 0.5196351 , 0.5299184 ,
        0.5274567 , 0.52686375, 0.5269972 , 0.5248108 , 0.5263274 ]],
      dtype=float32)

>>> np.reshape(gan.predict(input_vector), (5, 10))

array([[0.4719111 , 0.47217596, 0.47209665, 0.47233126, 0.4741753 ,
        0.4712048 , 0.4721919 , 0.47193947, 0.47010162, 0.47092766],
       [0.47291884, 0.47334394, 0.4714141 , 0.46976995, 0.47092718,
        0.47233835, 0.47164065, 0.47276756, 0.47107005, 0.47187868],
       [0.47153524, 0.47157907, 0.4706026 , 0.47128928, 0.47320494,
        0.47089615, 0.47108623, 0.47432283, 0.47186196, 0.47404772],
       [0.47164053, 0.47348404, 0.4701542 , 0.4741918 , 0.4702833 ,
        0.47303212, 0.4726331 , 0.47118646, 0.47191456, 0.47318774],
       [0.47043982, 0.47027725, 0.47308347, 0.47376725, 0.4733549 ,
        0.47157207, 0.47205287, 0.47177386, 0.47119975, 0.4707804 ]],
      dtype=float32)

【Comments】:

I can confirm this behavior, and it only happens when I use train_on_batch(). I tried using my own accuracy implementation, but the reported values still did not match the actual predictions made.

【Answer 1】:

Note that gan = Model(in_vector, gan_output), so your model is defined as all the layers from the input vector to the discriminator's output, with the generator in between. So when you call

gan.compile(optimizer=gan_optimizer, loss='binary_crossentropy', metrics=['accuracy']), it automatically uses the discriminator's output to compute the accuracy. So to get an accuracy for the generator you would have to use a callback and compute the 'accuracy' manually, however that might be defined for your generator (a generator has no typical accuracy metric anyway: what would you compare its output against?).

Also, if your generator produces random noise, that does not mean the accuracy should be 0. Since the only accuracy you are using is the discriminator's, and the discriminator successfully recognizes that the output does not belong to the underlying distribution, the accuracy stays at 100 percent (which is easy, because the generator's output is random noise). In short, a high discriminator accuracy does not mean the generator is successfully fooling the discriminator. In fact, when the discriminator's accuracy approaches 50%, it means the generator is modeling the input data well and the discriminator cannot distinguish the two and is guessing at random. So what you are seeing is expected behavior.
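A minimal sketch of such a manual check (not code from the original answer; fooling_rate is a hypothetical helper name, and the 0.5 threshold mirrors what Keras's binary accuracy uses):

import numpy as np

def fooling_rate(generator, discriminator, x_vectors):
    # Fraction of generated images the discriminator classifies as real;
    # Keras's binary accuracy thresholds sigmoid outputs at 0.5.
    gen_imgs = generator.predict(x_vectors)
    preds = discriminator.predict(gen_imgs)
    return float(np.mean(preds > 0.5))

On the inference output shown in the question (all predictions around 0.47) this returns 0.0: the generator fools the discriminator on none of the samples.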

【Discussion】:

I think you misunderstood my situation. During training I force the labels for the gan (the combined model) to be 1 (real): g_loss = gan.train_on_batch(x_vectors, valid[:len(y)]). Looking at the gan's inference output, I see that all predictions are below 0.5, which means the model predicts 0 for every generated image. In summary: the correct label should be 1, but all the predictions are 0. So why does it report 100% accuracy?
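One way to make that mismatch concrete (a diagnostic sketch under the question's setup, not code from the thread; note that the metric train_on_batch reports is computed in the training phase, so it can differ from an inference-time predict):

# Accuracy reported by Keras while training on one batch
g_loss, g_acc = gan.train_on_batch(x_vectors, valid[:len(y)])

# The same accuracy recomputed by hand from inference-time predictions
preds = gan.predict(x_vectors)
manual_acc = np.mean((preds > 0.5).astype(np.float32) == valid[:len(y)])

print(f'reported acc={g_acc}; manual acc={manual_acc}')

If the inference output in the question is representative, manual_acc comes out as 0.0 while the reported value stays near 1.0, which would show that the reported metric and the actual inference-time predictions are genuinely out of sync.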
