准确率 99%,分类不正确 - 三元网络

Posted

技术标签:

【中文标题】准确率 99%,分类不正确 - 三元网络【英文标题】:Accuracy 99%, classification incorrect - triplet network 【发布时间】:2018-05-31 19:58:21 【问题描述】:

我正在尝试按照facenet article 中的描述训练一个三元组网络。

我通过计算正距离(锚 - 正)小于负距离(锚 - 负)的三元组,然后除以批次中三元组的总数来计算验证集的准确度.

我得到了很好的结果:99% 的准确率。但是当我使用我的模型嵌入对图像进行分类时(我拍摄一张未知图像并比较它 - 使用欧几里德距离 - 与一些标记的图像) - 最多只有 20% 的结果是正确的。

我做错了什么?

你可以在下面找到我的详细实现。


三元组生成

在生成三元组之前,我使用 dlib(CASIA 和 LFW)对齐并裁剪了训练集和测试集,因此每张脸的主要元素(眼睛、鼻子、嘴唇)是定位几乎相同。

为了生成三元组,我随机选择一个包含 40 个或更多图像的 CASIA 文件夹,然后选择 40 个锚点,每个锚点都有对应的正图像(随机选择但与锚点不同)。然后我为每个锚-正对选择一个随机负数。


三元组损失

这是我的三元组损失函数:

def triplet_loss(d_pos, d_neg):

    print("d_pos "+str(d_pos))
    print("d_neg "+str(d_neg))

    margin = 0.2

    loss = tf.reduce_mean(tf.maximum(0., margin + d_pos - d_neg))


    return loss

这些是我的正距离(锚点和正例之间)和负距离(锚点和负例之间)。

**model1** = embeddings generated for the anchor image 
**model2** = embeddings generated for the positive image
**model3** = embeddings generated for the negative image

变量cost是我在每一步计算的损失。

    d_pos_triplet = tf.reduce_sum(tf.square(model1 - model2), 1)
    d_neg_triplet = tf.reduce_sum(tf.square(model1 - model3), 1)

    d_pos_triplet_acc = tf.sqrt(d_pos_triplet + 1e-10)
    d_neg_triplet_acc = tf.sqrt(d_neg_triplet + 1e-10)

    d_pos_triplet_test = tf.reduce_sum(tf.square(model1_test - model2_test), 1)
    d_neg_triplet_test = tf.reduce_sum(tf.square(model1_test - model3_test), 1)

    d_pos_triplet_acc_test = tf.sqrt(d_pos_triplet_test + 1e-10)
    d_neg_triplet_acc_test = tf.sqrt(d_neg_triplet_test + 1e-10)


    cost = triplet_loss(d_pos_triplet, d_neg_triplet)
    cost_test = triplet_loss(d_pos_triplet_test, d_neg_triplet_test)

然后我将嵌入一个一个并测试损失是否为正 - 因为 0 损失意味着网络不学习(如 facenet 文章中所述,我必须选择 半硬 三胞胎)

input1,input2, input3, anchor_folder_helper, anchor_photo_helper, positive_photo_helper = training.next_batch_casia(s,e) #generate complet random

            s = i * batch_size
            e = (i+1) *batch_size

        input1,input2, input3, anchor_folder_helper, anchor_photo_helper, positive_photo_helper = training.next_batch_casia(s,e) #generate complet random


        lly = 0; 

        '''counter which helps me generate the same number of triplets each batch'''

        while lly < len(input1):

            input_lly1 = input1[lly:lly+1]
            input_lly2 = input2[lly:lly+1]
            input_lly3 = input3[lly:lly+1]

            loss_value = sess.run([cost], feed_dict=x_anchor:input_lly1, x_positive:input_lly2, x_negative:input_lly3)



            while(loss_value[0]<=0):
                ''' While the generated triplet has loss 0 (which means dpos - dneg + margin < 0) I keep generating triplets. I stop when I manage to generate a semi-hard triplet. '''
                input_lly1,input_lly2, input_lly3, anchor_folder_helper, anchor_photo_helper, positive_photo_helper = training.cauta_hard_negative(anchor_folder_helper, anchor_photo_helper, positive_photo_helper)
                loss_value = sess.run([cost], feed_dict=x_anchor:input_lly1, x_positive:input_lly2, x_negative:input_lly3)

                if (loss_value[0] > 0):
                    _, loss_value, distance1_acc, distance2_acc, m1_acc, m2_acc, m3_acc  = sess.run([accum_ops, cost, d_pos_triplet_acc, d_neg_triplet_acc, model1, model2, model3], feed_dict=x_anchor:input_lly1, x_positive:input_lly2, x_negative:input_lly3)
                 tr_acc = compute_accuracy(distance1_acc, distance2_acc)

                 if math.isnan(tr_acc) and epoch != 0:
                    print('tr_acc %0.2f' % tr_acc)
                    pdb.set_trace()
                    avg_loss += loss_value
                    avg_acc +=tr_acc*100

                    contor_i = contor_i + 1

                    lly = lly + 1

这是我的模型 - 请注意,当我应用 L2 归一化时,我的准确率会显着下降(也许我做错了):

def siamese_convnet(x):

    w_conv1_1 = tf.get_variable(name='w_conv1_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 1, 64])
    w_conv1_2 = tf.get_variable(name='w_conv1_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 64, 64])

    w_conv2_1 = tf.get_variable(name='w_conv2_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 64, 128])
    w_conv2_2 = tf.get_variable(name='w_conv2_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 128, 128])

    w_conv3_1 = tf.get_variable(name='w_conv3_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 128, 256])
    w_conv3_2 = tf.get_variable(name='w_conv3_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 256, 256])
    w_conv3_3 = tf.get_variable(name='w_conv3_3', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 256, 256])

    w_conv4_1 = tf.get_variable(name='w_conv4_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 256, 512])
    w_conv4_2 = tf.get_variable(name='w_conv4_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 512, 512])
    w_conv4_3 = tf.get_variable(name='w_conv4_3', initializer=tf.contrib.layers.xavier_initializer(), shape=[1, 1, 512, 512])

    w_conv5_1 = tf.get_variable(name='w_conv5_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 512, 512])
    w_conv5_2 = tf.get_variable(name='w_conv5_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 512, 512])
    w_conv5_3 = tf.get_variable(name='w_conv5_3', initializer=tf.contrib.layers.xavier_initializer(), shape=[1, 1, 512, 512])

    w_fc_1 = tf.get_variable(name='w_fc_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[5*5*512, 2048])
    w_fc_2 = tf.get_variable(name='w_fc_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[2048, 1024])

    w_out = tf.get_variable(name='w_out', initializer=tf.contrib.layers.xavier_initializer(), shape=[1024, 128])

    bias_conv1_1 = tf.get_variable(name='bias_conv1_1', initializer=tf.constant(0.01, shape=[64]))
    bias_conv1_2 = tf.get_variable(name='bias_conv1_2', initializer=tf.constant(0.01, shape=[64]))

    bias_conv2_1 = tf.get_variable(name='bias_conv2_1', initializer=tf.constant(0.01, shape=[128]))
    bias_conv2_2 = tf.get_variable(name='bias_conv2_2', initializer=tf.constant(0.01, shape=[128]))

    bias_conv3_1 = tf.get_variable(name='bias_conv3_1', initializer=tf.constant(0.01, shape=[256]))
    bias_conv3_2 = tf.get_variable(name='bias_conv3_2', initializer=tf.constant(0.01, shape=[256]))
    bias_conv3_3 = tf.get_variable(name='bias_conv3_3', initializer=tf.constant(0.01, shape=[256]))

    bias_conv4_1 = tf.get_variable(name='bias_conv4_1', initializer=tf.constant(0.01, shape=[512]))
    bias_conv4_2 = tf.get_variable(name='bias_conv4_2', initializer=tf.constant(0.01, shape=[512]))
    bias_conv4_3 = tf.get_variable(name='bias_conv4_3', initializer=tf.constant(0.01, shape=[512]))

    bias_conv5_1 = tf.get_variable(name='bias_conv5_1', initializer=tf.constant(0.01, shape=[512]))
    bias_conv5_2 = tf.get_variable(name='bias_conv5_2', initializer=tf.constant(0.01, shape=[512]))
    bias_conv5_3 = tf.get_variable(name='bias_conv5_3', initializer=tf.constant(0.01, shape=[512]))

    bias_fc_1 = tf.get_variable(name='bias_fc_1', initializer=tf.constant(0.01, shape=[2048]))
    bias_fc_2 = tf.get_variable(name='bias_fc_2', initializer=tf.constant(0.01, shape=[1024]))

    out = tf.get_variable(name='out', initializer=tf.constant(0.01, shape=[128]))

    x = tf.reshape(x , [-1, 160, 160, 1]);

    conv1_1 = tf.nn.relu(conv2d(x, w_conv1_1) + bias_conv1_1);
    conv1_2= tf.nn.relu(conv2d(conv1_1, w_conv1_2) + bias_conv1_2);

    max_pool1 = max_pool(conv1_2);

    conv2_1 = tf.nn.relu( conv2d(max_pool1, w_conv2_1) + bias_conv2_1 );
    conv2_2 = tf.nn.relu( conv2d(conv2_1, w_conv2_2) + bias_conv2_2 );

    max_pool2 = max_pool(conv2_2)

    conv3_1 = tf.nn.relu( conv2d(max_pool2, w_conv3_1) + bias_conv3_1 );
    conv3_2 = tf.nn.relu( conv2d(conv3_1, w_conv3_2) + bias_conv3_2 );
    conv3_3 = tf.nn.relu( conv2d(conv3_2, w_conv3_3) + bias_conv3_3 );

    max_pool3 = max_pool(conv3_3)

    conv4_1 = tf.nn.relu( conv2d(max_pool3, w_conv4_1) + bias_conv4_1 );
    conv4_2 = tf.nn.relu( conv2d(conv4_1, w_conv4_2) + bias_conv4_2 );
    conv4_3 = tf.nn.relu( conv2d(conv4_2, w_conv4_3) + bias_conv4_3 );

    max_pool4 = max_pool(conv4_3)

    conv5_1 = tf.nn.relu( conv2d(max_pool4, w_conv5_1) + bias_conv5_1 );
    conv5_2 = tf.nn.relu( conv2d(conv5_1, w_conv5_2) + bias_conv5_2 );
    conv5_3 = tf.nn.relu( conv2d(conv5_2, w_conv5_3) + bias_conv5_3 );

    max_pool5 = max_pool(conv5_3)

    fc_helper = tf.reshape(max_pool5, [-1, 5*5*512]);
    fc_1 = tf.nn.relu( tf.matmul(fc_helper, w_fc_1) + bias_fc_1 );

    fc_2 = tf.nn.relu( tf.matmul(fc_1, w_fc_2) + bias_fc_2 );

    output = tf.matmul(fc_2, w_out) + out
    #output = tf.nn.l2_normalize(output, 0) THIS IS COMMENTED


    return output

我的模型以独立于框架的方式:

conv 3x3 (1, 64)
conv 3x3 (64,64)
max_pooling
conv 3x3 (64, 128)
conv 3x3 (128, 128)
max_pooling
conv 3x3 (128, 256)
conv 3x3 (256, 256)
conv 3x3 (256, 256)
max_pooling
conv 3x3 (256, 512)
conv 3x3 (512, 512)
conv 1x1 (512, 512)
max_pooling
conv 3x3 (256, 512)
conv 3x3 (512, 512)
conv 1x1 (512, 512)
max_pooling
fully_connected(128)
fully_connected(128)
output(128)

【问题讨论】:

【参考方案1】:

你的 L2 规范化应该是典型的,但它是基于特征的。

【讨论】:

感谢您的回答!你能提供更多细节吗?你所说的模范是什么意思?谢谢! @HelloLili 你知道模范明智的意思吗? @AbhijitBalaji 不,但我发现我做错了什么: output = tf.nn.l2_normalize(output, 0) 应该是 output = tf.nn.l2_normalize(output, axis=1) @HelloLili 我想这就是他的意思。当您将 L2 标准化为轴 = 0(批处理轴)时,您正在按批次对其进行标准化,当它沿轴 = 1 标准化时(每个训练示例的嵌入(因为您在 l2 范数之前有一个密集层))标准化是示例明智的。有意义吗? @AbhijitBalaji 是的,这就是我正在寻找的解释!谢谢!

以上是关于准确率 99%,分类不正确 - 三元网络的主要内容,如果未能解决你的问题,请参考以下文章

尽管准确率超过 99%,但随机森林模型产生了不正确的预测

模型评估PR|ROC|AUC

牢记分类指标:准确率、精确率、召回率、F1 score以及ROC

基于汉字字频特征实现99.99%准确率的新闻文本分类器

二层神经网络准确率一直只有88%左右

在 keras 中实现三元组损失的准确性