Deep Learning in 100 Examples | Day 44: Password Cracking

Posted by K同学啊

Hi everyone, I'm K同学啊.

Today we'll build a deep-learning application for password cracking together. This article is for study and reference only; please do not use it for any other purpose! The key points are:

  • Reading an Excel file and merging the multiple sheets it contains
  • Building a single-input, multi-output model
  • How building a single-input multi-output DL program differs from building a single-input single-output one

Our program's goal is to predict the 秘密哈希 (secret hash) and secret_salt from the 公开哈希 (public hash).

🚀 My environment:

  • Language: Python 3.6.5
  • Editor: Jupyter Notebook
  • Deep learning framework: TensorFlow 2.4.1
  • Data for this article: reply DL+44 on the official account (K同学啊) to get it
  • All project code appears in this article; copy it in order and it will run

If you are a deep-learning beginner, you may want to start with the column I wrote just for you: 《小白入门深度学习》

The flow chart of our code is shown below:

I. Preparation

1. Import the data

Pay attention here to how an Excel file is read and how its multiple sheets are merged.

import tensorflow as tf
import pandas     as pd
import numpy      as np

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)  # allocate GPU memory on demand
    tf.config.set_visible_devices([gpus[0]],"GPU")
print(gpus)
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

dataframe = pd.read_excel("哈希数据.xls", sheet_name=[0,1,2,3],
                          names=["uid","公开哈希","秘密哈希","secret_salt"])
# First put the sheets into a list, then pass that list to concat
frames = [dataframe[0], dataframe[1], dataframe[2], dataframe[3]]
result = pd.concat(frames)
# Remember to reset the index! Otherwise it causes trouble later; I once
# spent a whole morning hunting a bug caused by exactly this.
df = result.reset_index(drop=True)
df
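As a side note, when the number of sheets is unknown, `sheet_name=None` makes `read_excel` return a dict of every sheet, and `pd.concat(..., ignore_index=True)` replaces the manual list plus `reset_index`. A minimal sketch, with in-memory DataFrames standing in for the sheets (the column names here are illustrative, not the real data):

```python
import pandas as pd

# Stand-ins for the dict that pd.read_excel(..., sheet_name=None) returns:
# each value is one sheet, and all sheets share the same columns.
sheets = {
    0: pd.DataFrame({"uid": [1, 2], "hash": ["ab", "cd"]}),
    1: pd.DataFrame({"uid": [3, 4], "hash": ["ef", "gh"]}),
}

# ignore_index=True builds a fresh 0..n-1 index, so a separate
# reset_index(drop=True) is no longer needed.
merged = pd.concat(sheets.values(), ignore_index=True)
print(merged.shape)        # (4, 2)
print(list(merged.index))  # [0, 1, 2, 3]
```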

2. Inspect the data

Check the lengths of 公开哈希, 秘密哈希, and secret_salt, and whether they vary from row to row.

len(df.iloc[1,1]),len(df.iloc[1,2]),len(df.iloc[1,3])
(64, 128, 64)
len(df.iloc[2,1]),len(df.iloc[2,2]),len(df.iloc[2,3])
(64, 128, 64)
len(df.iloc[21,1]),len(df.iloc[21,2]),len(df.iloc[21,3])
(64, 128, 64)
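Spot-checking a few rows works, but the whole column can be verified in one line: `Series.str.len().nunique()` returns 1 only if every entry has the same length. A sketch on toy data (the values below are illustrative stand-ins for a hash column):

```python
import pandas as pd

# Toy stand-in for one of the hash columns in df.
hashes = pd.Series(["a1b2", "c3d4", "e5f6"])

lengths = hashes.str.len()
assert lengths.nunique() == 1   # every entry has the same length
print(int(lengths.iloc[0]))     # 4
```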

II. Building the Training and Test Sets

X_  = df.iloc[:,1]
y_1 = df.iloc[:,2]
y_2 = df.iloc[:,3]

Encode the labels as numbers

Note that 公开哈希, 秘密哈希, and secret_salt are all made up of the 10 Arabic digits and the 26 lowercase English letters, and every field has a fixed length, so each field is converted into a one-hot encoding.

number   = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
alphabet = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
char_set       = number + alphabet
char_set_len   = len(char_set)
y1_name_len = len(y_1[0])
y2_name_len = len(y_2[0])

# Convert a string into a one-hot matrix
def text2vec(text,label_name_len):
    vector = np.zeros([label_name_len, char_set_len])
    for i, c in enumerate(text):
        idx = char_set.index(c)
        vector[i][idx] = 1.0
    return vector

y1_list = [text2vec(i,len(y_1[0])) for i in y_1]
y2_list = [text2vec(i,len(y_2[0])) for i in y_2]
X_list  = [text2vec(i,len(X_[0]))  for i in X_]
X_train  = np.array(X_list[:50000])
y1_train = np.array(y1_list[:50000])
y2_train = np.array(y2_list[:50000])

X_train = X_train.reshape(X_train.shape[0],X_train.shape[1],X_train.shape[2],1)
X_train.shape,y1_train.shape,y2_train.shape
((50000, 64, 36, 1), (50000, 128, 36), (50000, 64, 36))
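Later we will also need the inverse of `text2vec`, to turn the model's one-hot predictions back into strings: taking `argmax` over the character axis recovers each character's index in `char_set`. A self-contained sketch (it re-declares `char_set` so it runs on its own):

```python
import numpy as np

number   = [str(i) for i in range(10)]
alphabet = [chr(c) for c in range(ord('a'), ord('z') + 1)]
char_set = number + alphabet          # 10 digits + 26 letters = 36 classes
char_set_len = len(char_set)

def text2vec(text):
    """One-hot encode a string: one row per character."""
    vector = np.zeros([len(text), char_set_len])
    for i, c in enumerate(text):
        vector[i][char_set.index(c)] = 1.0
    return vector

def vec2text(vector):
    """Invert text2vec: argmax over the character axis, then join."""
    return "".join(char_set[i] for i in np.argmax(vector, axis=1))

print(vec2text(text2vec("a3f9")))  # a3f9
```

The same `vec2text` works unchanged on the model's softmax outputs, since `argmax` only cares about which class has the highest score.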

III. Building the Model

The model we build this time is quite different from previous ones. Since our goal is to predict the 秘密哈希 and secret_salt from the 公开哈希, we build a single-input, multi-output model; pay close attention to how it is constructed.

For more of the theory, see the Model section of 新手入门深度学习 | 4-1:构建模型的两种方法.

from tensorflow.keras import layers
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

deep_inputs =  Input(shape=(X_train.shape[1], X_train.shape[2],1))
x = layers.Conv2D(32,(3,3),activation='relu')(deep_inputs)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)

x = layers.Conv2D(128, (3,3), activation='relu')(x)
x = layers.MaxPooling2D((2,2), strides=(2,2))(x)
    
x = layers.Flatten()(x)
x = layers.Dense(2000, activation='relu')(x)

# Model output 1
out1 = layers.Dense(y1_name_len * char_set_len)(x)
out1 = layers.Reshape([y1_name_len, char_set_len])(out1)
out1 = layers.Softmax(name="out1")(out1)
# Model output 2
out2 = layers.Dense(y2_name_len * char_set_len)(x)
out2 = layers.Reshape([y2_name_len, char_set_len])(out2)
out2 = layers.Softmax(name="out2")(out2)

model = Model(inputs=deep_inputs, outputs=[out1,out2])
# Print the network structure
model.summary()
Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_2 (InputLayer)            [(None, 64, 36, 1)]  0                                            
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 62, 34, 32)   320         input_2[0][0]                    
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D)  (None, 31, 17, 32)   0           conv2d_3[0][0]                   
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 29, 15, 64)   18496       max_pooling2d_3[0][0]            
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D)  (None, 14, 7, 64)    0           conv2d_4[0][0]                   
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 12, 5, 128)   73856       max_pooling2d_4[0][0]            
__________________________________________________________________________________________________
max_pooling2d_5 (MaxPooling2D)  (None, 6, 2, 128)    0           conv2d_5[0][0]                   
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 1536)         0           max_pooling2d_5[0][0]            
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 2000)         3074000     flatten_1[0][0]                  
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 4608)         9220608     dense_3[0][0]                    
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 2304)         4610304     dense_3[0][0]                    
__________________________________________________________________________________________________
reshape_2 (Reshape)             (None, 128, 36)      0           dense_4[0][0]                    
__________________________________________________________________________________________________
reshape_3 (Reshape)             (None, 64, 36)       0           dense_5[0][0]                    
__________________________________________________________________________________________________
out1 (Softmax)                  (None, 128, 36)      0           reshape_2[0][0]                  
__________________________________________________________________________________________________
out2 (Softmax)                  (None, 64, 36)       0           reshape_3[0][0]                  
==================================================================================================
Total params: 16,997,584
Trainable params: 16,997,584
Non-trainable params: 0
__________________________________________________________________________________________________

IV. Compiling the Model

For how to choose a loss function, see: 新手入门深度学习 | 3-4:损失函数Loss

optimizer = tf.keras.optimizers.Adam(1e-4)

model.compile(optimizer=optimizer,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
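With one-hot labels, `categorical_crossentropy` at each character position reduces to the negative log of the probability the model assigns to the true character, averaged over positions. A NumPy sketch of that computation (a hand-rolled stand-in for Keras's loss, for intuition only):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """y_true: one-hot array (positions, classes); y_pred: probabilities,
    same shape. Returns the mean over positions of -log p(true class)."""
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=-1)))

# Two positions, three classes; the model is 80% / 50% sure of the truth.
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.3, 0.5, 0.2]])
loss = categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))  # 0.4581  (mean of -log 0.8 and -log 0.5)
```

Because the model has two outputs, Keras applies this loss to each output separately and sums them, which is why the training log below reports a total `loss` alongside one loss per output.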

V. Training the Model

epochs = 200

history = model.fit(
    X_train,
    (y1_train,y2_train),
    validation_data=(X_train,(y1_train,y2_train)),
    epochs=epochs
)
Epoch 1/200
1563/1563 [==============================] - 15s 8ms/step - loss: 5.5879 - softmax_loss: 2.7939 - softmax_1_loss: 2.7941 - softmax_accuracy: 0.0624 - softmax_1_accuracy: 0.0626 - val_loss: 5.5553 - val_softmax_loss: 2.7776 - val_softmax_1_loss: 2.7777 - val_softmax_accuracy: 0.0632 - val_softmax_1_accuracy: 0.0630
Epoch 2/200
1563/1563 [==============================] - 9s 6ms/step - loss: 5.5535 - softmax_loss: 2.7767 - softmax_1_loss: 2.7768 - softmax_accuracy: 0.0626 - softmax_1_accuracy: 0.0627 - val_loss: 5.5507 - val_softmax_loss: 2.7755 - val_softmax_1_loss: 2.7752 - val_softmax_accuracy: 0.0632 - val_softmax_1_accuracy: 0.0635
Epoch 3/200
1563/1563 [==============================] - 9s 6ms/step - loss: 5.5498 - softmax_loss: 2.7750 - softmax_1_loss: 2.7748 - softmax_accuracy: 0.0626 - softmax_1_accuracy: 0.0633 - val_loss: 5.5476 - val_softmax_loss: 2.7739 - val_softmax_1_loss: 2.7738 - val_softmax_accuracy: 0.0638 - val_softmax_1_accuracy: 0.0643
......
Epoch 199/200
1563/1563 [==============================] - 9s 6ms/step - loss: 3.6113 - softmax_loss: 2.3187 - softmax_1_loss: 1.2925 - softmax_accuracy: 0.2610 - softmax_1_accuracy: 0.5860 - val_loss: 3.4486 - val_softmax_loss: 2.2581 - val_softmax_1_loss: 1.1905 - val_softmax_accuracy: 0.2817 - val_softmax_1_accuracy: 0.6332
Epoch 200/200
1563/1563 [==============================] - 9s 6ms/step - loss: 3.6094 - softmax_loss: 2.3188 - softmax_1_loss: 1.2906 - softmax_accuracy: 0.2608 - softmax_1_accuracy: 0.5865 - val_loss: 3.4469 - val_softmax_loss: 2.2583 - val_softmax_1_loss: 1.1886 - val_softmax_accuracy: 0.2815 - val_softmax_1_accuracy: 0.6342
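Note that the accuracies in the log above are per-character: each of the 128 (or 64) positions counts separately. Full-string accuracy, where a prediction only counts if every character matches, is much stricter and worth computing as well. A NumPy sketch over one-hot arrays of shape (samples, positions, classes):

```python
import numpy as np

def accuracies(y_true, y_pred):
    """Per-character and full-string accuracy for one-hot arrays of
    shape (samples, positions, classes)."""
    true_idx = np.argmax(y_true, axis=-1)
    pred_idx = np.argmax(y_pred, axis=-1)
    match = true_idx == pred_idx                   # (samples, positions)
    per_char   = float(match.mean())               # fraction of characters right
    per_string = float(match.all(axis=-1).mean())  # fraction of whole strings right
    return per_char, per_string

# 2 samples, 2 positions, 2 classes; one sample has one wrong character.
y_true = np.array([[[1, 0], [0, 1]],
                   [[0, 1], [1, 0]]], dtype=float)
y_pred = np.array([[[0.9, 0.1], [0.2, 0.8]],   # both characters right
                   [[0.7, 0.3], [0.6, 0.4]]])  # first character wrong
per_char, per_string = accuracies(y_true, y_pred)
print(per_char, per_string)  # 0.75 0.5
```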

The point of this article is to walk through the password-cracking workflow end to end. If this topic interests you, feel free to explore it further on your own; it would not be appropriate for me to provide a higher-accuracy model here.
