Writing a Transformer from Input to Output, with Code and Annotations

Posted 2022-01-18 Coding With you.....

When building the framework, first be clear about what each step needs and how data is passed from one step to the next.

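The overall framework is end to end: encoder, decoder, output layer. The inputs are the source data and the target data, and the computation also needs the mask inputs. Model(Encoder(), Decoder(), src_embed(), tgt_embed(), generate(dim, tgt_vocab))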
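The mask inputs mentioned above can be sketched as follows. This is a minimal illustration, not the author's code: it assumes token id 0 marks padding, and the helper names `padding_mask` and `subsequent_mask` are hypothetical. It follows the convention used later in this post, where a mask value of 1 means the position is visible.

```python
import torch

PAD_ID = 0  # assumption: token id 0 marks padding


def padding_mask(token_ids: torch.Tensor) -> torch.Tensor:
    """Return a batch*n mask: 1 for real tokens, 0 for padding."""
    return (token_ids != PAD_ID).long()


def subsequent_mask(n: int) -> torch.Tensor:
    """Return an n*n lower-triangular mask: position i may see positions <= i."""
    return torch.tril(torch.ones(n, n, dtype=torch.long))


src = torch.tensor([[5, 7, 9, 0], [3, 4, 0, 0]])  # source batch, padded with 0
tgt = torch.tensor([[1, 6, 8], [1, 2, 0]])        # target batch, padded with 0

src_mask = padding_mask(src)  # batch*n
# The target side combines padding with the "no peeking ahead" constraint: batch*n*n
tgt_mask = padding_mask(tgt)[:, None, :] & subsequent_mask(tgt.size(1))[None, :, :]
```

The source side only needs to hide padding, while the target side also hides future positions so that each step can use only the tokens generated so far.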

Encoder: Encoder(EncoderLayer(Attention(AttentionLayer(heads, dim), N), Fnn()), N)
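To make the `heads` and `dim` arguments concrete, here is a small sketch in plain Python of how the model width is divided among the heads. The function names `head_widths` and `split_vector` are hypothetical; the values 1024 and 12 are the ones used in the class in this post.

```python
from typing import List, Tuple


def head_widths(dim: int, heads: int) -> Tuple[int, int]:
    """Return (width of each head, total width of all heads together)."""
    every_dim = dim // heads
    return every_dim, heads * every_dim


def split_vector(vec: List[int], heads: int) -> List[List[int]]:
    """Cut one projected vector into `heads` equal, consecutive chunks."""
    every_dim = len(vec) // heads
    return [vec[h * every_dim:(h + 1) * every_dim] for h in range(heads)]


print(head_widths(1024, 12))           # (85, 1020)
print(head_widths(1024, 16))           # (64, 1024)
print(split_vector(list(range(8)), 4)) # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

With dim = 1024 and heads = 12 the division is not exact, so the heads together are 1020 wide rather than 1024. The query, key and value projections therefore map 1024 to 1020, and the final projection has to map 1020 back to 1024 before the residual connection.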

Decoder: Decoder(DecodeLayer(), N)
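Both the encoder and the decoder are written as one layer repeated N times. A possible way to express that repetition is sketched below. `ToyLayer` and `Stack` are hypothetical names, and `ToyLayer` is only a stand-in for a real encoder or decoder layer, so this shows the stacking pattern and nothing more.

```python
import copy

import torch
from torch import nn


class ToyLayer(nn.Module):
    """Hypothetical stand-in for an encoder or decoder layer: one residual linear block."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(x + self.linear(x))


class Stack(nn.Module):
    """Apply N independent copies of `layer` one after another."""

    def __init__(self, layer: nn.Module, n: int):
        super().__init__()
        # deepcopy gives every copy its own parameters; ModuleList registers them.
        self.layers = nn.ModuleList([copy.deepcopy(layer) for _ in range(n)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


stack = Stack(ToyLayer(dim=16), n=6)
out = stack(torch.randn(2, 5, 16))  # batch*m*dim in, batch*m*dim out
```

Copying the layer rather than reusing the same object matters: reusing it would share one set of weights across all N positions in the stack.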

Output layer: it needs the embedding dimension and the size of the target vocabulary.
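A minimal sketch of such an output layer, assuming it is a single linear projection followed by a log-softmax. The class name `Generator` and the sizes 32 and 100 are illustrative assumptions, and the log-softmax step is an assumption as well, since the class in this post returns the raw projection scores.

```python
import torch
from torch import nn


class Generator(nn.Module):
    """Map states of width `dim` to log-probabilities over the target vocabulary."""

    def __init__(self, dim: int, tgt_vocab: int):
        super().__init__()
        self.proj = nn.Linear(dim, tgt_vocab)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalise over the vocabulary axis (the last one).
        return torch.log_softmax(self.proj(x), dim=-1)


generate = Generator(dim=32, tgt_vocab=100)
log_probs = generate(torch.randn(2, 5, 32))  # batch*m*dim -> batch*m*tgt_vocab
```

These two numbers are all the layer needs because its only job is to turn each position's vector into one score per target word.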

import math

import torch
from torch import nn


class MyTransformer(nn.Module):
    def __init__(self, src_vocab_size, tgt_vocab_size):
        # Assumption: both vocabulary sizes are supplied by the caller.
        super().__init__()
        self.dim = 1024
        # Single-head version:
        # self.wq = nn.Linear(self.dim, self.dim, bias=False)
        # self.wk = nn.Linear(self.dim, self.dim, bias=False)
        # self.wv = nn.Linear(self.dim, self.dim, bias=False)
        self.heads = 12  # advanced: number of heads
        self.every_dim = self.dim // self.heads  # advanced: width of each head
        self.wq = nn.Linear(self.dim, self.heads * self.every_dim, bias=False)  # advanced
        self.wk = nn.Linear(self.dim, self.heads * self.every_dim, bias=False)  # advanced
        self.wv = nn.Linear(self.dim, self.heads * self.every_dim, bias=False)  # advanced
        # Multi-head: maps the concatenated heads back to the model width.
        self.w = nn.Linear(self.heads * self.every_dim, self.dim, bias=True)

        self.lm = nn.LayerNorm(self.dim)
        self.ffn1 = nn.Linear(self.dim, self.dim, bias=True)
        self.ffn2 = nn.Linear(self.dim, self.dim, bias=True)
        self.lm_fnn = nn.LayerNorm(self.dim)  # separate parameters from the attention LayerNorm
        self.drop1 = 0.1  # advanced
        self.drop2 = 0.4  # advanced
        self.att_drop = nn.Dropout(self.drop1)  # advanced
        self.state_drop = nn.Dropout(self.drop2)  # advanced

        self.embed = nn.Embedding(src_vocab_size, self.dim)  # token ids -> vectors
        self.projection = nn.Linear(self.dim, tgt_vocab_size, bias=False)

    def cal_mask_score(self, attention_mask):
        # Input: batch*n (sentences * tokens per sentence). Output: batch*heads*n*n.
        attention_mask_score = torch.zeros(
            attention_mask.size(0), self.heads,
            attention_mask.size(1), attention_mask.size(1),
            device=attention_mask.device)
        attention_mask_score = attention_mask_score + attention_mask[:, None, None, :]
        # 1 -> 0 (visible), 0 -> -10000 (effectively removed by the softmax)
        attention_mask_score = (1 - attention_mask_score) * -10000
        return attention_mask_score

    def Attention(self, x, attention_mask):
        # advanced: positions where attention_mask is 1 are not masked
        # Single-head version, q k v: batch*m*emd_n
        # q = self.wq(x)
        # k = self.wk(x)
        # v = self.wv(x)
        # advanced, q k v: batch*m*heads*every_dim
        new_size = x.size()[:-1] + (self.heads, self.every_dim)  # drop the last dimension, append two
        q = self.wq(x).view(*new_size).permute(0, 2, 1, 3)  # reshape and reorder: batch*heads*m*every_dim
        k = self.wk(x).view(*new_size).permute(0, 2, 1, 3)
        v = self.wv(x).view(*new_size).permute(0, 2, 1, 3)

        # attention_score = torch.mm(q, k.transpose(0, 1)) / math.sqrt(self.dim)
        # attention_score = torch.bmm(q, k.transpose(1, 2)) / math.sqrt(self.dim)  # advanced
        # multi-head: scale by the width of one head
        attention_score = torch.matmul(q, k.transpose(2, 3)) / math.sqrt(self.every_dim)

        # advanced: mask
        attention_score = attention_score + self.cal_mask_score(attention_mask)

        # attention_score = nn.Softmax(dim=1)(attention_score)
        # attention_score = nn.Softmax(dim=2)(attention_score)  # advanced
        attention_score = nn.Softmax(dim=3)(attention_score)  # multi-head

        # output = torch.mm(attention_score, v)
        attention_score = self.att_drop(attention_score)  # advanced
        # output = torch.bmm(attention_score, v)  # advanced
        output = torch.matmul(attention_score, v)  # multi-head

        # output = self.state_drop(output)  # advanced
        # multi-head: batch*heads*m*every_dim -> batch*m*heads*every_dim -> batch*m*(heads*every_dim)
        output = output.permute(0, 2, 1, 3).reshape(x.size(0), x.size(1), -1)
        output = self.w(output)
        output = self.state_drop(output)  # batch*m*emd_n

        output = self.lm(x + output)  # residual connection
        return output

    def FFN(self, x):
        # ReLU is assumed here; without a non-linearity the two linear layers collapse into one.
        hidden = torch.relu(self.ffn1(x))
        output = self.ffn2(hidden)
        output = self.state_drop(output)  # advanced
        output = self.lm_fnn(x + output)
        return output

    def forward(self, x, attention_mask):
        x = self.embed(x)  # embedding: batch*m -> batch*m*dim
        x = self.Attention(x, attention_mask)  # attention
        x = self.FFN(x)  # feed-forward
        # Output projection. A full model runs a decoder stack here and projects
        # its output; this simplified class projects the encoder states directly.
        output = self.projection(x)  # batch*m*tgt_vocab_size
        return output.view(-1, output.size(-1))  # (batch*m)*tgt_vocab_size
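The additive mask built in cal_mask_score can be illustrated without tensors. The sketch below applies the same formula, (1 - mask) * -10000, to a single row of scores in plain Python. The function name `masked_softmax` and the sample scores are illustrative assumptions.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over `scores` after adding (1 - mask) * -10000 to each entry."""
    shifted = [s + (1 - m) * -10000 for s, m in zip(scores, mask)]
    peak = max(shifted)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


# Three key positions; the last one is padding (mask value 0).
weights = masked_softmax([2.0, 1.0, 3.0], [1, 1, 0])
print(weights)  # the masked position gets weight 0.0, the other two sum to 1
```

Even though the padded position has the highest raw score, the large negative offset pushes its weight to zero, so it contributes nothing to the weighted sum of values.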
