Understanding train.py in the transformer
Posted by shaylin
1. Define the batching (bucketing) scheme ret, which produces a batch_sizes array:
{'min_length': 8, 'window_size': 720,
 'shuffle_queue_size': 270,
 'boundaries': [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 33, 36, 39, 42, 46, 50, 55, 60, 66, 72, 79, 86, 94, 103, 113, 124, 136, 149, 163, 179, 196, 215, 236],
 'max_length': 256,
 'batch_sizes': [240, 180, 180, 180, 144, 144, 144, 120, 120, 120, 90, 90, 90, 90, 80, 72, 72, 60, 60, 48, 48, 48, 40, 40, 36, 30, 30, 24, 24, 20, 20, 18, 18, 16, 15, 12, 12, 10, 10, 9, 8, 8]}
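For intuition, here is a minimal, hypothetical sketch of how such a scheme could be built (the real _batching_scheme in the code does more, e.g. rounding batch sizes to window-friendly divisors; token_budget and growth are made-up illustrative parameters). The idea is that each bucket gets roughly the same number of tokens per batch, so buckets for shorter sentences get larger batch sizes:

def simple_batching_scheme(min_length=8, max_length=256,
                           token_budget=1920, growth=1.1):
    # Bucket boundaries grow roughly geometrically from min_length to max_length.
    boundaries = []
    x = min_length
    while x < max_length:
        boundaries.append(x)
        x = max(x + 1, int(x * growth))
    # One batch size per bucket (plus one for the final open-ended bucket):
    # about token_budget tokens per batch, so longer buckets hold fewer sentences.
    batch_sizes = [max(1, token_budget // b) for b in boundaries + [max_length]]
    return {"min_length": min_length, "max_length": max_length,
            "boundaries": boundaries, "batch_sizes": batch_sizes}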
2. input_pipeline: read the data files (10 files) and decode each record with decode_record,
combining them into a dataset whose elements are dicts of the form {"src_id": ..., "target_id": ...}.
(1) Filter by length: filter on the longer of the source-side and target-side sentences.
length = _example_length(example)
return tf.logical_and(length >= min_length, length <= max_length)
dataset = dataset.filter(functools.partial(example_valid_size, min_length = batching_scheme["min_length"], max_length = batching_scheme["max_length"]))
filter() evaluates this predicate for every element of the dataset.
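A self-contained sketch of this filtering step, assuming dataset and batching_scheme from steps 1 and 2 are in scope and that each element is a dict with src_id / target_id fields (the helper names follow the snippet above):

import functools
import tensorflow as tf

def _example_length(example):
    # Length of an example = the longer of the source and target id sequences.
    return tf.maximum(tf.shape(example["src_id"])[0],
                      tf.shape(example["target_id"])[0])

def example_valid_size(example, min_length, max_length):
    length = _example_length(example)
    return tf.logical_and(length >= min_length, length <= max_length)

# filter() runs the predicate once per element and drops elements whose
# length falls outside [min_length, max_length].
dataset = dataset.filter(functools.partial(
    example_valid_size,
    min_length=batching_scheme["min_length"],
    max_length=batching_scheme["max_length"]))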
(2) Pick a bucket id by length: pass in the dataset ({"src_id": ..., "target_id": ...}) and the boundaries list, then compare each example's length against the bucket boundaries:
conditions_c = tf.logical_and(tf.less_equal(buckets_min, seq_length), tf.less(seq_length, buckets_max))
and return the position within boundaries where the length falls, i.e. the bucket index.
Using the bucket id returned above, locate the bucket and look up its window size. The English definition of the window is easier to follow: as I understand it, the window is the group of consecutive examples that fall into the same bucket and get combined into one batch.
tf.contrib.data.group_by_window(
key_func,
reduce_func,
window_size=None,
window_size_func=None
)
Defined in tensorflow/contrib/data/python/ops/grouping.py.
A transformation that groups windows of elements by key and reduces them.
This transformation maps each consecutive element in a dataset to a key using key_func and groups the elements by key. It then applies reduce_func to at most window_size_func(key) elements matching the same key. All except the final window for each key will contain window_size_func(key) elements; the final window may be smaller.
You may provide either a constant window_size or a window size determined by the key through window_size_func.
Args:
key_func: A function mapping a nested structure of tensors (having shapes and types defined by self.output_shapes and self.output_types) to a scalar tf.int64 tensor.
reduce_func: A function mapping a key and a dataset of up to window_size consecutive elements matching that key to another dataset.
window_size: A tf.int64 scalar tf.Tensor, representing the number of consecutive elements matching the same key to combine in a single batch, which will be passed to reduce_func. Mutually exclusive with window_size_func.
window_size_func: A function mapping a key to a tf.int64 scalar tf.Tensor, representing the number of consecutive elements matching the same key to combine in a single batch, which will be passed to reduce_func. Mutually exclusive with window_size.
Returns:
A Dataset transformation function, which can be passed to tf.data.Dataset.apply.
Raises:
ValueError: if neither or both of {window_size, window_size_func} are passed.
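Under the same assumptions as before, here is a sketch of the two functions that plug into group_by_window (the names are illustrative and the real code may differ): example_to_bucket_id plays the role of key_func, and window_size_fn maps a bucket id to that bucket's batch size.

import tensorflow as tf

# Lower and upper edges of each bucket, derived from the boundaries list.
buckets_min = [0] + batching_scheme["boundaries"]
buckets_max = batching_scheme["boundaries"] + [batching_scheme["max_length"] + 1]

def example_to_bucket_id(example):
    # key_func: index of the bucket whose [min, max) range contains the length.
    seq_length = _example_length(example)
    conditions_c = tf.logical_and(tf.less_equal(buckets_min, seq_length),
                                  tf.less(seq_length, buckets_max))
    return tf.reduce_min(tf.where(conditions_c))  # index of the first matching bucket

def window_size_fn(bucket_id):
    # window_size_func: the batch size associated with this bucket.
    batch_sizes = tf.constant(batching_scheme["batch_sizes"], dtype=tf.int64)
    return batch_sizes[bucket_id]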
(3) Padding: grouped_dataset.padded_batch(batch_size, padded_shapes). Here grouped_dataset is the window of examples that all share one bucket id (the dataset handed to reduce_func), batch_size is the number of sentences per batch, and padded_shapes gives the dimensions each field is padded to.
Putting it together, the id sequences become padded matrices via dataset.apply(tf.contrib.data.group_by_window(example_to_bucket_id, batching_fn, None, window_size_fn)), where the last argument is the window-size function; a sketch follows.
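Continuing the same sketch: the reduce_func pads every example in a window to the longest length in that window and batches it, and apply() wires everything together.

def batching_fn(bucket_id, grouped_dataset):
    # reduce_func: grouped_dataset holds up to window_size_fn(bucket_id)
    # consecutive examples that all landed in the same bucket.
    batch_size = window_size_fn(bucket_id)
    # None means "pad this dimension to the longest sequence in the batch".
    padded_shapes = {"src_id": [None], "target_id": [None]}
    return grouped_dataset.padded_batch(batch_size, padded_shapes)

dataset = dataset.apply(tf.contrib.data.group_by_window(
    example_to_bucket_id,   # key_func
    batching_fn,            # reduce_func
    None,                   # window_size (unused, since window_size_func is given)
    window_size_fn))        # window_size_func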
Part 2:
1-D convolution: https://blog.csdn.net/appleyuchi/article/details/78597054
tf.reshape: https://blog.csdn.net/lxg0807/article/details/53021859
list and tuple: https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/0014316724772904521142196b74a3f8abf93d8e97c6ee6000
expand_dims: https://blog.csdn.net/qq_31780525/article/details/72280284
tf.concat and tf.split: https://blog.csdn.net/momaojia/article/details/77603322 https://blog.csdn.net/UESTC_C2_403/article/details/73350457
feedforward: built from 1-D convolutions, with a ReLU nonlinearity between the two convolution layers. A residual connection then adds the inputs back, followed by normalization. (It does not simply use layers.dense for a plain fully connected layer.)
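A sketch of such a feed-forward block in TF 1.x style (the num_units values are illustrative; normalize is the layer-normalization function discussed below):

import tensorflow as tf

def feedforward(inputs, num_units=(2048, 512), scope="feedforward", reuse=None):
    # Position-wise feed-forward: two 1-D convolutions with kernel size 1
    # (equivalent to per-position dense layers), with a ReLU in between.
    with tf.variable_scope(scope, reuse=reuse):
        outputs = tf.layers.conv1d(inputs, filters=num_units[0], kernel_size=1,
                                   activation=tf.nn.relu, use_bias=True)
        outputs = tf.layers.conv1d(outputs, filters=num_units[1], kernel_size=1,
                                   activation=None, use_bias=True)
        outputs += inputs             # residual connection back to the inputs
        outputs = normalize(outputs)  # layer normalization (sketched below)
    return outputs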
label_smoothing:
(1) normalization: first obtain the mean and variance, then
    normalized = (inputs - mean) / ((variance + epsilon) ** 0.5)
    outputs = gamma * normalized + beta
'''Applies layer normalization.
Args:
  inputs: A tensor with 2 or more dimensions, where the first dimension has
    `batch_size`.
  epsilon: A floating point number. A very small number for preventing ZeroDivision Error.
  scope: Optional scope for `variable_scope`.
  reuse: Boolean, whether to reuse the weights of a previous layer
    by the same name.
Returns:
  A tensor with the same shape and dtype as `inputs`.
'''
Open question: what exactly do beta and gamma contribute here?
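A sketch of normalize following the formula above. As for beta and gamma: they are the learned shift and scale that let the network adjust or undo the normalization; they start at the identity (beta = 0, gamma = 1), which is why they appear to "do nothing" at initialization.

import tensorflow as tf

def normalize(inputs, epsilon=1e-8, scope="ln", reuse=None):
    # Layer normalization over the last (feature) dimension.
    with tf.variable_scope(scope, reuse=reuse):
        params_shape = inputs.get_shape()[-1:]
        mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)
        beta = tf.get_variable("beta", params_shape,
                               initializer=tf.zeros_initializer())
        gamma = tf.get_variable("gamma", params_shape,
                                initializer=tf.ones_initializer())
        normalized = (inputs - mean) / ((variance + epsilon) ** 0.5)
        outputs = gamma * normalized + beta  # learned scale and shift
    return outputs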
(2) embedding: it uses a TensorFlow embedding lookup so that the input ids are mapped into a space where the vectors are more evenly distributed and relationships between words can be expressed.
The output has one more dimension than the input; the last dimension has num_units entries (the number of hidden units).
The scale argument scales the outputs by sqrt(num_units); scaling is applied when scale is True, which is the default (why this scaling is needed was not yet clear to me).
'''Embeds a given tensor.
Args:
  inputs: A `Tensor` with type `int32` or `int64` containing the ids
    to be looked up in `lookup table`.
  vocab_size: An int. Vocabulary size.
  num_units: An int. Number of embedding hidden units.
  zero_pad: A boolean. If True, all the values of the first row (id 0)
    should be constant zeros.
  scale: A boolean. If True, the outputs are multiplied by sqrt(num_units).
  scope: Optional scope for `variable_scope`.
  reuse: Boolean, whether to reuse the weights of a previous layer
    by the same name.
Returns:
  A `Tensor` with one more rank than the inputs'. The last dimensionality
  should be `num_units`.
A lookup function is used here whose effect is like a Chinese-to-English mapping: treat the input tensor as the dictionary, pass in the ids you want represented, and get back the corresponding tensor. A blog post explains it well:
https://www.jianshu.com/p/677e71364c8e ; it also relates to one-hot encoding: https://blog.csdn.net/pipisorry/article/details/61193868
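A sketch of the embedding function as described; the lookup function referred to above is presumably tf.nn.embedding_lookup. zero_pad zeroes out the row for the padding id 0, and scale multiplies by sqrt(num_units):

import tensorflow as tf

def embedding(inputs, vocab_size, num_units, zero_pad=True, scale=True,
              scope="embedding", reuse=None):
    # Maps integer ids to dense vectors; the output gains one trailing
    # dimension of size num_units.
    with tf.variable_scope(scope, reuse=reuse):
        lookup_table = tf.get_variable("lookup_table",
                                       shape=[vocab_size, num_units],
                                       dtype=tf.float32)
        if zero_pad:
            # Force row 0 (the padding id) to be all zeros.
            lookup_table = tf.concat((tf.zeros(shape=[1, num_units]),
                                      lookup_table[1:, :]), axis=0)
        outputs = tf.nn.embedding_lookup(lookup_table, inputs)
        if scale:
            outputs = outputs * (num_units ** 0.5)  # scale by sqrt(num_units)
    return outputs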
(3) multi-head attention:
a. Fully connected projections for Q, K, V (dense): dense layers whose last dimension becomes num_units,
   with outputs = activation(inputs * kernel + bias).
b. Masking: use reduce_sum to find the positions that are all zeros (padding) and mask them, marking those positions by setting their attention scores to a very large negative value.
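A condensed sketch of steps a and b (single-head version; splitting into multiple heads, query masking, and the rest of the block are omitted):

import tensorflow as tf

def qkv_and_key_mask(queries, keys, num_units):
    # Dense projections for Q, K, V, scaled dot-product scores, and a key
    # mask that pushes scores at all-zero (padding) key positions to -inf.
    Q = tf.layers.dense(queries, num_units, activation=tf.nn.relu)  # (N, T_q, C)
    K = tf.layers.dense(keys, num_units, activation=tf.nn.relu)     # (N, T_k, C)
    V = tf.layers.dense(keys, num_units, activation=tf.nn.relu)     # (N, T_k, C)

    outputs = tf.matmul(Q, tf.transpose(K, [0, 2, 1]))               # scores (N, T_q, T_k)
    outputs = outputs / (K.get_shape().as_list()[-1] ** 0.5)

    # Key masking: a key position is padding if its embedding sums to 0.
    key_masks = tf.sign(tf.abs(tf.reduce_sum(keys, axis=-1)))        # (N, T_k)
    key_masks = tf.tile(tf.expand_dims(key_masks, 1),
                        [1, tf.shape(queries)[1], 1])                # (N, T_q, T_k)
    paddings = tf.ones_like(outputs) * (-2 ** 32 + 1)
    outputs = tf.where(tf.equal(key_masks, 0), paddings, outputs)    # huge negative at padded keys
    weights = tf.nn.softmax(outputs)                                 # attention weights
    return tf.matmul(weights, V)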
(4)dropout:
(5) label_smoothing: smooths the one-hot labels.
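A sketch of label smoothing: each one-hot target keeps 1 - epsilon of its probability mass and the remaining epsilon is spread uniformly over all K classes:

import tensorflow as tf

def label_smoothing(inputs, epsilon=0.1):
    # inputs: one-hot label tensor; last dimension = number of classes K.
    K = inputs.get_shape().as_list()[-1]
    return ((1.0 - epsilon) * inputs) + (epsilon / K)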
(6) positional encoding: there are still some issues here.
I have been studying this recently, but there are still many open questions.