Slice a 3D tensor based on a given sequence length array in TensorFlow

【Title】: Slice a 3D tensor, based on the given sequence length array in tensorflow 【Posted】: 2019-03-24 05:30:17 【Question】:

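I want a TensorFlow function that takes a 3D tensor and an array of lengths (one entry per element along the tensor's first dimension) and slices each 2D matrix in the tensor according to that array, keeping only the first length_ rows of each matrix and stacking the kept rows into a single 2D result. The equivalent NumPy code is shown below. The underlying idea is to select all the hidden states of each input in a batch from a dynamic RNN while skipping the padding.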

import numpy as np
import tensorflow as tf
a = np.random.uniform(-1,1,(3,5,7))
a_length = np.random.randint(5,size=(3))

a_tf = tf.convert_to_tensor(a)
a_length_tf = tf.convert_to_tensor(a_length)

res = []
for index, length_ in enumerate(a_length):
    res.extend(a[index,:length_,:])
res = np.array(res)

Output:

print(a_length)
array([1, 4, 4])


print(res)

array([[-0.060161  ,  0.36000953,  0.46160677, -0.66576281,  0.28562044,
    -0.60026872,  0.08034777],
   [ 0.04776443,  0.38018207, -0.73352382,  0.61847258, -0.89731857,
     0.57264147, -0.88192537],
   [ 0.92657628,  0.6236141 ,  0.41977008,  0.88720247,  0.44639323,
     0.26165976,  0.2678753 ],
   [-0.78125831,  0.76756136, -0.05716537, -0.64696257,  0.48918477,
     0.15376225, -0.41974593],
   [-0.625326  ,  0.3509537 , -0.7884495 ,  0.11773297,  0.23713942,
     0.30296786,  0.12932378],
   [ 0.88413986, -0.10958306,  0.9745586 ,  0.8975006 ,  0.23023047,
    -0.89991669, -0.60032688],
   [ 0.33462775,  0.62883724, -0.81839566, -0.70312966, -0.00246936,
    -0.95542994, -0.33035891],
   [-0.26355579, -0.58104982, -0.54748412, -0.30236209, -0.74270132,
     0.46329941,  0.34277915],
   [ 0.92837516, -0.06748299,  0.32837354, -0.62863672,  0.86226447,
     0.63604586,  0.0905248 ]])

print(a)
array([[[-0.060161  ,  0.36000953,  0.46160677, -0.66576281,
      0.28562044, -0.60026872,  0.08034777],
    [ 0.26379226,  0.67066755, -0.90139221, -0.86862163,
      0.36405595,  0.71342926, -0.1265208 ],
    [ 0.15007877,  0.82065234,  0.03984378, -0.20038364,
     -0.09945102,  0.71605241, -0.55865999],
    [ 0.27132257, -0.84289149, -0.15493576,  0.74683429,
     -0.71159896,  0.50397217, -0.99025404],
    [ 0.51546368,  0.45460343,  0.87519031,  0.0332339 ,
     -0.53474897, -0.01733648, -0.02886814]],

   [[ 0.04776443,  0.38018207, -0.73352382,  0.61847258,
     -0.89731857,  0.57264147, -0.88192537],
    [ 0.92657628,  0.6236141 ,  0.41977008,  0.88720247,
      0.44639323,  0.26165976,  0.2678753 ],
    [-0.78125831,  0.76756136, -0.05716537, -0.64696257,
      0.48918477,  0.15376225, -0.41974593],
    [-0.625326  ,  0.3509537 , -0.7884495 ,  0.11773297,
      0.23713942,  0.30296786,  0.12932378],
    [ 0.44550219, -0.38828221,  0.35684203,  0.789946  ,
     -0.8763921 ,  0.90155917, -0.75549455]],

   [[ 0.88413986, -0.10958306,  0.9745586 ,  0.8975006 ,
      0.23023047, -0.89991669, -0.60032688],
    [ 0.33462775,  0.62883724, -0.81839566, -0.70312966,
     -0.00246936, -0.95542994, -0.33035891],
    [-0.26355579, -0.58104982, -0.54748412, -0.30236209,
     -0.74270132,  0.46329941,  0.34277915],
    [ 0.92837516, -0.06748299,  0.32837354, -0.62863672,
      0.86226447,  0.63604586,  0.0905248 ],
    [ 0.70272633,  0.17122912, -0.58209965,  0.55557024,
     -0.46295566, -0.33845157, -0.62254313]]])

【Answer 1】:

Here is one way to do it with tf.boolean_mask:

import tensorflow as tf
import numpy as np

# NumPy/Python implementation
a = np.random.uniform(-1,1,(3,5,7)).astype(np.float32)
a_length = np.random.randint(5,size=(3)).astype(np.int32)
res = []
for index, length_ in enumerate(a_length):
    res.extend(a[index,:length_,:])
res = np.array(res)

# TensorFlow implementation
a_tf = tf.convert_to_tensor(a)
a_length_tf = tf.convert_to_tensor(a_length)
# Build a (batch, time) mask: mask[i, t] is True when t < a_length[i]
mask = tf.range(tf.shape(a_tf)[1]) < a_length_tf[:, tf.newaxis]
# Apply mask
res_tf = tf.boolean_mask(a_tf, mask)
# Test (TensorFlow 1.x graph mode; tf.Session is not in the top-level TensorFlow 2.x API)
with tf.Session() as sess:
    print(np.allclose(sess.run(res_tf), res))

Output:

True
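
To see why the mask in the answer selects the right rows, the same broadcasting trick can be reproduced in plain NumPy on a small deterministic array. This is only an illustrative sketch: the array a, the lengths [1, 3, 2] and the variable names are made up for the example and do not come from the question's data.

```python
import numpy as np

# Small deterministic stand-in for a (batch, time, features) tensor.
a = np.arange(3 * 4 * 2).reshape(3, 4, 2)
# Hypothetical sequence lengths, one per batch element.
lengths = np.array([1, 3, 2])

# A (4,) range compared with a (3, 1) column broadcasts to a (3, 4) mask:
# mask[i, t] is True exactly when t < lengths[i].
mask = np.arange(a.shape[1]) < lengths[:, np.newaxis]

# Boolean indexing over the first two axes keeps the unmasked rows,
# in batch order, and drops the padded time steps.
selected = a[mask]

print(mask.astype(int))
print(selected.shape)
```

The result has sum(lengths) rows and keeps the feature dimension unchanged, which is the same shape contract as tf.boolean_mask(a_tf, mask) in the answer.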
