SciPy / Numpy的Pooling / Convolution比Tensorflow的Convolution / Pooling更快？

Question

我正在尝试使用GPU来加速我的神经网络应用程序（Spiking网络）中的卷积和池化操作。我写了一个小脚本，看看使用Tensorflow可以获得多少加速。令人惊讶的是，SciPy / Numpy做得更好。在我的应用程序中，所有输入（图像）都存储在磁盘上但是作为一个例子，我创建了一个大小为27x27的随机初始化图像和权重大小为5x5x30的内核，我确保我没有从CPU转移到GPU和我还将输入图像大小增加到270x270，将权重内核增加到7x7x30，我仍然没有看到任何改进。我确保所有TF方法实际上都是通过设置在我的GPU上执行的

sess =tf.Session(config=tf.ConfigProto(log_device_placement=True))

我可以在群集上访问2个GPU（Tesla K20m）。

这是我的代码：

import tensorflow as tf
import numpy as np
from scipy import signal
import time
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

image_size = 27
kernel_size = 5
nofMaps = 30

def convolution(Image, weights):
    in_channels = 1 # 1 because our image has 1 units in the -z direction. 
    out_channels = weights.shape[-1]
    strides_1d = [1, 1, 1, 1]

    #in_2d = tf.constant(Image, dtype=tf.float32)
    in_2d = Image
    #filter_3d = tf.constant(weights, dtype=tf.float32)
    filter_3d =weights

    in_width = int(in_2d.shape[0])
    in_height = int(in_2d.shape[1])

    filter_width = int(filter_3d.shape[0])
    filter_height = int(filter_3d.shape[1])

    input_4d   = tf.reshape(in_2d, [1, in_height, in_width, in_channels])
    kernel_4d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, out_channels])
    inter = tf.nn.conv2d(input_4d, kernel_4d, strides=strides_1d, padding='VALID')
    output_3d = tf.squeeze(inter)
    output_3d= sess.run(output_3d)
    return output_3d


def pooling(Image):
    in_channels = Image.shape[-1]
    Image_3d = tf.constant(Image, dtype = tf.float32)
    in_width = int(Image.shape[0])
    in_height = int(Image.shape[1])
    Image_4d = tf.reshape(Image_3d,[1,in_width,in_height,in_channels])
    pooled_pots4d = tf.layers.max_pooling2d(inputs=Image_4d, pool_size=[2, 2], strides=2)
    pooled_pots3d = tf.squeeze(pooled_pots4d)
    return sess.run(pooled_pots3d)


t1 = time.time()
#with tf.device('/device:GPU:1'):
Image = tf.random_uniform([image_size, image_size], name='Image')
weights = tf.random_uniform([kernel_size,kernel_size,nofMaps], name='Weights')
conv_result = convolution(Image,weights)
pool_result = pooling(conv_result)

print('Time taken:{}'.format(time.time()-t1))
#with tf.device('/device:CPU:0'):
print('Pool_result shape:{}'.format(pool_result.shape))
#print('first map of pool result:
',pool_result[:,:,0])


def scipy_convolution(Image,weights):
    instant_conv1_pots = np.zeros((image_size-kernel_size+1,image_size-kernel_size+1,nofMaps))
    for i in range(weights.shape[-1]):
        instant_conv1_pots[:,:,i]=signal.correlate(Image,weights[:,:,i],mode='valid',method='fft')
    return instant_conv1_pots

def scipy_pooling(conv1_spikes):
    '''
       Reshape splitting each of the two axes into two each such that the
       latter of the split axes is of the same length as the block size.
       This would give us a 4D array. Then, perform maximum finding along those
       latter axes, which would be the second and fourth axes in that 4D array.
       https://stackoverflow.com/questions/41813722/numpy-array-reshaped-but-how-to-change-axis-for-pooling
    '''
    if(conv1_spikes.shape[0]%2!=0): #if array is odd size then omit the last row and col
        conv1_spikes = conv1_spikes[0:-1,0:-1,:]
    else:
        conv1_spikes = conv1_spikes
    m,n = conv1_spikes[:,:,0].shape
    o   = conv1_spikes.shape[-1]
    pool1_spikes = np.zeros((m/2,n/2,o))
    for i in range(o):
        pool1_spikes[:,:,i]=conv1_spikes[:,:,i].reshape(m/2,2,n/2,2).max(axis=(1,3))
    return pool1_spikes
t1 = time.time()
Image = np.random.rand(image_size,image_size)
weights = np.random.rand(kernel_size,kernel_size,nofMaps)
conv_result = scipy_convolution(Image,weights)
pool_result = scipy_pooling(conv_result)
print('Time taken:{}'.format(time.time()-t1))
print('Pool_result shape:{}'.format(pool_result.shape))
#print('first map of pool result:
',pool_result[:,:,0])
~

结果如下：

Time taken:0.746644973755
Pool_result shape:(11, 11, 30)
Time taken:0.0127348899841
Pool_result shape:(11, 11, 30)