Apple Metal element-wise matrix multiplication (Hadamard product)

Posted 2021-03-27

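Is it possible to compute a Hadamard product using Apple's Metal Performance Shaders? I see that ordinary matrix multiplication is available with it, but I am specifically looking for an element-wise multiplication, or a clever way to construct one. (For example, is it possible to convert an MPSMatrix into an MPSVector and then perform the product using vectors?)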

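Update: Thanks for the suggestions to use a shader! I am working on an implementation and it looks promising. I will post the solution once I have something working.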

Answer

OK, answering my own question here, following the commenters' suggestion: I tried writing my own shader!

Here is the shader code:

#include <metal_stdlib>
using namespace metal;

/*
 hadamardProduct:
 Perform an element-wise multiplication (hadamard product) of the two input matrices A and B, store the result in C
 */
kernel void hadamardProductKernel(
texture_buffer<float, access::read> A [[texture(0)]],
texture_buffer<float, access::read> B [[texture(1)]],
texture_buffer<float, access::write> C [[texture(2)]],
uint gid [[thread_position_in_grid]]) {
    // C[i,j] = A[i,j] * B[i,j]
    C.write(A.read(gid) * B.read(gid), gid);
}

And the Swift code that runs the shader on two 4x4 matrices:

import Foundation
import Metal
import MetalKit

guard
    let gpu = MTLCreateSystemDefaultDevice(),
    let commandQueue = gpu.makeCommandQueue(),
    let commandBuffer = commandQueue.makeCommandBuffer(),
    let defaultLibrary = gpu.makeDefaultLibrary(),
    let kernelFunction = defaultLibrary.makeFunction(name: "hadamardProductKernel")
else {exit(1)}

// Create the matrices to multiply (as row-major matrices)
var A:[Float] = [2,0,0,0,
                 0,2,0,0,
                 0,0,2,0,
                 0,0,0,2]

var B:[Float] = [1,0,0,0,
                 0,2,0,0,
                 0,0,3,0,
                 0,0,0,4]

let A_buffer = gpu.makeTexture(descriptor: MTLTextureDescriptor.textureBufferDescriptor(
    with: .r32Float,
    width: 16,
    resourceOptions: .storageModeManaged,
    usage: .shaderRead))
let B_buffer = gpu.makeTexture(descriptor: MTLTextureDescriptor.textureBufferDescriptor(
    with: .r32Float,
    width: 16,
    resourceOptions: .storageModeManaged,
    usage: .shaderRead))
let C_buffer = gpu.makeTexture(descriptor: MTLTextureDescriptor.textureBufferDescriptor(
    with: .r32Float,
    width: 16,
    resourceOptions: .storageModeManaged,
    usage: .shaderWrite))
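// Copy the input data into the textures.
// The arrays are passed directly so the pointer is only used during the call;
// UnsafeRawPointer(A) would produce a pointer that is not guaranteed to stay valid.
// bytesPerRow: 16 floats * 4 bytes each = 64
A_buffer?.replace(region: MTLRegionMake1D(0, 16),
                  mipmapLevel: 0,
                  withBytes: A,
                  bytesPerRow: 64)
B_buffer?.replace(region: MTLRegionMake1D(0, 16),
                  mipmapLevel: 0,
                  withBytes: B,
                  bytesPerRow: 64)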

let computePipelineState = try gpu.makeComputePipelineState(function: kernelFunction)
let computeEncoder = commandBuffer.makeComputeCommandEncoder()
computeEncoder?.setComputePipelineState(computePipelineState)
computeEncoder?.setTexture(A_buffer, index: 0)
computeEncoder?.setTexture(B_buffer, index: 1)
computeEncoder?.setTexture(C_buffer, index: 2)
let threadGroupSize = MTLSize(width: 16, height: 1, depth: 1)
let threadGroupCount = MTLSize(width: 1, height: 1, depth: 1)
computeEncoder?.dispatchThreadgroups(threadGroupCount, threadsPerThreadgroup: threadGroupSize)
computeEncoder?.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()

print("done")
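
The code above never reads the result back, so a CPU reference can be handy for checking whatever ends up in C. What follows is only a minimal sketch, assuming the same flat row-major layout as A and B above; `hadamardCPU`, `lhs`, `rhs` and `expected` are hypothetical names that are not part of the original code.

```swift
// Hypothetical CPU reference for the element-wise (Hadamard) product.
// Assumes both inputs are flat, row-major arrays with the same element count.
func hadamardCPU(_ a: [Float], _ b: [Float]) -> [Float] {
    precondition(a.count == b.count, "matrices must have the same number of elements")
    // C[i] = A[i] * B[i] for every flat index i, mirroring what the kernel does per thread
    return zip(a, b).map { $0 * $1 }
}

// Same values as A and B above, under different names to avoid a clash
let lhs: [Float] = [2,0,0,0,
                    0,2,0,0,
                    0,0,2,0,
                    0,0,0,2]
let rhs: [Float] = [1,0,0,0,
                    0,2,0,0,
                    0,0,3,0,
                    0,0,0,4]

let expected = hadamardCPU(lhs, rhs)
print(expected) // diagonal entries are 2, 4, 6, 8; everything else is 0
```

Once the contents of C have been copied back to the CPU, comparing them element by element against `expected` should show whether the kernel and the dispatch sizes are set up correctly.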

Any comments linking to resources for learning more about this kind of thing would be appreciated.
