Vectorizing the Kinect real-world coordinate processing algorithm for speed

Posted: 2017-05-05 14:31:51

Question:

I recently started working with a Kinect V2 on Linux, using pylibfreenect2.

The first time I was able to display the depth frame data in a scatter plot, I was disappointed to see that none of the depth pixels appeared to be in the correct location.

Side view of a room (note that the ceiling appears curved).

I did some research and realized that the conversion involves some simple trigonometry.

To test this, I started with a pre-written function in pylibfreenect2 that takes the column, row, and depth pixel intensity and returns that pixel's actual position:

X, Y, Z = registration.getPointXYZ(undistorted, row, col)

This does a great job of correcting the positions:

The only drawback of using getPointXYZ() or getPointXYZRGB() is that they work on only one pixel at a time. This can take a while in Python, since it requires nested for loops like this:

n_rows = d.shape[0]
n_columns = d.shape[1]
out = np.zeros((n_rows * n_columns, 3), dtype=np.float64)
for row in range(n_rows):
    for col in range(n_columns):
        X, Y, Z = registration.getPointXYZ(undistorted, row, col)
        out[row * n_columns + col] = np.array([Z, X, -Y])

I tried to get a better understanding of how getPointXYZ() computes a coordinate. As far as I can tell, it looks similar to this OpenKinect-for-Processing function: depthToPointCloudPos(), although I suspect libfreenect2's version does more behind the scenes.

Using that GitHub source as an example, I tried rewriting it in Python for my own experimentation, and this is what I came up with:

#camera information based on the Kinect v2 hardware
CameraParams = {
  "cx":254.878,
  "cy":205.395,
  "fx":365.456,
  "fy":365.456,
  "k1":0.0905474,
  "k2":-0.26819,
  "k3":0.0950862,
  "p1":0.0,
  "p2":0.0,
}


def depthToPointCloudPos(x_d, y_d, z, scale = 1000):
    #calculate the xyz camera position based on the depth data    
    x = (x_d - CameraParams['cx']) * z / CameraParams['fx']
    y = (y_d - CameraParams['cy']) * z / CameraParams['fy']

    return x/scale, y/scale, z/scale
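As a quick sanity check of the math (using the CameraParams above): a pixel at column 256, row 212 with a raw depth of 2000 mm back-projects to x ≈ (256 - 254.878) * 2000 / 365.456 ≈ 6 mm and y ≈ (212 - 205.395) * 2000 / 365.456 ≈ 36 mm, so depthToPointCloudPos(256, 212, 2000) returns roughly (0.006, 0.036, 2.0) in meters - a point almost straight ahead of the sensor at 2 m, which is what you would expect for a pixel near the principal point.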

Here is a comparison of the traditional getPointXYZ and my custom function:

They look very similar, but there are noticeable differences. The comparison on the left shows straighter edges, as well as a slight sinusoidal shape across the flat ceiling. I suspect additional math is involved.

If anyone has any idea what might be different between my function and libfreenect2's getPointXYZ, I would very much like to know.
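(In case anyone wants to experiment with this: my best guess is that the difference involves the radial distortion terms k1/k2/k3 that are listed in CameraParams but never used by my function, while libfreenect2 works from depth data it has already undistorted. Below is a rough, untested sketch of undistorting the pixel coordinates with a standard Brown-Conrady radial model before back-projecting; the helper name, the fixed-point iteration count, and whether this matches what libfreenect2 actually does internally are all assumptions on my part.)

def undistortPixel(x_d, y_d, iterations=5):
    # Hypothetical helper (my own naming, not part of libfreenect2's API):
    # iteratively invert a Brown-Conrady radial distortion model using the
    # k1, k2, k3 terms from CameraParams.
    x = (x_d - CameraParams['cx']) / CameraParams['fx'] # normalize the pixel coordinates
    y = (y_d - CameraParams['cy']) / CameraParams['fy']
    x0, y0 = x, y
    for _ in range(iterations):
        r2 = x * x + y * y
        k = 1.0 + r2 * (CameraParams['k1'] + r2 * (CameraParams['k2'] + r2 * CameraParams['k3']))
        x = x0 / k
        y = y0 / k
    # Convert back to (corrected) pixel coordinates for the pinhole back-projection above.
    return x * CameraParams['fx'] + CameraParams['cx'], y * CameraParams['fy'] + CameraParams['cy']

The corrected column/row values could then be fed into depthToPointCloudPos() in place of the raw indices (the same code also works element-wise on whole numpy arrays).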

However, the main reason I am posting here is to ask about vectorizing the above function so that it works on an entire array instead of looping over each element.

Applying what I learned from the above, I was able to write what appears to be a vectorized alternative to depthToPointCloudPos:

[EDIT]

Thanks to Benjamin for helping make this function even more efficient!

def depthMatrixToPointCloudPos(z, scale=1000):
    #basically this is a vectorized version of depthToPointCloudPos()
    C, R = np.indices(z.shape)

    R = np.subtract(R, CameraParams['cx'])
    R = np.multiply(R, z)
    R = np.divide(R, CameraParams['fx'] * scale)

    C = np.subtract(C, CameraParams['cy'])
    C = np.multiply(C, z)
    C = np.divide(C, CameraParams['fy'] * scale)

    return np.column_stack((z.ravel() / scale, R.ravel(), -C.ravel()))

This works and produces the same point cloud results as the previous function, depthToPointCloudPos(). The only difference is that my processing rate went from ~1 Fps to 5-10 Fps (WhooHoo!). I believe this removes the bottleneck caused by Python doing all of the calculations, so my scatter plot now runs smoothly again with the semi-real-world coordinates being calculated.
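(For anyone who wants to check the speedup and the equivalence without a Kinect attached, a throwaway benchmark along these lines can be used as a sanity check. It assumes CameraParams and the two functions above are already defined, uses synthetic depth data, and the exact timings will of course vary by machine.)

import time
import numpy as np

# Synthetic 424 x 512 depth frame in millimeters (a stand-in for undistorted.asarray(np.float32)).
z = np.random.uniform(500, 4500, (424, 512)).astype(np.float32)

start = time.time()
out_fast = depthMatrixToPointCloudPos(z)
print("vectorized: %.4f s" % (time.time() - start))

start = time.time()
out_slow = np.zeros((z.size, 3), dtype=np.float64)
for row in range(z.shape[0]):
    for col in range(z.shape[1]):
        X, Y, Z = depthToPointCloudPos(col, row, z[row, col])
        out_slow[row * z.shape[1] + col] = [Z, X, -Y]
print("per-pixel:  %.4f s" % (time.time() - start))

# The two results should agree to within float32 rounding (prints True).
print(np.allclose(out_fast, out_slow, rtol=1e-4))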

Now that I have an efficient function for retrieving the 3d coordinates from the depth frame, I would really like to apply this approach to mapping the color camera data onto my depth pixels. However, I am not sure what math or variables are involved in doing so, and there is not much mention on Google of how to calculate it.

Alternatively, I can use libfreenect2 to map the colors onto my depth pixels with getPointXYZRGB:

#Format undistorted and registered data to real-world coordinates with mapped colors (don't forget color=colors in setData)
n_rows = d.shape[0]
n_columns = d.shape[1]
out = np.zeros((n_rows * n_columns, 3), dtype=np.float64)
colors = np.zeros((d.shape[0] * d.shape[1], 3), dtype=np.float64)
for row in range(n_rows):
    for col in range(n_columns):
        X, Y, Z, B, G, R = registration.getPointXYZRGB(undistorted, registered, row, col)
        out[row * n_columns + col] = np.array([X, Y, Z])
        colors[row * n_columns + col] = np.divide([R, G, B], 255)
sp2.setData(pos=np.array(out, dtype=np.float64), color=colors, size=2)

The generated point cloud with colored vertices (very slow).

Overall, my two questions are basically:

What additional steps would be needed to make the real-world 3d coordinate data returned by my depthToPointCloudPos() function (and its vectorized implementation) more closely resemble the data returned by getPointXYZ() from libfreenect2?

And, what would be involved in creating a (possibly vectorized) way to generate the depth-to-color registration map in my own application? Please see the update, as this has since been solved.


[UPDATE]

I managed to map the color data to each pixel using the registered frame. It was very simple and only required adding these lines before calling setData():

colors = registered.asarray(np.uint8)
colors = np.divide(colors, 255)
colors = colors.reshape(colors.shape[0] * colors.shape[1], 4 )
colors = colors[:, :3:] #BGRA to BGR (slices out the alpha channel)  
colors = colors[...,::-1] #BGR to RGB

This allows Python to process the color data quickly and gives smooth results. I have updated/added these lines to the functional example below.
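For reference, these colors line up row-for-row with the positions returned by depthMatrixToPointCloudPos() (both come from 424 x 512 frames flattened in the same row-major order), so the two arrays can be handed straight to the scatter plot together:

sp2.setData(pos=out, color=colors, size=2)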

Color-registered real-world coordinate processing running in real time in Python!

(The GIF's image resolution has been greatly reduced)


[UPDATE]

After spending some more time working with the application, I added a few extra parameters and tuned their values in hopes of improving the visual quality of the scatter plot, and possibly making things more intuitive for this example/question.

Most importantly, I have set the vertices to render as opaque:

sp2 = gl.GLScatterPlotItem(pos=pos)
sp2.setGLOptions('opaque') # Ensures that vertices located behind other vertices cannot be seen through them.

I then noticed that whenever I zoomed in very close to a surface, the distance between adjacent vertices appeared to expand until most of what was visible was empty space. This is partly because the point size of the vertices did not change.

To help create a "zoom-friendly" viewport full of colored vertices, I added these lines, which calculate the vertex point size based on the current zoom level (on every update):

# Calculate a dynamic vertex size based on window dimensions and camera's position - To become the "size" input for the scatterplot's setData() function.
v_rate = 8.0 # Rate that vertex sizes will increase as zoom level increases (adjust this to any desired value).
v_scale = np.float32(v_rate) / gl_widget.opts['distance'] # Vertex size increases as the camera is "zoomed" towards center of view.
v_offset = (gl_widget.geometry().width() / 1000)**2 # Vertex size is offset based on actual width of the viewport.
v_size = v_scale + v_offset
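For instance, if gl_widget.opts['distance'] is 10 and the viewport is 1000 px wide, this works out to v_size ≈ 8.0 / 10 + (1000 / 1000)**2 = 0.8 + 1.0 = 1.8.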

And lo and behold:

(Again, the GIF's image resolution has been greatly reduced)

Perhaps not as good as skinning the point cloud, but it does seem to make things easier when trying to understand what you are actually looking at.

All of the modifications mentioned have been included in the functional example.


[UPDATE]

As can be seen in the previous two animations, the point cloud of real-world coordinates has a skewed orientation compared to the grid axes. This is because I was not compensating for the Kinect's actual orientation in the real world!

So I implemented an additional vectorized trig function that calculates a new (rotated and offset) coordinate for each vertex. This positions the vertices correctly relative to the Kinect's actual position in real space, and it is necessary when using a tripod that tilts (it could also be used to hook in the output of an INU or gyro/accelerometer for real-time feedback).

def applyCameraMatrixOrientation(pt):
    # Kinect Sensor Orientation Compensation
    # basically this is a vectorized version of applyCameraOrientation()
    # uses same trig to rotate a vertex around a gimbal.
    def rotatePoints(ax1, ax2, deg):
        # math to rotate vertexes around a center point on a plane.
        hyp = np.sqrt(pt[:, ax1] ** 2 + pt[:, ax2] ** 2) # Get the length of the hypotenuse of the real-world coordinate from center of rotation, this is the radius!
        d_tan = np.arctan2(pt[:, ax2], pt[:, ax1]) # Calculate the vertexes current angle (returns radians that go from -180 to 180)

        cur_angle = np.degrees(d_tan) % 360 # Convert radians to degrees and use modulo to adjust range from 0 to 360.
        new_angle = np.radians((cur_angle + deg) % 360) # The new angle (in radians) of the vertexes after being rotated by the value of deg.

        pt[:, ax1] = hyp * np.cos(new_angle) # Calculate the rotated coordinate for this axis.
        pt[:, ax2] = hyp * np.sin(new_angle) # Calculate the rotated coordinate for this axis.

    #rotatePoints(1, 2, CameraPosition['roll']) #rotate on the Y&Z plane # Disabled because most tripods don't roll. If an Inertial Nav Unit is available this could be used)
    rotatePoints(0, 2, CameraPosition['elevation']) #rotate on the X&Z plane
    rotatePoints(0, 1, CameraPosition['azimuth']) #rotate on the X&Y plane

    # Apply offsets for height and linear position of the sensor (from viewport's center)
    pt[:] += np.float_([CameraPosition['x'], CameraPosition['y'], CameraPosition['z']])



    return pt

Please note: rotatePoints() is only called for 'elevation' and 'azimuth'. This is because most tripods do not support roll, and it is disabled by default to save CPU cycles. If you intend to do something fancy, then definitely feel free to uncomment it!
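For anyone who prefers a single matrix multiply over the per-plane trig, the same compensation could also be written as in the untested sketch below. The function name is my own; it assumes the same CameraPosition dict and the same elevation-then-azimuth order as applyCameraMatrixOrientation() above, with roll left out just as it is there.

def applyCameraMatrixOrientation2(pt):
    # Hypothetical alternative: build one rotation matrix (elevation in the X&Z plane,
    # then azimuth in the X&Y plane) and apply it to every vertex with a single dot product.
    elev = np.radians(CameraPosition['elevation'])
    az = np.radians(CameraPosition['azimuth'])

    ce, se = np.cos(elev), np.sin(elev)
    ca, sa = np.cos(az), np.sin(az)

    R_elev = np.array([[ce, 0.0, -se],
                       [0.0, 1.0, 0.0],
                       [se, 0.0, ce]])   # same effect as rotatePoints(0, 2, elevation)
    R_az = np.array([[ca, -sa, 0.0],
                     [sa,  ca, 0.0],
                     [0.0, 0.0, 1.0]])   # same effect as rotatePoints(0, 1, azimuth)

    R = np.dot(R_az, R_elev)  # elevation is applied first, matching the original call order
    offset = np.float_([CameraPosition['x'], CameraPosition['y'], CameraPosition['z']])
    return np.dot(pt, R.T) + offset  # returns a new (N, 3) array rather than modifying pt in place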

Notice that the grid floor is level in this image, but the point cloud on the left is not aligned with it:

The parameters for setting the Kinect's orientation:

CameraPosition = {
    "x": 0, # actual position in meters of kinect sensor relative to the viewport's center.
    "y": 0, # actual position in meters of kinect sensor relative to the viewport's center.
    "z": 1.7, # height in meters of actual kinect sensor from the floor.
    "roll": 0, # angle in degrees of sensor's roll (used for INU input - trig function for this is commented out by default).
    "azimuth": 0, # sensor's yaw angle in degrees.
    "elevation": -15, # sensor's pitch angle in degrees.
}

You should update these based on your sensor's actual position and orientation:

The two most important parameters are the theta (elevation) angle and the height from the floor. I have only used a simple tape measure and a calibrated eye, but I intend to someday feed encoder or INU data into these parameters so they update in real time (as the sensor is moved around).

Again, all of the changes have been reflected in the functional example.


If anyone is successful in improving this example, or has suggestions on how to make things more compact, I would greatly appreciate it if you could leave a comment explaining the details.

Here is the fully functional example of this project:

#! /usr/bin/python

#--------------------------------#
# Kinect v2 point cloud visualization using a Numpy based 
# real-world coordinate processing algorithm and OpenGL.
#--------------------------------#

import sys
import numpy as np

from pyqtgraph.Qt import QtCore, QtGui
import pyqtgraph.opengl as gl

from pylibfreenect2 import Freenect2, SyncMultiFrameListener
from pylibfreenect2 import FrameType, Registration, Frame, libfreenect2

fn = Freenect2()
num_devices = fn.enumerateDevices()
if num_devices == 0:
    print("No device connected!")
    sys.exit(1)

serial = fn.getDeviceSerialNumber(0)
device = fn.openDevice(serial)

types = 0
types |= FrameType.Color
types |= (FrameType.Ir | FrameType.Depth)
listener = SyncMultiFrameListener(types)

# Register listeners
device.setColorFrameListener(listener)
device.setIrAndDepthFrameListener(listener)

device.start()

# NOTE: must be called after device.start()
registration = Registration(device.getIrCameraParams(),
                            device.getColorCameraParams())

undistorted = Frame(512, 424, 4)
registered = Frame(512, 424, 4)


#QT app
app = QtGui.QApplication([])
gl_widget = gl.GLViewWidget()
gl_widget.show()
gl_grid = gl.GLGridItem()
gl_widget.addItem(gl_grid)

#initialize some points data
pos = np.zeros((1,3))

sp2 = gl.GLScatterPlotItem(pos=pos)
sp2.setGLOptions('opaque') # Ensures that vertices located behind other vertices cannot be seen through them.

gl_widget.addItem(sp2)

# Kinects's intrinsic parameters based on v2 hardware (estimated).
CameraParams = {
  "cx":254.878,
  "cy":205.395,
  "fx":365.456,
  "fy":365.456,
  "k1":0.0905474,
  "k2":-0.26819,
  "k3":0.0950862,
  "p1":0.0,
  "p2":0.0,
}


def depthToPointCloudPos(x_d, y_d, z, scale=1000):
    # This runs slowly in Python since it has to be called from within a loop, but it is a more intuitive example than its vectorized alternative (purely for illustration)
    # calculate the real-world xyz vertex coordinate from the raw depth data (one vertex at a time).
    x = (x_d - CameraParams['cx']) * z / CameraParams['fx']
    y = (y_d - CameraParams['cy']) * z / CameraParams['fy']

    return x / scale, y / scale, z / scale

def depthMatrixToPointCloudPos(z, scale=1000):
    # basically this is a vectorized version of depthToPointCloudPos()
    # calculate the real-world xyz vertex coordinates from the raw depth data matrix.
    C, R = np.indices(z.shape)

    R = np.subtract(R, CameraParams['cx'])
    R = np.multiply(R, z)
    R = np.divide(R, CameraParams['fx'] * scale)

    C = np.subtract(C, CameraParams['cy'])
    C = np.multiply(C, z)
    C = np.divide(C, CameraParams['fy'] * scale)

    return np.column_stack((z.ravel() / scale, R.ravel(), -C.ravel()))

# Kinect's physical orientation in the real world.
CameraPosition = {
    "x": 0, # actual position in meters of kinect sensor relative to the viewport's center.
    "y": 0, # actual position in meters of kinect sensor relative to the viewport's center.
    "z": 1.7, # height in meters of actual kinect sensor from the floor.
    "roll": 0, # angle in degrees of sensor's roll (used for INU input - trig function for this is commented out by default).
    "azimuth": 0, # sensor's yaw angle in degrees.
    "elevation": -15, # sensor's pitch angle in degrees.
}


def applyCameraOrientation(pt):
    # Kinect Sensor Orientation Compensation
    # This runs slowly in Python since it has to be called within a loop, but it is a more intuitive example than its vectorized alternative (purely for illustration)
    # use trig to rotate a vertex around a gimbal.
    def rotatePoints(ax1, ax2, deg):
        # math to rotate vertexes around a center point on a plane.
        hyp = np.sqrt(pt[ax1] ** 2 + pt[ax2] ** 2) # Get the length of the hypotenuse of the real-world coordinate from center of rotation, this is the radius!
        d_tan = np.arctan2(pt[ax2], pt[ax1]) # Calculate the vertexes current angle (returns radians that go from -180 to 180)

        cur_angle = np.degrees(d_tan) % 360 # Convert radians to degrees and use modulo to adjust range from 0 to 360.
        new_angle = np.radians((cur_angle + deg) % 360) # The new angle (in radians) of the vertexes after being rotated by the value of deg.

        pt[ax1] = hyp * np.cos(new_angle) # Calculate the rotated coordinate for this axis.
        pt[ax2] = hyp * np.sin(new_angle) # Calculate the rotated coordinate for this axis.

    #rotatePoints(0, 2, CameraPosition['roll']) #rotate on the Y&Z plane # Disabled because most tripods don't roll. If an Inertial Nav Unit is available this could be used)
    rotatePoints(1, 2, CameraPosition['elevation']) #rotate on the X&Z plane
    rotatePoints(0, 1, CameraPosition['azimuth']) #rotate on the X&Y plane

    # Apply offsets for height and linear position of the sensor (from viewport's center)
    pt[:] += np.float_([CameraPosition['x'], CameraPosition['y'], CameraPosition['z']])



    return pt

def applyCameraMatrixOrientation(pt):
    # Kinect Sensor Orientation Compensation
    # basically this is a vectorized version of applyCameraOrientation()
    # uses same trig to rotate a vertex around a gimbal.
    def rotatePoints(ax1, ax2, deg):
        # math to rotate vertexes around a center point on a plane.
        hyp = np.sqrt(pt[:, ax1] ** 2 + pt[:, ax2] ** 2) # Get the length of the hypotenuse of the real-world coordinate from center of rotation, this is the radius!
        d_tan = np.arctan2(pt[:, ax2], pt[:, ax1]) # Calculate the vertexes current angle (returns radians that go from -180 to 180)

        cur_angle = np.degrees(d_tan) % 360 # Convert radians to degrees and use modulo to adjust range from 0 to 360.
        new_angle = np.radians((cur_angle + deg) % 360) # The new angle (in radians) of the vertexes after being rotated by the value of deg.

        pt[:, ax1] = hyp * np.cos(new_angle) # Calculate the rotated coordinate for this axis.
        pt[:, ax2] = hyp * np.sin(new_angle) # Calculate the rotated coordinate for this axis.

    #rotatePoints(1, 2, CameraPosition['roll']) #rotate on the Y&Z plane # Disabled because most tripods don't roll. If an Inertial Nav Unit is available this could be used)
    rotatePoints(0, 2, CameraPosition['elevation']) #rotate on the X&Z plane
    rotatePoints(0, 1, CameraPosition['azimuth']) #rotate on the X&Y plane

    # Apply offsets for height and linear position of the sensor (from viewport's center)
    pt[:] += np.float_([CameraPosition['x'], CameraPosition['y'], CameraPosition['z']])



    return pt


def update():
    colors = ((1.0, 1.0, 1.0, 1.0))

    frames = listener.waitForNewFrame()

    # Get the frames from the Kinect sensor
    ir = frames["ir"]
    color = frames["color"]
    depth = frames["depth"]

    d = depth.asarray() #the depth frame as an array (Needed only with non-vectorized functions)

    registration.apply(color, depth, undistorted, registered)

    # Format the color registration map - To become the "color" input for the scatterplot's setData() function.
    colors = registered.asarray(np.uint8)
    colors = np.divide(colors, 255) # values must be between 0.0 - 1.0
    colors = colors.reshape(colors.shape[0] * colors.shape[1], 4 ) # From: Rows X Cols X RGB -to- [[r,g,b],[r,g,b]...]
    colors = colors[:, :3:]  # remove alpha (fourth index) from BGRA to BGR
    colors = colors[...,::-1] #BGR to RGB

    # Calculate a dynamic vertex size based on window dimensions and camera's position - To become the "size" input for the scatterplot's setData() function.
    v_rate = 5.0 # Rate that vertex sizes will increase as zoom level increases (adjust this to any desired value).
    v_scale = np.float32(v_rate) / gl_widget.opts['distance'] # Vertex size increases as the camera is "zoomed" towards center of view.
    v_offset = (gl_widget.geometry().width() / 1000)**2 # Vertex size is offset based on actual width of the viewport.
    v_size = v_scale + v_offset

    # Calculate 3d coordinates (Note: five optional methods are shown - only one should be un-commented at any given time)

    """
    # Method 1 (No Processing) - Format raw depth data to be displayed
    m, n = d.shape
    R, C = np.mgrid[:m, :n]
    out = np.column_stack((d.ravel() / 4500, C.ravel()/m, (-R.ravel()/n)+1))
    """

    # Method 2 (Fastest) - Format and compute the real-world 3d coordinates using a fast vectorized algorithm - To become the "pos" input for the scatterplot's setData() function.
    out = depthMatrixToPointCloudPos(undistorted.asarray(np.float32))

    """
    # Method 3 - Format undistorted depth data to real-world coordinates
    n_rows, n_columns = d.shape
    out = np.zeros((n_rows * n_columns, 3), dtype=np.float32)
    for row in range(n_rows):
        for col in range(n_columns):
            z = undistorted.asarray(np.float32)[row][col]
            X, Y, Z = depthToPointCloudPos(row, col, z)
            out[row * n_columns + col] = np.array([Z, Y, -X])
    """

    """
    # Method 4 - Format undistorted depth data to real-world coordinates
    n_rows, n_columns = d.shape
    out = np.zeros((n_rows * n_columns, 3), dtype=np.float64)
    for row in range(n_rows):
        for col in range(n_columns):
            X, Y, Z = registration.getPointXYZ(undistorted, row, col)
            out[row * n_columns + col] = np.array([Z, X, -Y])
    """

    """
    # Method 5 - Format undistorted and registered data to real-world coordinates with mapped colors (don't forget color=colors in setData)
    n_rows, n_columns = d.shape
    out = np.zeros((n_rows * n_columns, 3), dtype=np.float64)
    colors = np.zeros((d.shape[0] * d.shape[1], 3), dtype=np.float64)
    for row in range(n_rows):
        for col in range(n_columns):
            X, Y, Z, B, G, R = registration.getPointXYZRGB(undistorted, registered, row, col)
            out[row * n_columns + col] = np.array([Z, X, -Y])
            colors[row * n_columns + col] = np.divide([R, G, B], 255)
    """


    # Kinect sensor real-world orientation compensation.
    out = applyCameraMatrixOrientation(out)

    """
    # For demonstrating the non-vectorized orientation compensation function (slow)
    for i, pt in enumerate(out):
        out[i] = applyCameraOrientation(pt)
    """


    # Show the data in a scatter plot
    sp2.setData(pos=out, color=colors, size=v_size)

    # Lastly, release frames from memory.
    listener.release(frames)

t = QtCore.QTimer()
t.timeout.connect(update)
t.start(50)


## Start Qt event loop unless running in interactive mode.
if __name__ == '__main__':
    import sys
    if (sys.flags.interactive != 1) or not hasattr(QtCore, 'PYQT_VERSION'):
        QtGui.QApplication.instance().exec_()

device.stop()
device.close()

sys.exit(0)

Comments:

Talking about performance - I would assign out[row * n_columns + col] = X, Y, Z and do the division by 255, np.divide([R, G, B], 255), outside the nested loops; the idea is to do the least amount of work inside those loops.

Answer 1:

This isn't meant to be a complete answer... I just wanted to point out that you are creating a lot of temporary arrays, where you could do more of the operations in-place:

def depthMatrixToPointCloudPos2(z, scale=1000):

    C, R = np.indices(z.shape, dtype=np.float64)  # float index grids, so the in-place ops below are valid

    R -= CameraParams['cx']
    R *= z
    R /= CameraParams['fx'] * scale

    C -= CameraParams['cy']
    C *= z
    C /= CameraParams['fy'] * scale

    return np.column_stack((z.ravel() / scale, R.ravel(), -C.ravel()))

(If I am reading your code correctly.)

Also, take note of the data types: if you are on a 64-bit machine, they will be 64-bit by default. Can you get away with smaller types to cut down on the amount of data to crunch?
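For example, something along these lines (an untested sketch; the "32" suffix is my own naming) would keep the whole computation in float32, since undistorted.asarray(np.float32) already hands you a float32 array:

def depthMatrixToPointCloudPos32(z, scale=1000):
    # Hypothetical float32 variant of depthMatrixToPointCloudPos(); assumes z is the
    # float32 array returned by undistorted.asarray(np.float32).
    C, R = np.indices(z.shape, dtype=np.float32)  # keep the index grids in float32

    R -= CameraParams['cx']
    R *= z
    R /= CameraParams['fx'] * scale

    C -= CameraParams['cy']
    C *= z
    C /= CameraParams['fy'] * scale

    # The in-place operations on float32 arrays stay float32, so the stacked result is float32 too.
    return np.column_stack((z.ravel() / scale, R.ravel(), -C.ravel()))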

Comments:

Very clean! I have updated the original function to reflect these changes. Thank you!!!
