OpenCV series: a hardware-decoding solution for opencv-python on NVIDIA GPUs

Posted by 狂奔的CD


Preface

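I have already tried building ffmpeg with cuvid hardware decoding. The decoded frames, however, come out in the YUV420 pixel format, and OpenCV needs them converted to BGR. That color-space conversion step uses too much CPU.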
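
To illustrate why this step is expensive on the CPU, here is a minimal sketch of the arithmetic needed for a single pixel. It assumes the commonly quoted BT.601 limited-range coefficients; the function names are hypothetical and are not part of OpenCV or VPF. A 1920x1080 stream at 60 fps needs this for roughly 124 million pixels per second.

```python
def clamp_u8(value: float) -> int:
    """Round to the nearest integer and clamp into the 0..255 byte range."""
    return max(0, min(255, int(round(value))))


def yuv_to_bgr(y: int, u: int, v: int) -> tuple:
    """Convert one limited-range BT.601 YUV sample to a (B, G, R) tuple.

    The coefficients are the widely used approximations; they are an
    assumption of this sketch, not values taken from the article.
    """
    c = 1.164 * (y - 16)   # luma, rescaled from 16..235 to 0..255
    d = u - 128            # blue-difference chroma, centered on zero
    e = v - 128            # red-difference chroma, centered on zero
    r = c + 1.596 * e
    g = c - 0.392 * d - 0.813 * e
    b = c + 2.017 * d
    return clamp_u8(b), clamp_u8(g), clamp_u8(r)


# Limited-range black and white map to full-range black and white.
print(yuv_to_bgr(16, 128, 128))
print(yuv_to_bgr(235, 128, 128))
```

Real implementations vectorize this, but the per-pixel work is the same, which is why moving it to the GPU pays off.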

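The conversion therefore needs to run on the GPU as well. NVIDIA has open-sourced a video processing framework for Python, VideoProcessingFramework (VPF). It gives developers a simple yet powerful Python tool for hardware-accelerated video encoding, decoding, and processing tasks.

Because the Python bindings sit on top of C++ code, developers can reach high GPU utilization in a few dozen lines of code. Decoded video frames are exposed as NumPy arrays or CUDA device pointers, which simplifies interaction and extension.

At present, VPF places no additional restrictions on the NVIDIA Video Codec SDK, so developers can take full advantage of NVIDIA professional GPUs.

For a description, see the article "VPF: an open-source video processing framework for Python that accelerates video tasks and improves GPU utilization".

VPF also supports exporting GPU memory objects such as decoded video frames to PyTorch tensors without Host to Device copies,
which makes it very friendly to PyTorch inference.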

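Main text

Let's look at how to build and install it.
Reference: Installing NVIDIA VideoProcessingFramework (VPF) on Ubuntu

Prerequisites

① Install the CUDA toolkit and NVIDIA driver that match your GPU, and make sure their versions are compatible.
② Download the NVIDIA Video Codec SDK and extract it. Downloading from the official site requires registration.
Pick the Video Codec SDK release that matches your NVIDIA driver version.
My Linux driver is 470.86, so I downloaded Video Codec SDK 11.1.
After extracting, copy the headers and .so files to the locations below: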

unzip Video_Codec_SDK.zip
cd Video_Codec_SDK
sudo cp Interface/* /usr/local/cuda/include
sudo cp Lib/linux/stubs/x86_64/* /usr/local/cuda/lib64/stubs

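③ Build and install ffmpeg. I built the cuvid-enabled version of ffmpeg; if you are unsure how, see my earlier articles. In my testing, an ffmpeg 3.x release was required.

Installing VPF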

# Clone repo and start building process
cd ~/installs
git clone https://github.com/NVIDIA/VideoProcessingFramework.git

# Export path to CUDA compiler (you may need this sometimes if you install drivers from Nvidia site):
export CUDACXX=/usr/local/cuda-11.3/bin/nvcc

# Now the build itself
cd VideoProcessingFramework
mkdir -p install
mkdir -p build
cd build

# If you want to generate Pytorch extension, set up corresponding CMake value GENERATE_PYTORCH_EXTENSION  

cmake .. -DFFMPEG_DIR:PATH="/usr/local/ffmpeg3.4.9" \
-DVIDEO_CODEC_SDK_INCLUDE_DIR:PATH="/usr/local/cuda/include" \
-DGENERATE_PYTHON_BINDINGS:BOOL="1" \
-DGENERATE_PYTORCH_EXTENSION:BOOL="0" \
-DPYTHON_LIBRARY=/home/hw/anaconda3/envs/cd_test/lib/libpython3.8.so \
-DCMAKE_INSTALL_PREFIX:PATH="../install" \
-DPYTHON_EXECUTABLE=/home/hw/anaconda3/envs/cd_test/bin/python3 \
-DPYTHON_INCLUDE_DIR=/home/hw/anaconda3/envs/cd_test/include/python3.8


# Build and install
make -j6  && sudo make install

# Verify that the build works
cd ../install/bin
conda activate cd_test
$ python3 SampleDecodeRTSP.py 0 rtsp://xxxx
This sample decodes multiple videos in parallel on given GPU.
It doesn't do anything beside decoding, output isn't saved.
Usage: SampleDecodeRTSP.py $gpu_id $url1 ... $urlN .
[h264 @ 0x55678af45560] co located POCs unavailable
Input #0, rtsp, from 'rtsp://192.168.3.99:8554/handwriting1':
  Metadata:
    title           : Stream
  Duration: N/A, start: -0.856438, bitrate: N/A
    Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 60 fps, 60.08 tbr, 90k tbn, 120.16 tbc
    Stream #0:1: Audio: aac (LC), 48000 Hz, stereo, fltp
Output #0, h264, to 'pipe:1':
  Metadata:
    title           : Stream
    encoder         : Lavf57.83.100
    Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 60 fps, 60.08 tbr, 60.08 tbn, 60.08 tbc
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
3e123055-63a0-45f4-b8ac-82cf60f321ea 508kB time=00:00:03.52 bitrate=1180.1kbits/s speed=1.11x
3e123055-63a0-45f4-b8ac-82cf60f321ea1985kB time=00:00:05.57 bitrate=2916.0kbits/s speed=1.07x
3e123055-63a0-45f4-b8ac-82cf60f321ea2749kB time=00:00:06.59 bitrate=3416.6kbits/s speed=1.06x
3e123055-63a0-45f4-b8ac-82cf60f321ea3448kB time=00:00:07.58 bitrate=3721.1kbits/s speed=1.05x

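Looking at the sample's source, it uses ffmpeg to demux the stream and then the VPF API to do the hardware decoding.

To use VPF in another project, copy the compiled PyNvCodec.cpython-38-x86_64-linux-gnu.so into the project's root directory, or add its directory in code with sys.path.append('/root/user/installs/VideoProcessingFramework/install/bin'). You can also copy the generated .so into the site-packages directory of the Python environment you use (for example, cp PyNvCodec.cpython-38-x86_64-linux-gnu.so /root/conda/envs/env_name/lib/python3.8/site-packages/).

Usage in code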
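
As a small sketch of the sys.path option, the helper below adds a directory only if it actually contains a PyNvCodec extension module, so a wrong path fails early instead of at import time. The helper name add_vpf_to_path is hypothetical, and the directory in the usage part is the example path from the paragraph above; adjust it to your own install location.

```python
import sys
from pathlib import Path


def add_vpf_to_path(bin_dir: str) -> bool:
    """Prepend bin_dir to sys.path if it holds a PyNvCodec*.so file.

    Returns True when the directory was found usable, False otherwise.
    """
    directory = Path(bin_dir)
    if not directory.is_dir():
        return False
    if not any(directory.glob("PyNvCodec*.so")):
        return False
    resolved = str(directory.resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return True


if __name__ == "__main__":
    # Example path from the article; it will differ on your machine.
    vpf_bin = "/root/user/installs/VideoProcessingFramework/install/bin"
    if add_vpf_to_path(vpf_bin):
        import PyNvCodec as nvc  # only importable once the path is set
    else:
        print("PyNvCodec not found in " + vpf_bin, file=sys.stderr)
```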

#
# Copyright 2019 NVIDIA Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Starting from Python 3.8 DLL search policy has changed.
# We need to add path to CUDA DLLs explicitly.
import multiprocessing
import sys
import os
import threading
from typing import Dict
import cv2

if os.name == 'nt':
    # Add CUDA_PATH env variable
    cuda_path = os.environ.get("CUDA_PATH")
    if cuda_path:
        os.add_dll_directory(cuda_path)
    else:
        print("CUDA_PATH environment variable is not set.", file=sys.stderr)
        print("Can't set CUDA DLLs search path.", file=sys.stderr)
        exit(1)

    # Add PATH as well for minor CUDA releases
    sys_path = os.environ.get("PATH")
    if sys_path:
        paths = sys_path.split(';')
        for path in paths:
            if os.path.isdir(path):
                os.add_dll_directory(path)
    else:
        print("PATH environment variable is not set.", file=sys.stderr)
        exit(1)

import PyNvCodec as nvc
import numpy as np

from io import BytesIO
from multiprocessing import Process
import subprocess
import uuid
import json
import pycuda.driver as cuda


def get_stream_params(url: str) -> Dict:
    cmd = [
        'ffprobe',
        '-v', 'quiet',
        '-print_format', 'json',
        '-show_format', '-show_streams', url]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    stdout = proc.communicate()[0]

    bio = BytesIO(stdout)
    json_out = json.load(bio)

    params = {}
    if 'streams' not in json_out:
        return {}

    for stream in json_out['streams']:
        if stream['codec_type'] == 'video':
            params['width'] = stream['width']
            params['height'] = stream['height']
            params['framerate'] = float(eval(stream['avg_frame_rate']))

            codec_name = stream['codec_name']
            is_h264 = True if codec_name == 'h264' else False
            is_hevc = True if codec_name == 'hevc' else False
            if not is_h264 and not is_hevc:
                raise ValueError("Unsupported codec: " + codec_name +
                                 '. Only H.264 and HEVC are supported in this sample.')
            else:
                params['codec'] = nvc.CudaVideoCodec.H264 if is_h264 else nvc.CudaVideoCodec.HEVC

                pix_fmt = stream['pix_fmt']
                is_yuv420 = pix_fmt == 'yuv420p'
                is_yuv444 = pix_fmt == 'yuv444p'

                # YUVJ420P and YUVJ444P are deprecated but still wide spread, so handle
                # them as well. They also indicate JPEG color range.
                is_yuvj420 = pix_fmt == 'yuvj420p'
                is_yuvj444 = pix_fmt == 'yuvj444p'

                if is_yuvj420:
                    is_yuv420 = True
                    params['color_range'] = nvc.ColorRange.JPEG
                if is_yuvj444:
                    is_yuv444 = True
                    params['color_range'] = nvc.ColorRange.JPEG

                if not is_yuv420 and not is_yuv444:
                    raise ValueError("Unsupported pixel format: " +
                                     pix_fmt +
                                     '. Only YUV420 and YUV444 are supported in this sample.')
                else:
                    params['format'] = nvc.PixelFormat.NV12 if is_yuv420 else nvc.PixelFormat.YUV444

                # Color range default option. We may have set when parsing
                # pixel format, so check first.
                if 'color_range' not in params:
                    params['color_range'] = nvc.ColorRange.MPEG
                # Check actual value.
                if 'color_range' in stream:
                    color_range = stream['color_range']
                    if color_range == 'pc' or color_range == 'jpeg':
                        params['color_range'] = nvc.ColorRange.JPEG

                # Color space default option:
                params['color_space'] = nvc.ColorSpace.BT_601
                # Check actual value.
                if 'color_space' in stream:
                    color_space = stream['color_space']
                    if color_space == 'bt709':
                        params['color_space'] = nvc.ColorSpace.BT_709

                return params
    return {}


def rtsp_client(url: str, name: str, gpu_id: int) -> None:
    # Get stream parameters
    params = get_stream_params(url)

    if not len(params):
        raise ValueError("Can not get " + url + ' streams params')

    w = params['width']
    h = params['height']
    f = params['format']
    c = params['codec']
    g = gpu_id

    # Prepare ffmpeg arguments
    if nvc.CudaVideoCodec.H264 == c:
        codec_name = 'h264'
    elif nvc.CudaVideoCodec.HEVC == c:
        codec_name = 'hevc'
    bsf_name = codec_name + '_mp4toannexb,dump_extra=all'

    cmd = [
        'ffmpeg',       '-hide_banner',  
        '-loglevel',   'quiet',
        '-i',           url,
        '-c:v',         'copy',
        '-bsf:v',       bsf_name,
        '-f',           codec_name,
        'pipe:1'
    ]
    # Run ffmpeg in subprocess and redirect its output to pipe
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)

    cuda.init()
    cuda_ctx = cuda.Device(gpu_id).retain_primary_context()
    cuda_ctx.push()
    cuda_str = cuda.Stream()
    cuda_ctx.pop()

    # Create HW decoder class
    nvdec = nvc.PyNvDecoder(w, h, f, c, g)
    nvCvt = nvc.PySurfaceConverter(w, h, nvc.PixelFormat.NV12, nvc.PixelFormat.BGR, cuda_ctx.handle, cuda_str.handle)
    nvDwn = nvc.PySurfaceDownloader(w, h, nvCvt.Format(), cuda_ctx.handle, cuda_str.handle)
    frameSize = int(w*h*3)
    rawFrame = np.ndarray(shape=(frameSize), dtype=np.uint8)
    cc_ctx = None

    # Amount of bytes we read from pipe first time.
    read_size = 4096
    # Total bytes read and total frames decoded to get average data rate
    rt = 0
    fd = 0

    # Main decoding loop, will not flush intentionally because don't know the
    # amount of frames available via RTSP.
    while True:
        # Pipe read underflow protection
        if not read_size:
            read_size = int(rt / fd)
            # Counter overflow protection
            rt = read_size
            fd = 1

        # Read data.
        # Amount doesn't really matter, will be updated later on during decode.
        bits = proc.stdout.read(read_size)
        if not len(bits):
            print("Can't read data from pipe")
            break
        else:
            rt += len(bits)

        # Decode
        enc_packet = np.frombuffer(buffer=bits, dtype=np.uint8)
        pkt_data = nvc.PacketData()
        try:
            surface_nv12 = nvdec.DecodeSurfaceFromPacket(enc_packet, pkt_data)

            if not surface_nv12.Empty():
                fd += 1
                # Shifts towards underflow to avoid increasing vRAM consumption.
                if pkt_data.bsl < read_size:
                    read_size = pkt_data.bsl
                # Print process ID every second or so.
                fps = int(params['framerate'])
                #if not fd % fps:
                #    print(name)
                #print(params)

                if cc_ctx is None:
                    cspace = params['color_space']
                    crange = nvc.ColorRange.MPEG
                    cc_ctx = nvc.ColorspaceConversionContext(cspace, crange)

                surface_bgr = nvCvt.Execute(surface_nv12, cc_ctx)
                if surface_bgr.Empty():
                    break
                if not nvDwn.DownloadSingleSurface(surface_bgr, rawFrame):
                    break

                img_bgr = rawFrame.reshape((h, w, 3))
                #cv2.imwrite("./test.jpg",img_bgr)
                #break

                

        # Handle HW exceptions in simplest possible way by decoder respawn
        except nvc.HwResetException:
            nvdec = nvc.PyNvDecoder(w, h, f, c, g)
            continue


if __name__ == "__main__":
    print("This sample decodes multiple videos in parallel on given GPU.")
    print("It doesn't do anything beside decoding, output isn't saved.")

    print("Usage: SampleDecodeRTSP.py $gpu_id $url1 ... $urlN .")

    if(len(sys.argv) < 3):
        print("Provide gpu ID and input URL(s).")
        exit(1)

    gpuID = int(sys.argv[1])
    urls = []

    for i in range(2, len(sys.argv)):
        urls.append(sys.argv[i])

    pool = []
    for url in urls:
        client = Process(target=rtsp_client, args=(
            url, str(uuid.uuid4()), gpuID))
        client.start()
        pool.append(client)

    for client in pool:
        client.join()
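
The listing reads the frame rate with float(eval(stream['avg_frame_rate'])), which executes whatever string ffprobe returns and fails on a value such as "0/0". A hedged alternative, assuming the field is a "num/den" string, is to parse it with fractions.Fraction. The function name parse_frame_rate and its default argument are hypothetical and not part of the sample.

```python
from fractions import Fraction


def parse_frame_rate(rate: str, default: float = 0.0) -> float:
    """Parse an ffprobe rate string such as '60/1' or '30000/1001'.

    Falls back to `default` when the string is not a valid fraction
    or has a zero denominator.
    """
    try:
        return float(Fraction(rate))
    except (ValueError, ZeroDivisionError):
        return default


print(parse_frame_rate("60/1"))
print(parse_frame_rate("0/0"))
```

In get_stream_params this would replace the eval call, for example params['framerate'] = parse_frame_rate(stream['avg_frame_rate']).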

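P.S. In my tests, moving decoding and color-space conversion to the GPU brought CPU usage down from 40% to 6%, but downloading the frames from GPU to CPU with nvDwn.DownloadSingleSurface pushed it back up to 24%. So wherever possible, feed the frames straight into inference without downloading them to the CPU. Keeping the whole pipeline on the GPU is the way to go.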
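
To get a feel for how much data the download step moves, here is a back-of-the-envelope calculation for the stream shown in the log output (1920x1080 at 60 fps). It only computes frame sizes and the resulting device-to-host data rate; it does not by itself account for the measured CPU figures, and the helper name frame_bytes is hypothetical.

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: float) -> int:
    """Size in bytes of one uncompressed frame."""
    return int(width * height * bytes_per_pixel)


WIDTH, HEIGHT, FPS = 1920, 1080, 60  # values from the stream in the log

nv12_frame = frame_bytes(WIDTH, HEIGHT, 1.5)  # NV12: 12 bits per pixel
bgr_frame = frame_bytes(WIDTH, HEIGHT, 3)     # BGR: 24 bits per pixel
bgr_rate = bgr_frame * FPS                    # bytes copied to host per second

print("NV12 frame: %d bytes" % nv12_frame)
print("BGR frame:  %d bytes" % bgr_frame)
print("BGR download rate: %.1f MB/s" % (bgr_rate / 1e6))
```

At these settings each BGR frame is about 6.2 MB, so downloading every frame copies roughly 373 MB per second per stream.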
