Wowza Tech: How do you load balance across multiple GPUs when using NVIDIA CUDA hardware-accelerated encoding and decoding?
Posted by 哲想软件
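Starting with version 4.6.0, Wowza Streaming Engine load balances across multiple GPUs when it uses NVIDIA CUDA hardware for accelerated encoding and decoding. Wowza also provides an API so that you can implement load balancing according to your own requirements and logic.
This article describes both.
Note: This feature requires Wowza Streaming Engine™ 4.6.0 or later.
At run time, Wowza Transcoder calls the following interface, ITranscoderVideoLoadBalancer: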
public interface ITranscoderVideoLoadBalancer
{
    public abstract void init(IServer server, TranscoderContextServer transcoderContextServer);
    public abstract void onHardwareInspection(TranscoderContextServer transcoderContextServer);
    public abstract void onTranscoderSessionCreate(LiveStreamTranscoder liveStreamTranscoder);
    public abstract void onTranscoderSessionInit(LiveStreamTranscoder liveStreamTranscoder);
    public abstract void onTranscoderSessionDestroy(LiveStreamTranscoder liveStreamTranscoder);
    public abstract void onTranscoderSessionLoadBalance(LiveStreamTranscoder liveStreamTranscoder);
}
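Where:
init is called when the server starts.
onHardwareInspection is called when the Transcoder has just started and is detecting hardware acceleration resources, such as graphics cards with GPUs.
onTranscoderSessionCreate is called when a transcoding session is created.
onTranscoderSessionInit is called when a transcoding session has finished initializing and the transcoding template has been read.
onTranscoderSessionDestroy is called when a transcoding session is destroyed.
onTranscoderSessionLoadBalance is called while the decoder, scaler, and encoders of a transcoding session are being initialized.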
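Configuring an implementation class for the ITranscoderVideoLoadBalancer interface
To use the ITranscoderVideoLoadBalancer interface, do the following:
Create a class that extends TranscoderVideoLoadBalancerBase (which implements the ITranscoderVideoLoadBalancer interface described above) and override the interface methods. Then add a server-level property to [install-dir]/conf/Server.xml that specifies the fully qualified name of the implementation class: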
<Property>
<Name>transcoderVideoLoadBalancerClass</Name>
<Value>[custom-class-path]</Value>
</Property>
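Here [custom-class-path] is the fully qualified class name of your implementation. For example, to use the TranscoderVideoLoadBalancerCUDASimple implementation that is built into Wowza, configure it as follows: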
<Property>
<Name>transcoderVideoLoadBalancerClass</Name>
<Value>com.wowza.wms.transcoder.model.TranscoderVideoLoadBalancerCUDASimple</Value>
</Property>
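(Optional) If your GPUs differ in performance, you can also use the transcoderVideoLoadBalancerCUDASimpleGPUWeights property to assign a different weight to each GPU.
Example class - TranscoderVideoLoadBalancerCUDASimple
Starting with Wowza Streaming Engine 4.5.0.01, TranscoderVideoLoadBalancerCUDASimple is built into Wowza, so you can use it without any development work. Its load-balancing strategy assigns all the work of each individual transcoding session to a single GPU; in other words, the work inside one session never moves back and forth between GPUs.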
import java.util.*;
import com.wowza.util.*;
import com.wowza.wms.application.*;
import com.wowza.wms.logging.*;
import com.wowza.wms.media.model.*;
import com.wowza.wms.server.*;
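public class TranscoderVideoLoadBalancerCUDASimple extends TranscoderVideoLoadBalancerBase
{
    private static final Class<TranscoderVideoLoadBalancerCUDASimple> CLASS = TranscoderVideoLoadBalancerCUDASimple.class;
    private static final String CLASSNAME = "TranscoderVideoLoadBalancerCUDASimple";

    public static final int DEFAULT_GPU_WEIGHT_SCALE = 1;
    // Relative cost per pixel of each stage; encoding is treated as five times as expensive as decoding or scaling.
    public static final int DEFAULT_WEIGHT_FACTOR_ENCODE = 5;
    public static final int DEFAULT_WEIGHT_FACTOR_DECODE = 1;
    public static final int DEFAULT_WEIGHT_FACTOR_SCALE = 1;
    // Divisor that scales the raw pixel-based load down to smaller numbers.
    public static final int LOAD_MAG = 1000;
    // Key under which the chosen GPU and load are stored in the session's properties.
    public static final String PROPNAME_TRANSCODER_SESSION = "TranscoderVideoLoadBalancerCUDASimpleSessionInfo";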
    class SessionInfo
    {
        private int gpuid = 0;
        private long load = 0;

        public SessionInfo(int gpuid, long load)
        {
            this.gpuid = gpuid;
            this.load = load;
        }
    }

    class GPUInfo
    {
        private int gpuid = 0;
        private long currentLoad = 0;
        private int weight = 0;

        private int getWeight()
        {
            return this.weight;
        }

        private long getUnWeightedLoad()
        {
            return currentLoad;
        }

        // Load normalized by the GPU's weight: a lower-weighted GPU reports a proportionally higher load.
        private long getWeightedLoad()
        {
            long load = 0;
            if (weight > 0)
                load = (currentLoad*gpuWeightScale)/weight;
            return load;
        }
    }
    private Object lock = new Object();
    private TranscoderContextServer transcoderContextServer = null;
    private boolean available = false;
    private int countGPU = 0;
    private int gpuWeightScale = DEFAULT_GPU_WEIGHT_SCALE;
    private int[] gpuWeights = null;
    private int weightFactorEncode = DEFAULT_WEIGHT_FACTOR_ENCODE;
    private int weightFactorDecode = DEFAULT_WEIGHT_FACTOR_DECODE;
    private int weightFactorScale = DEFAULT_WEIGHT_FACTOR_SCALE;
    private GPUInfo[] gpuInfos = null;
    @Override
    public void init(IServer server, TranscoderContextServer transcoderContextServer)
    {
        this.transcoderContextServer = transcoderContextServer;

        WMSProperties props = server.getProperties();

        this.weightFactorEncode = props.getPropertyInt("transcoderVideoLoadBalancerCUDASimpleWeightFactorEncode", this.weightFactorEncode);
        this.weightFactorDecode = props.getPropertyInt("transcoderVideoLoadBalancerCUDASimpleWeightFactorDecode", this.weightFactorDecode);
        this.weightFactorScale = props.getPropertyInt("transcoderVideoLoadBalancerCUDASimpleWeightFactorScale", this.weightFactorScale);

        String weightsStr = props.getPropertyStr("transcoderVideoLoadBalancerCUDASimpleGPUWeights", null);
        if (weightsStr != null)
        {
            String[] values = weightsStr.split(",");
            int maxWeight = 0;
            this.gpuWeights = new int[values.length];
            for(int i=0;i<values.length;i++)
            {
                String value = values[i].trim();
                if (value.length() <= 0)
                {
                    // Empty entry: marked -1 and replaced with the maximum weight below.
                    this.gpuWeights[i] = -1;
                    continue;
                }

                int weight = -1;
                try
                {
                    weight = Integer.parseInt(value);
                    if (weight < 0)
                        weight = 0;
                }
                catch(Exception e)
                {
                    // Unparsable entry: weight stays -1 and is replaced with the maximum weight below.
                }

                this.gpuWeights[i] = weight;
                if (weight > maxWeight)
                    maxWeight = weight;
            }

            // The largest configured weight becomes the scale used by GPUInfo.getWeightedLoad().
            this.gpuWeightScale = maxWeight;
            for(int i=0;i<this.gpuWeights.length;i++)
            {
                if (this.gpuWeights[i] < 0)
                    this.gpuWeights[i] = this.gpuWeightScale;
            }
        }

        WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".init:weightFactorEncode:"+weightFactorEncode+" weightFactorDecode:"+weightFactorDecode+" weightFactorScale:"+weightFactorScale);
    }
    @Override
    public void onTranscoderSessionCreate(LiveStreamTranscoder liveStreamTranscoder)
    {
    }

    @Override
    public void onTranscoderSessionInit(LiveStreamTranscoder liveStreamTranscoder)
    {
    }

    @Override
    public void onTranscoderSessionDestroy(LiveStreamTranscoder liveStreamTranscoder)
    {
        if (this.countGPU > 1)
        {
            WMSProperties props = liveStreamTranscoder.getProperties();
            Object sessionInfoObj = props.get(PROPNAME_TRANSCODER_SESSION);
            if (sessionInfoObj != null && sessionInfoObj instanceof SessionInfo)
            {
                SessionInfo sessionInfo = (SessionInfo)sessionInfoObj;
                if (sessionInfo.gpuid < gpuInfos.length)
                {
                    // Capture the load before it is zeroed so the log line reports the amount actually removed.
                    long removedLoad;
                    synchronized(this.lock)
                    {
                        removedLoad = sessionInfo.load;
                        gpuInfos[sessionInfo.gpuid].currentLoad -= removedLoad;
                        sessionInfo.load = 0;
                    }

                    WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".onTranscoderSessionDestroy["+liveStreamTranscoder.getContextStr()+"]:Removing GPU session: gpuid:"+sessionInfo.gpuid+" load:"+removedLoad);
                }
            }
        }
    }
    @Override
    public void onHardwareInspection(TranscoderContextServer transcoderContextServer)
    {
        // Sample of the hardware information JSON parsed below:
//{"infoCUDA":{"availabe":true,"availableFlags":65651,"countGPU":1,"driverVersion":368.81,"cudaVersion":8000,"isCUDAOldH264WindowsAvailable":false,"gpuInfo":[{"name":"GeForceGTX960M","versionMajor":5,"versionMinor":0,"clockRate":1097500,"multiprocessorCount":5,"totalMemory":2147483648,"coreCount":640,"isCUDANVCUVIDAvailable":true,"isCUDAH264EncodeAvailable":true,"isCUDAH265EncodeAvailable":false,"getCUDANVENCVersion":5}]},"infoQuickSync":{"availabe":true,"availableFlags":537,"versionMajor":1,"versionMinor":19,"isQuickSyncH264EncodeAvailable":true,"isQuickSyncH265EncodeAvailable":true,"isQuickSyncVP8EncodeAvailable":false,"isQuickSyncVP9EncodeAvailable":false,"isQuickSyncH264DecodeAvailable":true,"isQuickSyncH265DecodeAvailable":false,"isQuickSyncMP2DecodeAvailable":true,"isQuickSyncVP8DecodeAvailable":false,"isQuickSyncVP9DecodeAvailable":false},"infoVAAPI":{"available":false},"infoX264":{"available":false},"infoX265":{"available":false}}
        boolean available = false;
        int countGPU = 0;

        String jsonStr = transcoderContextServer.getHardwareInfoJSON();
        if (jsonStr != null)
        {
            try
            {
                JSON jsonData = new JSON(jsonStr);
                if (jsonData != null)
                {
                    Map<String, Object> entries = jsonData.getEntrys();
                    Map<String, Object> infoCUDA = (Map<String, Object>)entries.get("infoCUDA");
                    if (infoCUDA != null)
                    {
                        // The key really is spelled "availabe" in the hardware JSON (see the sample above).
                        Object availableObj = infoCUDA.get("availabe");
                        if (availableObj != null && availableObj instanceof Boolean)
                        {
                            available = ((Boolean)availableObj).booleanValue();
                        }

                        if (available)
                        {
                            Object countGPUObj = infoCUDA.get("countGPU");
                            if (countGPUObj != null && countGPUObj instanceof Integer)
                            {
                                countGPU = ((Integer)countGPUObj).intValue();
                            }
                        }
                    }
                }
            }
            catch(Exception e)
            {
                WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".onHardwareInspection:Parsing JSON: ", e);
            }
        }

        WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".onHardwareInspection:CUDA available:"+available+" countGPU:"+countGPU);

        synchronized(lock)
        {
            this.available = available;
            this.countGPU = countGPU;

            // Per-GPU bookkeeping is only needed when there is more than one GPU to choose from.
            if (this.countGPU > 1)
            {
                this.gpuInfos = new GPUInfo[this.countGPU];
                for(int i=0;i<this.gpuInfos.length;i++)
                {
                    this.gpuInfos[i] = new GPUInfo();
                    this.gpuInfos[i].gpuid = i;
                    if (this.gpuWeights != null && i < this.gpuWeights.length)
                        this.gpuInfos[i].weight = this.gpuWeights[i];
                    else
                        this.gpuInfos[i].weight = gpuWeightScale;
                }
            }
        }
    }
    @Override
    public void onTranscoderSessionLoadBalance(LiveStreamTranscoder liveStreamTranscoder)
    {
        try
        {
            // Single-pass loop: "break" is used as an early exit.
            while(true)
            {
                if (this.gpuInfos == null)
                    break;

                TranscoderStream transcoderStream = liveStreamTranscoder.getTranscodingStream();
                if (transcoderStream == null)
                    break;

                TranscoderSession transcoderSession = liveStreamTranscoder.getTranscodingSession();
                if (transcoderSession == null)
                    break;

                TranscoderSessionVideo transcoderSessionVideo = transcoderSession.getSessionVideo();
                if (transcoderSessionVideo == null)
                    break;

                MediaCodecInfoVideo codecInfoVideo = null;
                if (transcoderSessionVideo.getCodecInfo() != null)
                    codecInfoVideo = transcoderSession.getSessionVideo().getCodecInfo();

                long loadDecode = 0;
                long loadScale = 0;
                long loadEncode = 0;

                boolean isScalerCUDA = false;

                TranscoderStreamSourceVideo transcoderStreamSourceVideo = null;
                TranscoderStreamScaler transcoderStreamScaler = null;

                // Decode load: source frame size, counted only when decoding runs on the GPU.
                TranscoderStreamSource transcoderStreamSource = transcoderStream.getSource();
                if (transcoderStreamSource != null)
                {
                    transcoderStreamSourceVideo = transcoderStreamSource.getVideo();
                    if (transcoderStreamSourceVideo != null && codecInfoVideo != null && (transcoderStreamSourceVideo.isImplementationNVCUVID() || transcoderStreamSourceVideo.isImplementationCUDA()))
                    {
                        loadDecode = codecInfoVideo.getFrameWidth() * codecInfoVideo.getFrameHeight();
                    }
                    else
                        transcoderStreamSourceVideo = null;
                }

                transcoderStreamScaler = transcoderStream.getScaler();
                if (transcoderStreamScaler != null)
                {
                    isScalerCUDA = transcoderStreamScaler.isImplementationCUDA();
                }

                List<TranscoderStreamDestination> destinations = transcoderStream.getDestinations();
                if (destinations == null)
                    break;

                // Scale and encode load: output frame size of every enabled, non-passthrough rendition.
                for(TranscoderStreamDestination destination : destinations)
                {
                    if (!destination.isEnable())
                        continue;

                    TranscoderStreamDestinationVideo destinationVideo = destination.getVideo();
                    if (destinationVideo == null)
                        continue;

                    if (destinationVideo.isPassThrough() || destinationVideo.isDisable())
                        continue;

                    TranscoderVideoFrameSizeHolder frameSizeHolder = destinationVideo.getFrameSizeHolder();
                    if (frameSizeHolder == null)
                        continue;

                    if (isScalerCUDA)
                        loadScale += frameSizeHolder.getActualWidth() * frameSizeHolder.getActualHeight();

                    if (destinationVideo.isImplementationNVENC() || destinationVideo.isImplementationCUDA())
                        loadEncode += frameSizeHolder.getActualWidth() * frameSizeHolder.getActualHeight();
                }

                long totalLoad = (loadDecode*weightFactorDecode) + (loadScale*weightFactorScale) + (loadEncode*weightFactorEncode);
                if (totalLoad <= 0)
                    break;

                totalLoad /= LOAD_MAG;
                if (totalLoad <= 0)
                    totalLoad = 1;
                // Pick the GPU with the smallest weighted load and charge this session's load to it.
                int gpuid = -1;
                synchronized(lock)
                {
                    long leastLoad = Long.MAX_VALUE;
                    for(int i=0;i<gpuInfos.length;i++)
                    {
                        if (gpuInfos[i].getWeightedLoad() < leastLoad)
                        {
                            leastLoad = gpuInfos[i].getWeightedLoad();
                            gpuid = i;
                        }
                    }

                    if (gpuid >= 0)
                        gpuInfos[gpuid].currentLoad += totalLoad;
                }

                WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".onTranscoderSessionLoadBalance["+liveStreamTranscoder.getContextStr()+"]:gpuid:"+gpuid+" load:"+totalLoad+" [decode:"+loadDecode+" scale:"+loadScale+" encode:"+loadEncode+"]");

                if (gpuid >= 0)
                {
                    // Remember the assignment so onTranscoderSessionDestroy can release the load later.
                    liveStreamTranscoder.getProperties().put(PROPNAME_TRANSCODER_SESSION, new SessionInfo(gpuid, totalLoad));

                    // Pin the decoder, scaler, and every GPU encoder of this session to the same GPU.
                    if (transcoderStreamSourceVideo != null)
                        transcoderStreamSourceVideo.setGPUID(gpuid);

                    if (transcoderStreamScaler != null && isScalerCUDA)
                        transcoderStreamScaler.setGPUID(gpuid);

                    for(TranscoderStreamDestination destination : destinations)
                    {
                        if (!destination.isEnable())
                            continue;

                        TranscoderStreamDestinationVideo destinationVideo = destination.getVideo();
                        if (destinationVideo == null)
                            continue;

                        if (destinationVideo.isPassThrough() || destinationVideo.isDisable())
                            continue;

                        if (destinationVideo.isImplementationNVENC() || destinationVideo.isImplementationCUDA())
                            destinationVideo.setGPUID(gpuid);
                    }
                }
                break;
            }
        }
        catch(Exception e)
        {
            WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".onTranscoderSessionLoadBalance:Parsing JSON: ", e);
        }
    }
}
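To make the load arithmetic in onTranscoderSessionLoadBalance concrete, the following stand-alone sketch reproduces the estimate using the default factors from the listing (encode 5, decode 1, scale 1, LOAD_MAG 1000). The class name SessionLoadSketch and its method signature are illustrative assumptions, not part of the Wowza API, and the sketch assumes that every rendition passed in is encoded on the GPU.

```java
/** Stand-alone sketch of the per-session load estimate; hypothetical, not a Wowza class. */
final class SessionLoadSketch
{
    // Default factors copied from the listing above.
    static final int WEIGHT_FACTOR_ENCODE = 5;
    static final int WEIGHT_FACTOR_DECODE = 1;
    static final int WEIGHT_FACTOR_SCALE = 1;
    static final int LOAD_MAG = 1000;

    /**
     * @param decodePixels source width*height if decoding runs on the GPU, otherwise 0
     * @param renditions   {width, height} of each GPU-encoded output (assumption: all on the GPU)
     * @param scalerOnGpu  whether the scaler uses the CUDA implementation
     * @return estimated load units, or 0 when nothing runs on the GPU
     */
    static long totalLoad(long decodePixels, int[][] renditions, boolean scalerOnGpu)
    {
        long loadScale = 0;
        long loadEncode = 0;
        for (int[] r : renditions)
        {
            long pixels = (long)r[0] * r[1];
            if (scalerOnGpu)
                loadScale += pixels;
            loadEncode += pixels;
        }

        long total = decodePixels*WEIGHT_FACTOR_DECODE
            + loadScale*WEIGHT_FACTOR_SCALE
            + loadEncode*WEIGHT_FACTOR_ENCODE;
        if (total <= 0)
            return 0; // the listing skips balancing in this case

        total /= LOAD_MAG;
        return total > 0 ? total : 1; // never round a real load down to zero
    }

    public static void main(String[] args)
    {
        int[][] renditions = { {1280, 720}, {640, 360} };
        System.out.println(totalLoad(1920L*1080L, renditions, true)); // prints 8985
    }
}
```

For a 1920x1080 source decoded on the GPU and two GPU-encoded renditions (1280x720 and 640x360) with a CUDA scaler, the estimate works out to 8985 load units, most of which comes from the encode term.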
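Load balancing across GPUs with different performance
The built-in TranscoderVideoLoadBalancerCUDASimple class supports load balancing across GPUs of differing performance. You can give each GPU its own performance weight (in effect a load weight, because a faster GPU can take on more work). The weights are configured in the transcoderVideoLoadBalancerCUDASimpleGPUWeights property.
The property value is a comma-separated list containing one weight per GPU. We recommend setting the weight of the best-performing GPU to 100 and setting each slower GPU to the corresponding percentage of that performance.
The order of the weights in the list matches the order in which the GPUs appear in the Wowza Streaming Engine startup log. For example, if your server has an M5000 card (index 0) and an M2000 card (index 1), the weights in transcoderVideoLoadBalancerCUDASimpleGPUWeights could be configured as follows: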
<Property>
<Name>transcoderVideoLoadBalancerCUDASimpleGPUWeights</Name>
<Value>100,66</Value>
</Property>
This indicates that the M2000 card delivers only 66% of the performance of the M5000 card.
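As a rough illustration of how these weights steer session placement, the sketch below mirrors the weight parsing in init and the least-weighted-load selection in onTranscoderSessionLoadBalance from the listing. It is a simplified stand-alone model: GpuWeightSketch and its methods are hypothetical names, and each simulated session is assumed to add the same load of 1000 units.

```java
/** Stand-alone model of weighted GPU selection; hypothetical, not a Wowza class. */
final class GpuWeightSketch
{
    /** Parses a comma-separated weight list following the rules of init() in the listing. */
    static int[] parseWeights(String weightsStr)
    {
        String[] values = weightsStr.split(",");
        int[] weights = new int[values.length];
        int maxWeight = 0;
        for (int i = 0; i < values.length; i++)
        {
            int weight = -1; // -1 marks an empty or unparsable entry
            try
            {
                weight = Math.max(0, Integer.parseInt(values[i].trim()));
            }
            catch (NumberFormatException e)
            {
                // keep -1; replaced with the maximum weight below
            }
            weights[i] = weight;
            maxWeight = Math.max(maxWeight, weight);
        }
        for (int i = 0; i < weights.length; i++)
            if (weights[i] < 0)
                weights[i] = maxWeight;
        return weights;
    }

    /** Same formula as GPUInfo.getWeightedLoad(): a weight of 0 reports a load of 0. */
    static long weightedLoad(long currentLoad, int weight, int scale)
    {
        return weight > 0 ? (currentLoad * scale) / weight : 0;
    }

    /** Returns the index of the GPU with the smallest weighted load (first one wins ties). */
    static int pickGpu(long[] currentLoads, int[] weights)
    {
        int scale = 0; // the largest weight acts as the scale, as in the listing
        for (int w : weights)
            scale = Math.max(scale, w);

        int gpuid = -1;
        long leastLoad = Long.MAX_VALUE;
        for (int i = 0; i < currentLoads.length; i++)
        {
            long load = weightedLoad(currentLoads[i], weights[i], scale);
            if (load < leastLoad)
            {
                leastLoad = load;
                gpuid = i;
            }
        }
        return gpuid;
    }

    public static void main(String[] args)
    {
        int[] weights = parseWeights("100,66");
        long[] loads = new long[weights.length];
        StringBuilder order = new StringBuilder();
        for (int s = 0; s < 5; s++)
        {
            int gpuid = pickGpu(loads, weights);
            loads[gpuid] += 1000; // assumed load of every simulated session
            order.append(gpuid);
        }
        System.out.println(order + " " + loads[0] + "/" + loads[1]); // prints 01010 3000/2000
    }
}
```

With weights 100,66, five equal sessions land on GPUs 0, 1, 0, 1, 0, leaving loads of 3000 and 2000 units, which is close to the configured 100:66 ratio.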
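Wowza Streaming Engine 4 is a powerful streaming media server with a rich set of APIs. It is widely deployed as a streaming server, with use cases that include live streaming, online education, and IPTV.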