HarmonyOS AI Capabilities: Speech Recognition


This article aims to help you avoid a few common pitfalls when developing audio recording and speech recognition features.

Result


On the left is the simple UI layout with the recognition output; on the right is the test audio being played in NetEase Cloud Music.

Development Steps

IDE installation, project creation, and similar setup steps are skipped here. The app is built against SDK API version 6 and uses the JS UI framework.

1. Requesting Permissions

The AI speech recognition service itself does not require any permission, but since the microphone is used to record audio, the microphone permission must be requested.
Add the permission to the config.json configuration file:

"reqPermissions": [
      
        "name": "ohos.permission.MICROPHONE"
      
    ]

Explicitly request the microphone permission in MainAbility:

@Override
public void onStart(Intent intent) {
    super.onStart(intent);
    requestPermission();
}

// Request permissions from the user
private void requestPermission() {
    String[] permission = {
            "ohos.permission.MICROPHONE",
    };
    List<String> applyPermissions = new ArrayList<>();
    for (String element : permission) {
        if (verifySelfPermission(element) != 0) {
            if (canRequestPermission(element)) {
                applyPermissions.add(element);
            }
        }
    }
    requestPermissionsFromUser(applyPermissions.toArray(new String[0]), 0);
}
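The article does not check the outcome of the request. If you want to react when the user denies the microphone permission, a minimal sketch (not part of the original demo) is to override onRequestPermissionsFromUserResult; the request code 0 matches the one passed above, and IBundleManager comes from ohos.bundle:

@Override
public void onRequestPermissionsFromUserResult(int requestCode, String[] permissions, int[] grantResults) {
    super.onRequestPermissionsFromUserResult(requestCode, permissions, grantResults);
    if (requestCode != 0) {
        return;
    }
    for (int i = 0; i < permissions.length; i++) {
        if ("ohos.permission.MICROPHONE".equals(permissions[i])
                && grantResults[i] != IBundleManager.PERMISSION_GRANTED) {
            // The user denied the microphone permission; recording will fail,
            // so prompt the user or disable the recording feature here.
        }
    }
}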

2. Creating an Audio Recording Utility Class

First, create an audio recording utility class, AudioCaptureUtils.
Audio recording relies on the AudioCapturer class, and constructing an AudioCapturer in turn requires the AudioStreamInfo and AudioCapturerInfo classes, so we declare fields for all three:

    private AudioStreamInfo audioStreamInfo;
    private AudioCapturer audioCapturer;
    private AudioCapturerInfo audioCapturerInfo;

Audio recording for speech recognition is subject to a few constraints, so when recording we need to make sure that:
1. the sample rate is 16000 Hz;
2. the channel is mono;
3. only Mandarin is supported.
To make AudioCaptureUtils reusable, the constructor takes the channel mask and sample rate as parameters and initializes AudioStreamInfo and AudioCapturerInfo:

// channelMask: audio channel mask
// SampleRate: sample rate
public AudioCaptureUtils(AudioStreamInfo.ChannelMask channelMask, int SampleRate) {
    this.audioStreamInfo = new AudioStreamInfo.Builder()
            .encodingFormat(AudioStreamInfo.EncodingFormat.ENCODING_PCM_16BIT)
            .channelMask(channelMask)
            .sampleRate(SampleRate)
            .build();
    this.audioCapturerInfo = new AudioCapturerInfo.Builder().audioStreamInfo(audioStreamInfo).build();
}

The init method initializes audioCapturer and configures a sound effect, defaulting to noise suppression:

// packageName: package name
public void init(String packageName) {
    this.init(SoundEffect.SOUND_EFFECT_TYPE_NS, packageName);
}

// soundEffect: sound effect UUID
// packageName: package name
public void init(UUID soundEffect, String packageName) {
    if (audioCapturer == null || audioCapturer.getState() == AudioCapturer.State.STATE_UNINITIALIZED) {
        audioCapturer = new AudioCapturer(this.audioCapturerInfo);
    }
    audioCapturer.addSoundEffect(soundEffect, packageName);
}

After initialization, the class exposes start, stop, and destory methods to start recording, stop recording, and release resources; each simply delegates to the corresponding AudioCapturer call.

public void stop() {
    this.audioCapturer.stop();
}

public void destory() {
    this.audioCapturer.stop();
    this.audioCapturer.release();
}

public Boolean start() {
    if (audioCapturer == null) {
        return false;
    }
    return audioCapturer.start();
}

The class also provides a method to read the recorded audio stream and a method to obtain the AudioCapturer instance:

// buffers: buffer to read the audio data into
// offset: offset within the buffer
// bytesLength: number of bytes to read
public int read(byte[] buffers, int offset, int bytesLength) {
    return audioCapturer.read(buffers, offset, bytesLength);
}

// Returns the underlying AudioCapturer instance
public AudioCapturer get() {
    return this.audioCapturer;
}

3. Creating a Speech Recognition Utility Class

We have now created the audio recording utility class; next, create a speech recognition utility class, AsrUtils.
Recall the constraints of the speech recognition service listed earlier.

One hidden constraint is worth adding: the PCM buffer written to the recognizer may only be 640 or 1280 bytes long, so the audio stream can only be read in chunks of 640 or 1280 bytes.

Next, define some basic constants:

// Sample rate is limited to 16000 Hz
private static final int VIDEO_SAMPLE_RATE = 16000;
// VAD end wait time, default 2000 ms
private static final int VAD_END_WAIT_MS = 2000;
// VAD front wait time, default 4800 ms
// These two parameters affect recognition accuracy; more details can be found online, the system defaults are used here
private static final int VAD_FRONT_WAIT_MS = 4800;
// Input timeout, 20000 ms
private static final int TIMEOUT_DURATION = 20000;

// PCM buffer length is limited to 640 or 1280
private static final int BYTES_LENGTH = 1280;
// Thread pool parameters
private static final int CAPACITY = 6;
private static final int ALIVE_TIME = 3;
private static final int POOL_SIZE = 3;

Because audio has to be recorded continuously in the background, a separate thread is needed. Java's ThreadPoolExecutor class is used here for thread management.
Define a thread pool instance and the other related fields:


// Recording thread pool
private ThreadPoolExecutor poolExecutor;
/* Custom state codes:
 *  error: -1
 *  initial: 0
 *  init: 1
 *  speech input started: 2
 *  speech input ended: 3
 *  recognition ended: 5
 *  intermediate result available: 9
 *  final result available: 10
 */
public int state = 0;
// Recognition result
public String result;
// Whether speech recognition has been started;
// PCM data is only written while this is true
boolean isStarted = false;

// ASR client
private AsrClient asrClient;
// ASR listener
private AsrListener listener;
AsrIntent asrIntent;
// Audio recording utility
private AudioCaptureUtils audioCaptureUtils;

Initialize these fields in the constructor:

public AsrUtils(Context context) {
    // Create an audio recording utility instance: mono channel, 16000 Hz sample rate
    this.audioCaptureUtils = new AudioCaptureUtils(AudioStreamInfo.ChannelMask.CHANNEL_IN_MONO, VIDEO_SAMPLE_RATE);
    // Initialize with the default noise suppression sound effect
    this.audioCaptureUtils.init("com.panda_coder.liedetector");
    // Clear the result
    this.result = "";
    // Create a new thread pool for the recording task
    poolExecutor = new ThreadPoolExecutor(
            POOL_SIZE,
            POOL_SIZE,
            ALIVE_TIME,
            TimeUnit.SECONDS,
            new LinkedBlockingQueue<>(CAPACITY),
            new ThreadPoolExecutor.DiscardOldestPolicy());

    if (asrIntent == null) {
        asrIntent = new AsrIntent();
        // Set the audio source to a PCM stream
        // (a file can also be used here)
        asrIntent.setAudioSourceType(AsrIntent.AsrAudioSrcType.ASR_SRC_TYPE_PCM);
        asrIntent.setVadEndWaitMs(VAD_END_WAIT_MS);
        asrIntent.setVadFrontWaitMs(VAD_FRONT_WAIT_MS);
        asrIntent.setTimeoutThresholdMs(TIMEOUT_DURATION);
    }

    if (asrClient == null) {
        // Create the AsrClient
        asrClient = AsrClient.createAsrClient(context).orElse(null);
    }
    if (listener == null) {
        // Create the listener and initialize the AsrClient with it
        listener = new MyAsrListener();
        this.asrClient.init(asrIntent, listener);
    }
}

// Build a MyAsrListener class that implements the AsrListener interface
class MyAsrListener implements AsrListener {

    @Override
    public void onInit(PacMap pacMap) {
        HiLog.info(TAG, "====== init");
        state = 1;
    }

    @Override
    public void onBeginningOfSpeech() {
        state = 2;
    }

    @Override
    public void onRmsChanged(float v) {
    }

    @Override
    public void onBufferReceived(byte[] bytes) {
    }

    @Override
    public void onEndOfSpeech() {
        state = 3;
    }

    @Override
    public void onError(int i) {
        state = -1;
        if (i == AsrError.ERROR_SPEECH_TIMEOUT) {
            // Restart listening on timeout
            asrClient.startListening(asrIntent);
        } else {
            HiLog.info(TAG, "======error code:" + i);
            asrClient.stopListening();
        }
    }

    // Note the difference from onIntermediateResults: the final result
    // is read with pacMap.getString(AsrResultKey.RESULTS_RECOGNITION)
    @Override
    public void onResults(PacMap pacMap) {
        state = 10;
        // The final result looks like:
        // {"result":[{"confidence":0,"ori_word":"你 好 ","pinyin":"NI3 HAO3 ","word":"你好。"}]}
        String results = pacMap.getString(AsrResultKey.RESULTS_RECOGNITION);
        ZSONObject zsonObject = ZSONObject.stringToZSON(results);
        ZSONObject infoObject;
        if (zsonObject.getZSONArray("result").getZSONObject(0) instanceof ZSONObject) {
            infoObject = zsonObject.getZSONArray("result").getZSONObject(0);
            String resultWord = infoObject.getString("ori_word").replace(" ", "");
            result += resultWord;
        }
    }

    // Intermediate results are read with
    // pacMap.getString(AsrResultKey.RESULTS_INTERMEDIATE)
    @Override
    public void onIntermediateResults(PacMap pacMap) {
        state = 9;
//        String result = pacMap.getString(AsrResultKey.RESULTS_INTERMEDIATE);
//        if (result == null) {
//            return;
//        }
//        ZSONObject zsonObject = ZSONObject.stringToZSON(result);
//        ZSONObject infoObject;
//        if (zsonObject.getZSONArray("result").getZSONObject(0) instanceof ZSONObject) {
//            infoObject = zsonObject.getZSONArray("result").getZSONObject(0);
//            String resultWord = infoObject.getString("ori_word").replace(" ", "");
//            HiLog.info(TAG, "=========== 9 " + resultWord);
//        }
    }

    @Override
    public void onEnd() {
        state = 5;
        // If recording is still in progress, start listening again
        if (isStarted) {
            asrClient.startListening(asrIntent);
        }
    }

    @Override
    public void onEvent(int i, PacMap pacMap) {
    }

    @Override
    public void onAudioStart() {
        state = 2;
    }

    @Override
    public void onAudioEnd() {
        state = 3;
    }
}

Functions to start and stop recognition:


public void start() {
    if (!this.isStarted) {
        this.isStarted = true;
        asrClient.startListening(asrIntent);
        poolExecutor.submit(new AudioCaptureRunnable());
    }
}

public void stop() {
    this.isStarted = false;
    asrClient.stopListening();
    audioCaptureUtils.stop();
}

// The audio recording task
private class AudioCaptureRunnable implements Runnable {
    @Override
    public void run() {
        byte[] buffers = new byte[BYTES_LENGTH];
        // Start recording
        audioCaptureUtils.start();
        while (isStarted) {
            // Read the recorded PCM data
            int ret = audioCaptureUtils.read(buffers, 0, BYTES_LENGTH);
            if (ret <= 0) {
                HiLog.error(TAG, "======Error read data");
            } else {
                // Write the recorded PCM data to the speech recognition service.
                // If the buffer length is not 1280 or 640 bytes, it must be
                // re-sliced into 1280- or 640-byte chunks manually.
                asrClient.writePcm(buffers, BYTES_LENGTH);
            }
        }
    }
}
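The loop above always reads exactly BYTES_LENGTH (1280) bytes, so every write already meets the length limit. If your PCM data comes from a source that produces chunks of arbitrary length, it has to be re-sliced before calling writePcm. Below is only a rough sketch, assuming zero-padding the final short chunk to 640 bytes is acceptable; writePcmInChunks is a hypothetical helper, not part of the original code:

// Hypothetical helper: slices an arbitrary-length PCM buffer into the
// 1280- or 640-byte chunks accepted by the ASR engine.
private void writePcmInChunks(byte[] pcm, int length) {
    int offset = 0;
    while (offset < length) {
        int remaining = length - offset;
        // Use the largest allowed chunk size (1280 or 640) that still fits
        int chunkSize = remaining >= BYTES_LENGTH ? BYTES_LENGTH : 640;
        byte[] chunk = new byte[chunkSize];
        int copyLength = Math.min(remaining, chunkSize);
        System.arraycopy(pcm, offset, chunk, 0, copyLength);
        // If fewer than 640 bytes remain, the tail of the chunk stays zero-padded
        asrClient.writePcm(chunk, chunkSize);
        offset += copyLength;
    }
}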

The recognition result is delivered through the listener callbacks, where it is appended to result; callers retrieve it with getResult or getResultAndClear.

public String getResult() {
    return result;
}

public String getResultAndClear() {
    if (this.result.isEmpty()) {
        return "";
    }
    String results = getResult();
    this.result = "";
    return results;
}
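For reference, a typical lifecycle of AsrUtils looks roughly like the sketch below; it simply mirrors how ControllerAbility uses the class in the next step:

// Create the utility with an Ability (or other ohos Context) and start recognition
AsrUtils asrUtils = new AsrUtils(context);
asrUtils.start();

// While recognition is running, periodically drain any newly recognized text
String newText = asrUtils.getResultAndClear();

// Stop recording and recognition when finished
asrUtils.stop();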

4. Creating a Simple JS UI and Calling the Java Side from JS Through a ServiceAbility

The hml code:

<div class="container">
    <div>
        <button class="btn" @touchend="start">开启</button>
        <button class="btn" @touchend="sub">订阅结果</button>
        <button class="btn" @touchend="stop">关闭</button>
    </div>
    <text class="title">
        语音识别内容:  text 
    </text>
</div>

The CSS code:

.container {
    flex-direction: column;
    justify-content: flex-start;
    align-items: center;
    width: 100%;
    height: 100%;
    padding: 10%;
}

.title {
    font-size: 20px;
    color: #000000;
    opacity: 0.9;
    text-align: left;
    width: 100%;
    margin: 3% 0;
}

.btn {
    padding: 10px 20px;
    margin: 3px;
    border-radius: 6px;
}

The JS logic code:

// Utility class for calling the Java ServiceAbility from JS
import { jsCallJavaAbility } from '../../common/JsCallJavaAbilityUtils.js';

export default {
    data: {
        text: ""
    },
    // Start recognition
    start() {
        jsCallJavaAbility.callAbility("ControllerAbility", 100, {}).then(result => {
            console.log(result)
        })
    },
    // Stop recognition
    stop() {
        jsCallJavaAbility.callAbility("ControllerAbility", 101, {}).then(result => {
            console.log(result)
        })
        jsCallJavaAbility.unSubAbility("ControllerAbility", 201).then(result => {
            if (result.code == 200) {
                console.log("取消订阅成功");
            }
        })
    },
    // Subscribe to the recognition results pushed from the Java side
    sub() {
        jsCallJavaAbility.subAbility("ControllerAbility", 200, (data) => {
            let text = data.data.text
            text && (this.text += text)
        }).then(result => {
            if (result.code == 200) {
                console.log("订阅成功");
            }
        })
    }
}

The ServiceAbility (ControllerAbility) code:

public class ControllerAbility extends Ability {
    AnswerRemote remote = new AnswerRemote();
    AsrUtils asrUtils;
    // Remote objects of the subscribed clients
    private static HashMap<Integer, IRemoteObject> remoteObjectHandlers = new HashMap<Integer, IRemoteObject>();

    @Override
    public void onStart(Intent intent) {
        HiLog.error(LABEL_LOG, "ControllerAbility::onStart");
        super.onStart(intent);
        // Initialize the speech recognition utility
        asrUtils = new AsrUtils(this);
    }

    @Override
    public void onCommand(Intent intent, boolean restart, int startId) {
    }

    @Override
    public IRemoteObject onConnect(Intent intent) {
        super.onConnect(intent);
        return remote.asObject();
    }

    class AnswerRemote extends RemoteObject implements IRemoteBroker {
        AnswerRemote() {
            super("");
        }

        @Override
        public boolean onRemoteRequest(int code, MessageParcel data, MessageParcel reply, MessageOption option) {
            Map<String, Object> zsonResult = new HashMap<String, Object>();
            String zsonStr = data.readString();
            ZSONObject zson = ZSONObject.stringToZSON(zsonStr);
            switch (code) {
                case 100: {
                    // Code 100 from JS: start speech recognition
                    asrUtils.start();
                    break;
                }
                case 101: {
                    // Code 101 from JS: stop speech recognition
                    asrUtils.stop();
                    break;
                }
                case 200: {
                    // Code 200 from JS: subscribe to the recognition results
                    remoteObjectHandlers.put(200, data.readRemoteObject());
                    // Periodically fetch the recognition result and push it back to the JS UI
                    getAsrText();
                    break;
                }
                default: {
                    reply.writeString("service not defined");
                    return false;
                }
            }
            reply.writeString(ZSONObject.toZSONString(zsonResult));
            return true;
        }

        @Override
        public IRemoteObject asObject() {
            return this;
        }
    }

    public void getAsrText() {
        new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(1 * 500);
                    Map<String, Object> zsonResult = new HashMap<String, Object>();
                    zsonResult.put("text", asrUtils.getResultAndClear());
                    ReportEvent(200, zsonResult);
                } catch (RemoteException | InterruptedException e) {
                    break;
                }
            }
        }).start();
    }

    private void ReportEvent(int remoteHandler, Object backData) throws RemoteException {
        MessageParcel data = MessageParcel.obtain();
        MessageParcel reply = MessageParcel.obtain();
        MessageOption option = new MessageOption();
        data.writeString(ZSONObject.toZSONString(backData));
        IRemoteObject remoteObject = remoteObjectHandlers.get(remoteHandler);
        remoteObject.sendRequest(100, data, reply, option);
        reply.reclaim();
        data.reclaim();
    }
}

This completes the simple speech recognition feature.
Demo video: https://www.bilibili.com/video/BV1E44y177hv/
Full open-source code: https://gitee.com/panda-coder/harmonyos-apps/tree/master/AsrDemo

For more HarmonyOS content, visit the HarmonyOS technology community jointly built by 51CTO and Huawei:

https://harmonyos.51cto.com/#bkwz

