Is there a way to use the SpeechRecognizer API directly for speech input?


【Title】Is there a way to use the SpeechRecognizer API directly for speech input? 【Posted】2011-06-25 21:59:27 【Question】:

The Android Dev site gives an example of using the built-in Google Speech Input activity for speech input. That activity displays a pre-configured pop-up with a mic and passes its results back via onActivityResult().

My question: is there a way to use the SpeechRecognizer class directly for speech input, without displaying the canned activity? This would let me build my own activity for voice input.

【Question comments】:

【Answer 1】:

Here is the code using the SpeechRecognizer class (taken from here and here):

import android.app.Activity;
import android.content.Intent;
import android.os.Bundle;
import android.view.View;
import android.view.View.OnClickListener;
import android.speech.RecognitionListener;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import android.widget.Button;
import android.widget.TextView;
import java.util.ArrayList;
import android.util.Log;



public class VoiceRecognitionTest extends Activity implements OnClickListener {

    private TextView mText;
    private SpeechRecognizer sr;
    private static final String TAG = "MyStt3Activity";

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        Button speakButton = (Button) findViewById(R.id.btn_speak);
        mText = (TextView) findViewById(R.id.textView1);
        speakButton.setOnClickListener(this);
        sr = SpeechRecognizer.createSpeechRecognizer(this);
        sr.setRecognitionListener(new listener());
    }

    class listener implements RecognitionListener {
        public void onReadyForSpeech(Bundle params) {
            Log.d(TAG, "onReadyForSpeech");
        }
        public void onBeginningOfSpeech() {
            Log.d(TAG, "onBeginningOfSpeech");
        }
        public void onRmsChanged(float rmsdB) {
            Log.d(TAG, "onRmsChanged");
        }
        public void onBufferReceived(byte[] buffer) {
            Log.d(TAG, "onBufferReceived");
        }
        public void onEndOfSpeech() {
            Log.d(TAG, "onEndofSpeech");
        }
        public void onError(int error) {
            Log.d(TAG, "error " + error);
            mText.setText("error " + error);
        }
        public void onResults(Bundle results) {
            String str = new String();
            Log.d(TAG, "onResults " + results);
            ArrayList<String> data = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
            for (int i = 0; i < data.size(); i++) {
                Log.d(TAG, "result " + data.get(i));
                str += data.get(i);
            }
            mText.setText("results: " + String.valueOf(data.size()));
        }
        public void onPartialResults(Bundle partialResults) {
            Log.d(TAG, "onPartialResults");
        }
        public void onEvent(int eventType, Bundle params) {
            Log.d(TAG, "onEvent " + eventType);
        }
    }

    public void onClick(View v) {
        if (v.getId() == R.id.btn_speak) {
            // Start recognition directly via SpeechRecognizer -- no pop-up activity
            Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
            intent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, "voice.recognition.test");
            intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 5);
            sr.startListening(intent);
            Log.i("111111", "11111111");
        }
    }
}

Define main.xml with a button, and add the RECORD_AUDIO permission to the manifest.
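For reference, a minimal main.xml and manifest entry matching the IDs used above (R.id.btn_speak, R.id.textView1) could look like the sketch below; this is an assumption about the layout, not the author's exact files:

    <!-- res/layout/main.xml (hypothetical minimal layout) -->
    <LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        android:orientation="vertical">

        <Button
            android:id="@+id/btn_speak"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:text="Speak" />

        <TextView
            android:id="@+id/textView1"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content" />
    </LinearLayout>

    <!-- AndroidManifest.xml -->
    <uses-permission android:name="android.permission.RECORD_AUDIO" />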

【Comments】:

Came across this while searching for something else. Although it's an old question, I think posting an answer will help others.
Copied from ***.com/questions/6316937/… :)
It always outputs 5 or 4 or error 7.
Result-5 should be accepted as the answer; it has been 3 years now.
For clarity, granting the RECORD_AUDIO permission looks like <uses-permission android:name="android.permission.RECORD_AUDIO" />

【Answer 2】:

Also make sure to request the proper permissions from the user. I was getting an error 9 return value, INSUFFICIENT_PERMISSIONS, even though I had the proper RECORD_AUDIO permission listed in the manifest.

Following the sample code here, I was able to get the permission from the user, and then the speech recognizer returned good responses.

For example, I put this block into my activity's onCreate(), before calling any SpeechRecognizer methods, although it could sit elsewhere in the UI flow:

    protected void onCreate(Bundle savedInstanceState) {
        ...
        if (ContextCompat.checkSelfPermission(this,
                Manifest.permission.RECORD_AUDIO)
                != PackageManager.PERMISSION_GRANTED) {

            // Should we show an explanation?
            if (ActivityCompat.shouldShowRequestPermissionRationale(this,
                    Manifest.permission.RECORD_AUDIO)) {

                // Show an explanation to the user *asynchronously* -- don't block
                // this thread waiting for the user's response! After the user
                // sees the explanation, try again to request the permission.

            } else {

                // No explanation needed, we can request the permission.

                ActivityCompat.requestPermissions(this,
                        new String[]{Manifest.permission.RECORD_AUDIO},
                        527);

                // 527 is an app-defined int request code; the callback method
                // onRequestPermissionsResult() gets the result of the request.
                // (In this example I just punched in the value 527.)
            }
        }
        ...
    }

Then provide a callback method in the activity for the permission request:

@Override
public void onRequestPermissionsResult(int requestCode,
                                       String permissions[], int[] grantResults) {
    switch (requestCode) {
        case 527: {
            // If the request is cancelled, the result arrays are empty.
            if (grantResults.length > 0
                    && grantResults[0] == PackageManager.PERMISSION_GRANTED) {

                // Permission was granted, yay! Do the
                // microphone-related task you need to do.

            } else {

                // Permission denied, boo! Disable the
                // functionality that depends on this permission.
            }
            return;
        }

        // other 'case' lines to check for other
        // permissions this app might request
    }
}

One other thing I had to change in preetha's example code above is how the result text is retrieved in the onResults() method. To get the actual text of the recognized speech (rather than its size, which the original code prints), either print the value of the constructed string str or take one of the values in the ArrayList (data). For instance:

.setText(data.get(0));
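Put together, a minimal sketch of the modified onResults() (assuming the mText TextView and the listener class from Answer 1) could be:

    public void onResults(Bundle results) {
        // Show the top recognition hypothesis instead of the number of results
        ArrayList<String> data = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
        if (data != null && !data.isEmpty()) {
            Log.d(TAG, "best result " + data.get(0));
            mText.setText(data.get(0));
        }
    }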

【Comments】:

【Answer 3】:

You can use SpeechRecognizer, though I am not aware of any sample code for it beyond this previous SO question. However, it is new in API level 8 (Android 2.2), so it was not yet widely used at the time of writing.
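Not part of the original answer, but worth noting here: SpeechRecognizer also provides a static isRecognitionAvailable() check that can be called before wiring everything up. A minimal sketch, with a hypothetical helper name:

    // Hypothetical helper: only start listening if a recognition service is installed.
    private void startRecognitionIfAvailable(Context context, Intent recognizerIntent,
                                             RecognitionListener listener) {
        if (SpeechRecognizer.isRecognitionAvailable(context)) {
            SpeechRecognizer recognizer = SpeechRecognizer.createSpeechRecognizer(context);
            recognizer.setRecognitionListener(listener);
            recognizer.startListening(recognizerIntent);
        } else {
            Log.w("SpeechDemo", "No speech recognition service available on this device");
        }
    }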

【Comments】:

I wrote a test app that tries to launch SpeechRecognizer.startListening() together with the implemented listener methods, but nothing happens.

【Answer 4】:

You can do it like this:

import android.app.Activity
import androidx.appcompat.app.AppCompatActivity
import android.os.Bundle
import kotlinx.android.synthetic.main.activity_main.*
import android.widget.Toast
import android.content.ActivityNotFoundException
import android.speech.RecognizerIntent
import android.content.Intent

class MainActivity : AppCompatActivity() {
    private val REQ_CODE = 100

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        speak.setOnClickListener {
            val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "ar-JO") // Locale.getDefault()
            intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Need to speak")
            try {
                startActivityForResult(intent, REQ_CODE)
            } catch (a: ActivityNotFoundException) {
                Toast.makeText(applicationContext,
                        "Sorry your device not supported",
                        Toast.LENGTH_SHORT).show()
            }
        }
    }

    override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
        super.onActivityResult(requestCode, resultCode, data)

        when (requestCode) {
            REQ_CODE -> {
                if (resultCode == Activity.RESULT_OK && data != null) {
                    val result = data
                            .getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS)
                    println("result: $result")
                    text.text = result[0]
                }
            }
        }
    }
}

The layout can be very simple:

<?xml version = "1.0" encoding = "utf-8"?>
<RelativeLayout xmlns:android = "http://schemas.android.com/apk/res/android"
    xmlns:app = "http://schemas.android.com/apk/res-auto"
    xmlns:tools = "http://schemas.android.com/tools"
    android:layout_width = "match_parent"
    android:layout_height = "match_parent"
    tools:context = ".MainActivity">
    <LinearLayout
        android:layout_width = "match_parent"
        android:gravity = "center"
        android:layout_height = "match_parent">
        <TextView
            android:id = "@+id/text"
            android:textSize = "30sp"
            android:layout_width = "wrap_content"
            android:layout_height = "wrap_content"/>
    </LinearLayout>
    <LinearLayout
        android:layout_width = "wrap_content"
        android:layout_alignParentBottom = "true"
        android:layout_centerInParent = "true"
        android:orientation = "vertical"
        android:layout_height = "wrap_content">
        <ImageView
            android:id = "@+id/speak"
            android:layout_width = "wrap_content"
            android:layout_height = "wrap_content"
            android:background = "?selectableItemBackground"
            android:src = "@android:drawable/ic_btn_speak_now"/>
    </LinearLayout>
</RelativeLayout>

The other way you asked about takes longer, but gives you more control and does not bother you with the Google helper dialog:

1- First, you need to declare the permissions in the Manifest file:

    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.RECORD_AUDIO"/>

2- I consolidated all of the answers above into the following:

Create the RecognitionListener class, as follows:
import android.content.Context
import android.os.Bundle
import android.speech.RecognitionListener
import android.speech.SpeechRecognizer
import android.util.Log
import android.widget.Toast

private val TAG = "Driver-Assistant"

class Listener(context: Context) : RecognitionListener {
    private var ctx = context

    override fun onReadyForSpeech(params: Bundle?) {
        Log.d(TAG, "onReadyForSpeech")
    }

    override fun onRmsChanged(rmsdB: Float) {
        Log.d(TAG, "onRmsChanged")
    }

    override fun onBufferReceived(buffer: ByteArray?) {
        Log.d(TAG, "onBufferReceived")
    }

    override fun onPartialResults(partialResults: Bundle?) {
        Log.d(TAG, "onPartialResults")
    }

    override fun onEvent(eventType: Int, params: Bundle?) {
        Log.d(TAG, "onEvent")
    }

    override fun onBeginningOfSpeech() {
        Toast.makeText(ctx, "Speech started", Toast.LENGTH_LONG).show()
    }

    override fun onEndOfSpeech() {
        Toast.makeText(ctx, "Speech finished", Toast.LENGTH_LONG).show()
    }

    override fun onError(error: Int) {
        val string = when (error) {
            6 -> "No speech input"
            4 -> "Server sends error status"
            8 -> "RecognitionService busy."
            7 -> "No recognition result matched."
            1 -> "Network operation timed out."
            2 -> "Other network related errors."
            9 -> "Insufficient permissions"
            5 -> "Other client side errors."
            3 -> "Audio recording error."
            else -> "unknown!!"
        }
        Toast.makeText(ctx, "sorry error occurred: $string", Toast.LENGTH_LONG).show()
    }

    override fun onResults(results: Bundle?) {
        Log.d(TAG, "onResults $results")
        val data = results!!.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
        display.text = data!![0]
    }
}

In the main file you need to define the SpeechRecognizer, attach the listener above to it, and don't forget to request the runtime permission. The whole thing is as follows:
import android.Manifest
import android.content.Intent
import android.content.pm.PackageManager
import android.os.Bundle
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer
import android.widget.TextView
import android.widget.Toast
import androidx.appcompat.app.AppCompatActivity
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat
import kotlinx.android.synthetic.main.activity_main.*

lateinit var sr: SpeechRecognizer
lateinit var display: TextView

class MainActivity : AppCompatActivity() {

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        display = text

        if (ContextCompat.checkSelfPermission(this,
                        Manifest.permission.RECORD_AUDIO)
                != PackageManager.PERMISSION_GRANTED) {
            if (ActivityCompat.shouldShowRequestPermissionRationale(this,
                            Manifest.permission.RECORD_AUDIO)) {
                // Show an explanation to the user here if needed
            } else {
                ActivityCompat.requestPermissions(this,
                        arrayOf(Manifest.permission.RECORD_AUDIO),
                        527)
            }
        }

        sr = SpeechRecognizer.createSpeechRecognizer(this)
        sr.setRecognitionListener(Listener(this))

        speak.setOnClickListener {
            val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "ar-JO") // Locale.getDefault()
            sr.startListening(intent)
        }
    }

    override fun onRequestPermissionsResult(requestCode: Int, permissions: Array<out String>, grantResults: IntArray) {
        super.onRequestPermissionsResult(requestCode, permissions, grantResults)
        when (requestCode) {
            527 -> if (grantResults.isNotEmpty()
                    && grantResults[0] == PackageManager.PERMISSION_GRANTED) {
                Toast.makeText(this, "Permission granted", Toast.LENGTH_SHORT).show()
            } else {
                Toast.makeText(this, "Permission not granted", Toast.LENGTH_SHORT).show()
            }
        }
    }
}
【Comments】:

【Answer 5】:
package com.android.example.speechtxt;

import androidx.appcompat.app.AppCompatActivity;
import androidx.core.content.ContextCompat;

import android.Manifest;
import android.content.Intent;
import android.content.pm.PackageManager;
import android.net.Uri;
import android.os.Build;
import android.os.Bundle;
import android.provider.Settings;
import android.speech.RecognitionListener;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import android.view.MotionEvent;
import android.view.View;
import android.widget.RelativeLayout;
import android.widget.Toast;

import java.util.ArrayList;
import java.util.Locale;

public class MainActivity extends AppCompatActivity {

    private RelativeLayout relativeLayout;
    private SpeechRecognizer speechRecognizer;
    private Intent speechintent;
    String keeper = "";

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        checkVoiceCommandPermission();
        relativeLayout = findViewById(R.id.touchscr);

        speechRecognizer = SpeechRecognizer.createSpeechRecognizer(getApplicationContext());
        speechintent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        speechintent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        speechintent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault());

        speechRecognizer.setRecognitionListener(new RecognitionListener() {
            @Override
            public void onReadyForSpeech(Bundle params) {
            }

            @Override
            public void onBeginningOfSpeech() {
            }

            @Override
            public void onRmsChanged(float rmsdB) {
            }

            @Override
            public void onBufferReceived(byte[] buffer) {
            }

            @Override
            public void onEndOfSpeech() {
            }

            @Override
            public void onError(int error) {
            }

            @Override
            public void onResults(Bundle results) {
                ArrayList<String> speakedStringArray = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
                if (speakedStringArray != null) {
                    keeper = speakedStringArray.get(0);
                    Toast.makeText(getApplicationContext(), "" + keeper, Toast.LENGTH_SHORT).show();
                }
            }

            @Override
            public void onPartialResults(Bundle partialResults) {
            }

            @Override
            public void onEvent(int eventType, Bundle params) {
            }
        });

        // Push-to-talk: listen while the layout is pressed, stop on release
        relativeLayout.setOnTouchListener(new View.OnTouchListener() {
            @Override
            public boolean onTouch(View v, MotionEvent event) {
                switch (event.getAction()) {
                    case MotionEvent.ACTION_DOWN:
                        speechRecognizer.startListening(speechintent);
                        keeper = "";
                        break;
                    case MotionEvent.ACTION_UP:
                        speechRecognizer.stopListening();
                        break;
                }
                return false;
            }
        });
    }

    private void checkVoiceCommandPermission() {
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M) {
            if (!(ContextCompat.checkSelfPermission(MainActivity.this, Manifest.permission.RECORD_AUDIO) == PackageManager.PERMISSION_GRANTED)) {
                // No RECORD_AUDIO permission: send the user to the app settings screen and exit
                Intent intent = new Intent(Settings.ACTION_APPLICATION_DETAILS_SETTINGS, Uri.parse("package:" + getPackageName()));
                startActivity(intent);
                finish();
            }
        }
    }
}

【Comments】:
